Research Objective
To implement the CARMEN tomographic reconstructor in the GPU-based neural network frameworks Torch and TensorFlow, in order to improve its performance and determine which framework is better suited to real-time operation on telescopes.
Research Findings
TensorFlow is generally faster than Torch for training, especially at larger batch sizes, while the two frameworks perform similarly in execution. The tested networks are small enough that increasing their size does not increase times linearly, indicating good GPU parallelization. Execution times are under 2 milliseconds for the largest network, meeting the requirements of real-time telescope systems. Future work should explore multi-GPU systems, larger networks, and on-sky learning.
Research Limitations
The study is limited to two neural network frameworks (Torch and TensorFlow) and a single hardware configuration. It does not explore multi-GPU systems or networks larger than the tested sizes. The use of random data for DRAGON may not fully represent real-world conditions, and the TensorFlow execution measurements omit the time needed to copy data to the GPU.
1: Experimental Design and Method Selection:
The study implements the CARMEN neural network in the Torch and TensorFlow frameworks, using GPU acceleration for both training and execution. Training uses stochastic gradient descent with mini-batches and momentum, as sketched below.
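As a hedged illustration of this setup, the sketch below builds a small CARMEN-style fully connected network with an SGD-plus-momentum optimizer using TensorFlow's Keras API; the layer sizes, activation choices, learning rate, and momentum are placeholder assumptions rather than values from the paper.

```python
import tensorflow as tf

# Hypothetical slope-vector sizes; the real CARMEN dimensions depend on the
# wavefront-sensor geometry and are not given in this summary.
n_inputs, n_hidden, n_outputs = 288, 576, 72

# A small fully connected network in the style of CARMEN.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_inputs,)),
    tf.keras.layers.Dense(n_hidden, activation="sigmoid"),
    tf.keras.layers.Dense(n_outputs, activation="linear"),
])

# Stochastic gradient descent with mini-batches and momentum, as in the
# study; the learning rate and momentum values here are assumptions.
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    loss="mse",
)
```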
2: Sample Selection and Data Sources:
Training data is obtained from the CANARY simulator for the CANARY-B1 and CANARY-C2 systems, and random data is generated for DRAGON (see the sketch below). Dataset sizes are 350,000 samples for CANARY-B1, 1,500,000 for CANARY-C2, and 1,000,000 for DRAGON.
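A minimal sketch of generating the DRAGON-style random training set follows; only the sample count of 1,000,000 comes from the study, while the input and output widths are placeholder assumptions carried over from the sketch above.

```python
import numpy as np

# 1,000,000 random samples, matching the DRAGON dataset size in the summary.
# The input/output widths are illustrative placeholders.
n_samples, n_inputs, n_outputs = 1_000_000, 288, 72

rng = np.random.default_rng(seed=0)
x_train = rng.uniform(-1.0, 1.0, size=(n_samples, n_inputs)).astype(np.float32)
y_train = rng.uniform(-1.0, 1.0, size=(n_samples, n_outputs)).astype(np.float32)
```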
3: List of Experimental Equipment and Materials:
Computer running Ubuntu LTS, with an Intel Xeon CPU E5-1650 v3 @ 3.50GHz, 128GB DDR4 memory, an Nvidia GeForce GTX TitanX GPU, an SSD hard drive, and CUDA with cuDNN.
4: Experimental Procedures and Operational Workflow:
For training, weights are initialized and copied to GPU VRAM, training data is loaded into RAM, and training is performed over 20 epochs with mini-batch sizes of 16, 32, 64, 128, 256, and 512.
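A minimal sketch of this training benchmark, assuming the `model` and the `x_train`/`y_train` arrays from the sketches above: each batch size gets a freshly initialized copy of the network, trained for 20 epochs, with the wall-clock time recorded.

```python
import time
import tensorflow as tf

for batch_size in (16, 32, 64, 128, 256, 512):
    # Rebuild and recompile so each batch-size run starts from fresh weights.
    run_model = tf.keras.models.clone_model(model)
    run_model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
        loss="mse",
    )
    start = time.perf_counter()
    run_model.fit(x_train, y_train, epochs=20, batch_size=batch_size, verbose=0)
    print(f"batch size {batch_size}: {time.perf_counter() - start:.1f} s")
```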
For execution, the networks are fed single inputs from h5 files, and times are measured over 10,000 inputs.
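The execution benchmark can be sketched in the same hedged way: single-sample forward passes timed over 10,000 inputs, drawn here from the random arrays above instead of the paper's h5 files. Note that where the clock is started determines whether host-to-GPU copy time is included, which is the caveat raised for the TensorFlow measurements in the limitations above.

```python
import time

# Time 10,000 single-input forward passes through the trained network.
n_runs = 10_000
start = time.perf_counter()
for i in range(n_runs):
    sample = x_train[i : i + 1]        # one input at a time, shape (1, n_inputs)
    _ = model(sample, training=False)  # forward pass only
elapsed = time.perf_counter() - start
print(f"mean execution time: {1e3 * elapsed / n_runs:.3f} ms per input")
```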
5: Data Analysis Methods:
Training and execution times are compared between Torch and TensorFlow across the different network sizes and batch sizes, and the results are analyzed to determine performance differences.