Research Objective
To develop a GPU-accelerated FDTD solver for modelling Ground Penetrating Radar (GPR) using NVIDIA’s CUDA framework, aiming to significantly reduce computational time and resources required for simulations.
Research Findings
The GPU-accelerated FDTD solver developed in this study significantly outperforms the parallelised (OpenMP) CPU solver, achieving throughput up to 30 times higher on NVIDIA GPUs. The solver's performance is largely governed by the memory bandwidth of the GPU, with the Tesla P100 demonstrating the best performance due to its high-bandwidth memory. The cost-performance benefit of the GeForce-series GPUs makes this work accessible to many individuals using commodity workstations. The solver is expected to advance GPR research in areas such as full-waveform inversion and machine learning, where many forward simulations are required.
Research Limitations
The study acknowledges that the FDTD method can suffer from errors due to 'stair-case' approximations of complex geometrical details and requires extensive computational resources for discretising the entire computational domain. Additionally, the performance of the GPU kernels is largely dependent on the memory bandwidth of the GPU, which may limit the scalability of the solver for very large models.
1:Experimental Design and Method Selection:
The study employed the Finite-Difference Time-Domain (FDTD) method for numerical modelling of electromagnetic wave propagation, specifically for GPR simulations. The FDTD method was chosen for its explicit, versatile, robust, and relatively simple implementation.
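The FDTD leapfrog update at the heart of the method can be illustrated with a minimal sketch. The 1-D Python/NumPy example below is illustrative only: the actual solver updates the full 3-D Yee grid in CUDA kernels, and the grid size, time-step count, and source parameters here are arbitrary choices, not values from the study.

```python
import numpy as np

def fdtd_1d(nx=200, nt=300):
    """Minimal 1-D FDTD (Yee leapfrog) sketch, illustrative only."""
    c = 299792458.0            # speed of light in vacuum (m/s)
    dx = 1e-3                  # 1 mm cell, matching the paper's finest resolution
    dt = dx / (2.0 * c)        # comfortably below the 1-D Courant limit
    mu0 = 4e-7 * np.pi         # permeability of free space
    eps0 = 8.854e-12           # permittivity of free space
    ez = np.zeros(nx)          # electric field on integer grid points
    hy = np.zeros(nx - 1)      # magnetic field, staggered half a cell
    for n in range(nt):
        hy += (dt / (mu0 * dx)) * np.diff(ez)         # H update (half step)
        ez[1:-1] += (dt / (eps0 * dx)) * np.diff(hy)  # E update (half step)
        ez[nx // 2] += np.exp(-((n - 30) / 10.0) ** 2)  # soft Gaussian source
    return ez

fields = fdtd_1d()
```

The staggered E and H grids and the alternating half-step updates are what make the scheme explicit: each cell update depends only on its immediate neighbours, which is also why the method maps so naturally onto one-GPU-thread-per-cell kernels.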
2:Sample Selection and Data Sources:
The models included free-space domains, heterogeneous soils with rough surfaces, buried pipes and utilities, and anti-personnel landmines. The spatial resolution was set to Δx=Δy=Δz=1mm or 2mm, and the temporal resolution was Δt=1.926ps or 3.852ps, respectively.
3:List of Experimental Equipment and Materials:
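In an FDTD scheme the temporal resolution follows from the spatial resolution via the 3-D Courant (CFL) stability limit, which can be checked directly for cubic cells of the stated sizes:

```python
import math

def courant_dt(dx, dy, dz, c=299792458.0):
    """Maximum stable FDTD time-step (3-D Courant/CFL limit) for cell sizes in metres."""
    return 1.0 / (c * math.sqrt(1.0 / dx**2 + 1.0 / dy**2 + 1.0 / dz**2))

dt_1mm = courant_dt(1e-3, 1e-3, 1e-3)  # ~1.926e-12 s, i.e. ~1.926 ps
dt_2mm = courant_dt(2e-3, 2e-3, 2e-3)  # ~3.852e-12 s, i.e. ~3.852 ps
```

Doubling the cell size doubles the permitted time-step, which is why the coarser 2 mm models can also take fewer, longer time-steps to cover the same simulated time window.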
The study utilized NVIDIA GPUs (GeForce GTX 1080 Ti, TITAN X, Tesla K40c, Tesla K80, Tesla P100) for GPU-accelerated simulations and Intel CPUs (Core i7-4790K, Xeon E5-2640 v4) for CPU-based simulations.
4:Experimental Procedures and Operational Workflow:
The GPU kernels were designed for optimal execution on NVIDIA GPUs using CUDA. The performance of the GPU-accelerated solver was benchmarked against the parallelised (OpenMP) CPU solver using simple and realistic GPR models.
5:Data Analysis Methods:
The performance throughput was measured in millions of cells per second (Mcells/s) using the formula P = (NX·NY·NZ·NT) / (T·1×10^6), where NX, NY, NZ are the number of cells in the domain, NT is the number of time-steps, and T is the runtime in seconds.
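The throughput formula translates directly into code; the small helper below implements it, with the domain size and runtime in the example being made-up illustrative numbers rather than measurements from the study:

```python
def throughput_mcells_per_s(nx, ny, nz, nt, runtime_s):
    """Performance throughput P in Mcells/s: P = (NX*NY*NZ*NT) / (T * 1e6)."""
    return (nx * ny * nz * nt) / (runtime_s * 1e6)

# e.g. a hypothetical 400x400x400-cell domain, 1000 time-steps, 60 s runtime
p = throughput_mcells_per_s(400, 400, 400, 1000, 60.0)  # ~1066.7 Mcells/s
```

Because the metric normalises by both domain size and number of time-steps, it allows fair throughput comparisons between models of different sizes and between CPU and GPU runs.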