A RKDG GPU parallel algorithm and its acceleration with reordering
Affiliation:

(1.College of Aerospace Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China; 2.Key Laboratory of Unsteady Aerodynamics and Flow Control (Nanjing University of Aeronautics and Astronautics), Ministry of Industry and Information Technology, Nanjing 210016, China)

Clc Number:

V211.3

    Abstract:

    To enhance the parallel efficiency of solving the Navier-Stokes equations, a graphics processing unit (GPU) parallel algorithm, ported from the Runge-Kutta discontinuous Galerkin (RKDG) method, is presented; it is built by constructing an element-based or edge-based thread hierarchy and the corresponding GPU kernels. The data storage and access patterns of the algorithm are designed to suit the various types of GPU memory, which differ in latency. Compared with a structured mesh, whose regular data dependence already suits coalesced memory access, the irregularity of an unstructured mesh degrades memory-access performance. To remedy this, a multi-layered element reordering approach suitable for high-order finite element methods is proposed to achieve further acceleration. Starting from the initial mesh, layer structures of elements or edges are constructed and reordered layer by layer to form data structures suitable for coalesced memory access. An example of mesh reordering is provided, with the implementation process detailed. Numerical results for typical flow simulations show that the proposed algorithm attains the expected order of accuracy and that the computed results agree well with experimental data or other computed results in the existing literature, with a maximum GPU speedup of 67.47. Moreover, the algorithm shows potential for handling more complex geometries, and the proposed reordering technique provides further acceleration.
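
    The layer-by-layer reordering idea can be illustrated with a minimal sketch. The abstract does not give the exact layering rule, so the code below assumes a plain breadth-first traversal of an element adjacency list; the names `layered_reorder`, `index_spread`, and `neighbors` are hypothetical and not taken from the paper. The sketch only shows how renumbering elements layer by layer brings neighboring elements closer together in memory, which is the property that coalesced access benefits from.

```python
def layered_reorder(neighbors, seed=0):
    """Renumber elements layer by layer (assumed rule: breadth-first layers).

    neighbors[i] lists the elements adjacent to element i.
    Returns (order, layers): order[k] is the old index of the element
    placed at new position k; layers groups old indices by layer.
    """
    n = len(neighbors)
    visited = [False] * n
    order, layers = [], []
    # Try the seed first, then any element left in a disconnected part.
    for start in [seed] + [i for i in range(n) if i != seed]:
        if visited[start]:
            continue
        visited[start] = True
        frontier = [start]
        while frontier:
            layers.append(frontier)
            order.extend(frontier)
            next_layer = []
            for elem in frontier:
                for nb in sorted(neighbors[elem]):
                    if not visited[nb]:
                        visited[nb] = True
                        next_layer.append(nb)
            frontier = next_layer
    return order, layers


def index_spread(neighbors, order):
    """Largest index distance between adjacent elements under a numbering."""
    pos = {old: new for new, old in enumerate(order)}
    return max(
        (abs(pos[i] - pos[j]) for i in range(len(neighbors)) for j in neighbors[i]),
        default=0,
    )


# A chain of six elements whose initial numbering is scrambled:
# physical order along the chain is 0-3-1-5-2-4.
neighbors = [[3], [3, 5], [5, 4], [0, 1], [2], [1, 2]]
order, layers = layered_reorder(neighbors, seed=0)
print(order)                                   # [0, 3, 1, 5, 2, 4]
print(index_spread(neighbors, list(range(6)))) # 4 (initial numbering)
print(index_spread(neighbors, order))          # 1 (after reordering)
```

    In this toy case, adjacent elements end up at consecutive indices after reordering, so threads assigned to consecutive elements would read neighbor data from nearby addresses. The multi-layered approach for high-order elements described in the abstract may differ in how layers are built and ordered.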

History
  • Received: August 11, 2022
  • Online: August 06, 2023