Abstract:To enhance the parallel efficiency of solving Navier Stokes equations, a graphic processing unit (GPU) parallel algorithm, ported from Runge-Kutta discontinuous Galerkin (RKDG) method, is presented through constructing element-based or edge-based thread hierarchy and corresponding GPU kernels. The data storage and access of the algorithm are designed to be compatible for the various types of memories with different latencies. In comparison with the structured mesh counterpart, in which the structured domain of data dependence is already quite good for the requirement of coalesced memory access, the irregularity of unstructured mesh shows a negative effect on the performance of memory access. To remedy the negative effect, a multi-layered element reordering approach suitable for high-order finite element method is proposed to achieve further acceleration. Starting with the initial mesh, layer structures of elements or edges are constructed with reordering in a layer-by-layer manner to form the data structures suitable for coalesced memory access. An example of mesh reordering is provided with the implementation process detailed. Numerical results of typical flow simulations reveal that the expected order of accuracy of the proposed algorithm is realized, and the calculated results agree well with experiment data or other computed resules in the existing literature, with the maximum GPU speedups achieved up to 67.47. Moreover, the algorithm exhibits the potential to cope with more complex geometries, and the proposed technique can further achieve reordering acceleration.