TY - JOUR
T1 - An efficient GPU algorithm for tetrahedron-based Brillouin-zone integration
AU - Guterding, Daniel
AU - Jeschke, Harald O.
N1 - Publisher Copyright: © 2018 Elsevier B.V. All rights reserved.
PY - 2018/10
Y1 - 2018/10
N2 - We report an efficient algorithm for calculating momentum-space integrals in solid-state systems on modern graphics processing units (GPUs). Our algorithm is based on the tetrahedron method, which we demonstrate to be ideally suited for execution in a GPU framework. To achieve maximum performance, all floating-point operations are executed in single precision. To benchmark our implementation within the CUDA programming framework, we calculate the orbital-resolved density of states in an iron-based superconductor. However, our algorithm is general enough for the achieved improvements to carry over to the calculation of other momentum integrals, such as susceptibilities. If our program code is integrated into an existing program for the central processing unit (CPU), i.e. when data transfer overheads exist, speedups of up to a factor of ∼130 compared to a pure CPU implementation can be achieved, largely depending on the problem size. If our program code is integrated into an existing GPU program, speedups over a CPU implementation of up to a factor of ∼165 are possible, even for moderately sized workloads.
KW - CUDA
KW - Brillouin zone integration
KW - Density-functional theory (DFT)
KW - GPU computing
KW - Tetrahedron
UR - http://www.scopus.com/inward/record.url?scp=85047237089&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85047237089&partnerID=8YFLogxK
U2 - 10.1016/j.cpc.2018.04.022
DO - 10.1016/j.cpc.2018.04.022
M3 - Article
AN - SCOPUS:85047237089
SN - 0010-4655
VL - 231
SP - 114
EP - 121
JO - Computer Physics Communications
JF - Computer Physics Communications
ER -