Speaker
Jeff Hammond
(NVIDIA)
Description
This talk describes our experience porting the bottleneck kernels of NWChem's CCSD(T) code to GPUs using standard Fortran. With the NVIDIA HPC Fortran compiler, we can run completely standard Fortran code on A100 GPUs with respectable performance relative to the expert-optimized GPU implementation based on cuTENSOR. The same code runs well on a range of CPUs. We will explain how to optimize DO CONCURRENT for GPU and CPU architectures using different variants of standard parallelism. We will also show how the NVIDIA HPC Fortran compiler supports high-performance GPU implementations of linear algebra intrinsics through a cuTENSOR back-end.
Primary author
Jeff Hammond
(NVIDIA)