This page is intended as a resource hub with recommendations to help you get started with General-Purpose GPU (GPGPU) programming, that is, writing your own parallel algorithms to run on GPUs. If you are interested in machine learning use cases, you should instead look at frameworks such as PyTorch, TensorFlow, or ONNX Runtime, to name a few.
GPUs are becoming increasingly widespread, thanks to the popularity of machine learning algorithms and to their power efficiency compared to traditional CPUs (assuming the problem is well adapted to GPUs). The three main vendors, NVIDIA, AMD, and Intel, each provide a native language to program their GPUs: CUDA, HIP, and SYCL, respectively. To write code that is portable between architectures, multiple options exist, including Kokkos, OpenACC, Alpaka, and C++ parallel algorithms, each providing a different level of abstraction. You can find a review of the different GPU languages here.
To get started, you should learn CUDA first; it has the most comprehensive documentation and training materials. It is also low-level, which is important for understanding the GPU programming model. Other GPU languages either share similar concepts or abstract the programming model away, so transitioning from CUDA to them is straightforward.
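To give a flavour of the programming model, below is a minimal CUDA vector-addition sketch of the kind most introductory tutorials start from. It is not ATLAS-specific code; error checking is omitted and all names are illustrative:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread computes one element of the output array.
__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // guard against overrun
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Allocate and fill host buffers.
    float *h_a = new float[n], *h_b = new float[n], *h_c = new float[n];
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Allocate device buffers and copy the inputs over.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vector_add<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // Copy the result back (this cudaMemcpy waits for the kernel to finish).
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);  // expect 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    delete[] h_a; delete[] h_b; delete[] h_c;
    return 0;
}
```

Saved as `vector_add.cu`, this should compile with `nvcc vector_add.cu -o vector_add` and run on any machine with an NVIDIA GPU.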
There are many great introductory CUDA tutorials. If you need access to NVIDIA GPUs, you can use the CERN SWAN service (make sure to select a GPU software stack) or connect directly with `ssh lxplus-gpu.cern.ch`.
The CUDA programming guide is a comprehensive reference manual once you have learned the basics of CUDA.
Once you understand the GPU programming model, the NVIDIA developer blog has many articles that go in depth on specific advanced topics. In addition, GTC talks often feature deep dives. Here are a few examples:
If your use case requires common operations such as sorting, prefix scans, random number generation, or matrix operations, you'll most likely find a library that already implements them. You can find a list of libraries provided by NVIDIA here. Libraries that are most likely to be useful:
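As one illustration of what these libraries provide, sorting on the device with Thrust (which ships with the CUDA Toolkit) takes a single call. This is a minimal sketch, not ATLAS code:

```cuda
#include <cstdio>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>

int main() {
    // Fill a host vector with descending values.
    thrust::host_vector<int> h(10);
    for (int i = 0; i < 10; ++i) h[i] = 10 - i;

    // Copy to the device and sort there; Thrust picks the kernels
    // and launch configuration for you.
    thrust::device_vector<int> d = h;
    thrust::sort(d.begin(), d.end());

    // Copy the sorted values back and print them.
    thrust::copy(d.begin(), d.end(), h.begin());
    for (int i = 0; i < 10; ++i) printf("%d ", (int)h[i]);
    printf("\n");
    return 0;
}
```

For standard primitives like this, the vendor libraries are heavily tuned and are usually hard to beat with a hand-written kernel.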
There are significant ongoing R&D efforts to make use of GPUs in ATLAS Phase-II computing. You can find an overview in this ATLAS lecture or CHACAL 2024’s lecture, which also includes activities from other experiments (from slide 70).
Athena supports the three "native" GPU languages: CUDA, HIP, and SYCL. The best way to get started is to look at the CUDA example, which shows how to link your package against CUDA and write a kernel. Here are a few notes:
- Kernel code must go in `.cu` files. If you're only calling the CUDA Runtime API (e.g. `cudaMalloc`), your file doesn't need to be compiled by `nvcc` and can have the `.cxx` extension (see the sketch after these notes).
- Relying on `asetup` to provide CUDA will only work from lxplus.
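To make the first note concrete, here is a host-only sketch that calls nothing but the CUDA Runtime API, so it could live in a `.cxx` file and be built by the regular host compiler. The file and function names are purely illustrative:

```cuda
// allocate_buffer.cxx -- hypothetical example: no kernels and no <<<...>>>
// launches, so nvcc is not required; a host C++ compiler is enough.
#include <cuda_runtime.h>

#include <cstddef>
#include <cstdio>

// Allocate a device buffer through the plain C Runtime API.
float* allocate_device_buffer(std::size_t n_elements) {
    float* ptr = nullptr;
    const cudaError_t err = cudaMalloc(&ptr, n_elements * sizeof(float));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return nullptr;
    }
    return ptr;
}

int main() {
    float* buf = allocate_device_buffer(1024);
    if (buf != nullptr) {
        std::printf("Allocated 1024 floats on the device\n");
        cudaFree(buf);
    }
    return 0;
}
```

When building such a file by hand you would add the CUDA include path and link against `libcudart`; within Athena, the package's CMake configuration (as in the CUDA example) takes care of this.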