GPU Programming Resources

Last update: 18 Apr 2024

This page is a resource hub with recommendations to help you get started with General-Purpose GPU (GPGPU) programming, that is, writing parallel algorithms that run on GPUs. If you are interested in machine learning use cases, you should look at PyTorch, TensorFlow, or ONNX Runtime, to name a few.

Getting Started

GPUs are becoming increasingly widespread, thanks to the popularity of machine learning algorithms and to their power efficiency compared to traditional CPUs (assuming the problem is well suited to GPUs). The three main vendors, NVIDIA, AMD, and Intel, each provide a native language for programming their GPUs: CUDA, HIP, and SYCL, respectively. To make code portable across architectures, several options exist, including Kokkos, OpenACC, Alpaka, and C++ parallel algorithms, each providing a different level of abstraction. You can find a review of the different GPU languages here.

To get started, you should learn CUDA first; it has the most comprehensive documentation and training materials. It is also low-level, which is important for understanding the GPU programming model. Other GPU languages either share similar concepts or abstract away the programming model, so transitioning from CUDA to them will be straightforward.
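To give a feel for what "low-level" means here, the following is a minimal sketch of a CUDA vector addition. The names `add_kernel`, `check`, and `add_on_gpu` are hypothetical and chosen only for illustration; the block size of 256 is an arbitrary assumption, not a recommendation from this page.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Kernel: each GPU thread adds one pair of elements.
__global__ void add_kernel(const float* a, const float* b, float* out, int n) {
    // Global index of this thread across the whole grid.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // the last block may contain threads past the end of the data
        out[i] = a[i] + b[i];
    }
}

// Hypothetical helper: turn a CUDA error code into a C++ exception.
inline void check(cudaError_t err) {
    if (err != cudaSuccess) throw std::runtime_error(cudaGetErrorString(err));
}

// Hypothetical host wrapper: copy inputs to the GPU, launch the kernel,
// and copy the result back. For brevity, device buffers are not freed
// if one of the calls throws.
std::vector<float> add_on_gpu(const std::vector<float>& a,
                              const std::vector<float>& b) {
    if (a.size() != b.size()) throw std::invalid_argument("size mismatch");
    const int n = static_cast<int>(a.size());
    std::vector<float> out(n);
    if (n == 0) return out;
    const std::size_t bytes = n * sizeof(float);

    float *d_a = nullptr, *d_b = nullptr, *d_out = nullptr;
    check(cudaMalloc(&d_a, bytes));
    check(cudaMalloc(&d_b, bytes));
    check(cudaMalloc(&d_out, bytes));
    check(cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice));
    check(cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice));

    const int threads = 256;                           // threads per block (assumed)
    const int blocks = (n + threads - 1) / threads;    // round up to cover all n
    add_kernel<<<blocks, threads>>>(d_a, d_b, d_out, n);
    check(cudaGetLastError());                         // catches launch errors

    // This copy is ordered after the kernel on the default stream.
    check(cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost));

    check(cudaFree(d_a));
    check(cudaFree(d_b));
    check(cudaFree(d_out));
    return out;
}
```

The explicit memory management, the grid and block sizes, and the per-thread index computation are the parts of the programming model that higher-level languages and libraries tend to hide.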

CUDA Resources

Starting out

There are many great introductory tutorials to CUDA. If you need access to NVIDIA GPUs, you can use the CERN SWAN service (make sure to select a GPU software stack) or connect directly via ssh.

The CUDA programming guide is a comprehensive reference manual to turn to once you have learned the basics of CUDA.

Deeper dive

Once you understand the GPU programming model, the NVIDIA developer blog has many articles that go in depth on specific advanced topics. GTC talks also feature deep dives. Here are a few examples:

Existing libraries

If your use case requires common operations such as sorting, scan, RNG, or matrix operations, you’ll most likely find a library that already implements them. You can find a list of libraries provided by NVIDIA here. The libraries most likely to be useful are:

  • Thrust, a high-level interface for common algorithms
  • CUB, a lower-level, CUDA-specific version of Thrust
  • cuRAND, random number generation on GPU
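As an illustration of the high-level style Thrust offers, here is a minimal sketch that sorts and sums integers on the GPU without writing a kernel. The helper names `sort_on_gpu` and `sum_on_gpu` are hypothetical; the Thrust calls themselves (`thrust::device_vector`, `thrust::sort`, `thrust::reduce`, `thrust::copy`) are part of the library.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/sort.h>
#include <vector>

// Hypothetical helper: sort values on the GPU and return them to the host.
std::vector<int> sort_on_gpu(const std::vector<int>& input) {
    // Constructing a device_vector from host iterators copies host -> device.
    thrust::device_vector<int> d(input.begin(), input.end());
    thrust::sort(d.begin(), d.end());               // executes on the device
    std::vector<int> out(d.size());
    thrust::copy(d.begin(), d.end(), out.begin());  // copies device -> host
    return out;
}

// Hypothetical helper: sum values on the GPU with a parallel reduction.
int sum_on_gpu(const std::vector<int>& input) {
    thrust::device_vector<int> d(input.begin(), input.end());
    return thrust::reduce(d.begin(), d.end(), 0);   // 0 is the initial value
}
```

Device memory is allocated and released by `thrust::device_vector`, so there are no explicit `cudaMalloc` or `cudaFree` calls. The file still has to be compiled as CUDA code because the algorithms run on the device.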


There are significant ongoing R&D efforts to make use of GPUs in ATLAS Phase-II computing. You can find an overview in this ATLAS lecture or CHACAL 2024’s lecture, which also includes activities from other experiments (from slide 70).

GPU Development in Athena

Athena supports the three “native” GPU languages, CUDA, HIP, and SYCL. The best way to get started is to look at the CUDA example. This will show you how to link your package to CUDA and write a kernel. Here are a few notes:

  • Thrust and CUB are provided with the CUDA installation; using them only requires including the relevant header
  • Vecmem, a library for device memory management, can be easily added to your package
  • You can find some CUDA utilities in the AthCUDACore package
  • Try to minimize code in .cu files. If you’re only calling the CUDA Runtime API (e.g. cudaMalloc), your file doesn’t need to be compiled by nvcc and can have the .cxx extension
  • For legal reasons, using asetup to provide CUDA will only work from lxplus
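To illustrate the note about keeping code out of .cu files, here is a minimal sketch of a source file that only calls the CUDA Runtime API. It contains no kernels and no `<<<...>>>` launch syntax, so it is ordinary C++ and could be saved with the .cxx extension. The function name `round_trip` is hypothetical, and how the package is linked against the CUDA runtime is not shown here; that part is covered by the Athena CUDA example.

```cuda
#include <cuda_runtime_api.h>  // host-side runtime API only
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: copy a buffer to device memory and back again.
// For brevity, the device buffer is not freed if a call fails midway.
std::vector<float> round_trip(const std::vector<float>& host_in) {
    std::vector<float> host_out(host_in.size());
    if (host_in.empty()) return host_out;
    const std::size_t bytes = host_in.size() * sizeof(float);

    void* d_buf = nullptr;
    if (cudaMalloc(&d_buf, bytes) != cudaSuccess)
        throw std::runtime_error("cudaMalloc failed");
    if (cudaMemcpy(d_buf, host_in.data(), bytes, cudaMemcpyHostToDevice) != cudaSuccess)
        throw std::runtime_error("host-to-device copy failed");
    if (cudaMemcpy(host_out.data(), d_buf, bytes, cudaMemcpyDeviceToHost) != cudaSuccess)
        throw std::runtime_error("device-to-host copy failed");
    cudaFree(d_buf);
    return host_out;
}
```

Kernels and anything else that needs device-side syntax would stay in a separate .cu file, with plain functions like this one forming the boundary between the two.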