Athena Introduction

Last update: 23 Dec 2023

Athena is based on the common Gaudi framework that is used by ATLAS, LHCb and FCC.

Athena code is hosted in the CERN GitLab service, where the repository owned by the atlas user plays a special role: it is the repository from which all production releases are built, and in the ATLAS development workflow it is also the merge target for developments from individual users or groups. This repository includes a number of different branches (a branch being "an independent line of development") that are actively committed to at any given time, and which may be synchronized to varying degrees (either manually or via automated procedures) depending on their purpose. There is much more information about the development workflow in the ATLAS git development tutorial.

Projects

While the athena repository contains any and all code that could be built into an ATLAS software release, each release itself generally only consists of a consistent subset of the code base. Each particular build flavour is called a project and is steered from a particular subdirectory of the Projects directory.

The Athena project is a complete build of almost everything in the repository. When a particular project is built, the build result encodes the project name: independent of any release number, AthSimulation is built from different code than AthAnalysisBase.

The main projects are:

| Project | Purpose |
| --- | --- |
| Athena | Reconstruction and Derivation production* |
| AthGeneration | Event generation |
| AthSimulation | Full Geant4 simulation |
| AthAnalysisBase | Athena-based analysis |
| AnalysisBase | Non-Athena ROOT-based analysis |
| DetCommon | Reading trigger configuration, e.g. when configuring L1 hardware |

Not all projects will exist, or be functional, in any particular branch of the Athena repository. It should also be noted that although the Athena project will be capable of running most workflows it may not represent the latest validated release for all purposes (in particular Event Generation).

* NB While both reconstruction and Derivation use the Athena project, they may be based on different code branches. Reconstruction will normally be run from a branch operating under the Frozen Tier 0 policy, i.e. one in which changes to job output in the AOD files are tightly controlled. Derivations, on the other hand, are expected to change more frequently, since they can be quickly reproduced from AOD, and so these run from the development branch, i.e. main (formerly master). Concretely this means, for example, that AOD files produced with 24.0 Athena releases will be used to produce DAOD files with 25.0 Athena releases.

Athena basics

Athena jobs are built around chains of algorithms. Algorithms process input data and produce output, both of which are stored in the framework’s transient store which is known as StoreGate. They are C++ classes which inherit either from AthAlgorithm or AthReentrantAlgorithm (the distinction is explained below).

All algorithms implement three main methods:

  • initialize which runs once at the start of the job, immediately after configuration
  • execute which runs once per event (this is mandatory)
  • finalize which runs once at the end of the job
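The lifecycle above can be sketched with a toy Python example. The class and method names mirror the Athena interface, but everything here is purely illustrative and is not the real AthAlgorithm API:

```python
class HelloAlg:
    """Toy stand-in for an Athena algorithm (illustrative only)."""

    def initialize(self):
        # Runs once at the start of the job, immediately after configuration.
        self.counter = 0

    def execute(self, event):
        # Runs once per event (mandatory).
        self.counter += 1

    def finalize(self):
        # Runs once at the end of the job.
        return f"saw {self.counter} events"


def run_job(alg, events):
    """Minimal event loop mimicking how the framework drives an algorithm."""
    alg.initialize()
    for evt in events:
        alg.execute(evt)
    return alg.finalize()
```

For example, `run_job(HelloAlg(), range(3))` calls execute three times and returns "saw 3 events".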

Algorithms can be assisted by tools. These are similar to algorithms but have no execute method: the developer is free to write whatever methods are needed to perform the required tasks, and these methods can be run as many times per event as necessary. Tools can be public or private. Public tools can be shared across multiple algorithms, whereas private tools are owned and used by a single algorithm. For reasons related to multi-threaded running (see later), public tools are discouraged, and in particular they should never be used to share data between algorithms - this must always be done via StoreGate.

Services are akin to tools but are global in scope, that is, the same instance of each service can be seen by all tools and algorithms in a given job. Common examples of services that you’ll encounter include StoreGate, the messaging service for producing log files, random number generation, and database access.

Although algorithms, tools and services are written in C++, Athena is configured via a configuration layer written in Python. Via this layer the developer instructs the framework what algorithms should be run, what data should be processed, what services should be used, and what should be written to disk. The configuration layer is based around the component accumulator, full details of which are available here and for which a tutorial is also available. This is why Athena jobs are run as Python processes. Complex production tasks such as event generation, simulation, reconstruction and DAOD production need a very large number of components and so they are wrapped up in job transformations which enable such tasks to be run with a single command.
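As a concrete illustration, a minimal ComponentAccumulator-based configuration for the HelloWorld example might look like the sketch below. This is indicative only: the exact import paths and component properties vary between Athena releases, and it can only run inside an Athena environment (e.g. after asetup), not standalone.

```python
# Sketch of a ComponentAccumulator configuration; requires an Athena release.
from AthenaConfiguration.AllConfigFlags import initConfigFlags
from AthenaConfiguration.ComponentAccumulator import ComponentAccumulator
from AthenaConfiguration.ComponentFactory import CompFactory
from AthenaConfiguration.MainServicesConfig import MainServicesCfg

def HelloWorldCfg(flags):
    # Each configuration function returns an accumulator holding the
    # components (algorithms, tools, services) it has set up.
    acc = ComponentAccumulator()
    acc.addEventAlgo(CompFactory.HelloAlg("HelloWorld"))
    return acc

if __name__ == "__main__":
    flags = initConfigFlags()
    flags.lock()
    cfg = MainServicesCfg(flags)   # core services (event loop, StoreGate, ...)
    cfg.merge(HelloWorldCfg(flags))
    cfg.run(2)                     # process two events
```

The pattern to note is that each piece of the job is configured by a function returning an accumulator, and these are merged into one top-level configuration before the job is run.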

The Athena software is hosted and managed in GitLab, and compilation and installation of the framework is configured with CMake. The software is organised into packages, which contain related algorithms and tools. Packages are only meaningful from an organisational point of view: once compiled, it is irrelevant which package a given algorithm or tool is in.

Running modes

Athena can be run in three modes, depending on the resources available and the task being completed:

  • Serial: the job runs as a single process on a single core without concurrency. This is the simplest mode of operation and is typically used for local software development and testing
  • Multi-process (AthenaMP): concurrency is achieved by forking into multiple processes after the initialize method or the first event has run. Each process handles a batch of events independently, but the processes are able to share read-only memory pages allocated and initialized by the main process, thereby saving memory. AthenaMP was the standard mode of production during Run 2 and is now used primarily for DAOD and pile-up production, although both applications will eventually migrate to AthenaMT as well.
  • Multithreaded (AthenaMT): full multithreaded operation with inter- and intra-event concurrency, with the best memory performance as a result. Simulation and reconstruction are run in this way.
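The AthenaMP pattern of forking workers after initialization, with each worker handling its own batch of events, can be illustrated with plain Python multiprocessing. This is a toy analogy, not AthenaMP itself:

```python
import multiprocessing as mp

# "Initialization" happens once in the parent process; after forking, the
# workers share these read-only pages copy-on-write, which is where the
# AthenaMP memory saving comes from.
GEOMETRY = {"detector": "toy", "layers": 3}

def process_batch(events):
    # Each worker processes its batch of events independently.
    return [f"evt {e}: {GEOMETRY['layers']} layers" for e in events]

def run_mp(all_events, nprocs=2):
    # Split the events into one batch per worker process.
    batches = [all_events[i::nprocs] for i in range(nprocs)]
    # Fork explicitly, as AthenaMP does (this toy form is Linux-only).
    with mp.get_context("fork").Pool(nprocs) as pool:
        results = pool.map(process_batch, batches)
    return sorted(sum(results, []))
```

Here the per-worker batches are fixed up front for simplicity; real AthenaMP can also distribute events to workers dynamically.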

More about multi-threaded Athena

In AthenaMT the configuration, initialization and finalization stages of a job are performed sequentially in the “main” thread. However, each Algorithm::execute() happens in its own thread. Multiple algorithms can concurrently work on the same event in different threads (intra-event parallelism) and multiple events can be executed concurrently (inter-event parallelism). Intel Threading Building Blocks (TBB) is used in AthenaMT, so thread management is not exposed to the developer. All the developer has to do is ensure their code is not thread-hostile and declare data dependencies correctly with read and write handles.

The scheduler is a central framework component that orchestrates the execution of AthenaMT jobs. Its main responsibility is to optimally assign algorithms to the available threads from the pool, with the ultimate goal of maximizing overall event throughput. It determines the order in which algorithms should be run and which algorithms can safely run alongside one another in concurrent threads. Algorithms should declare their data dependencies to the scheduler for this purpose. This is also why data must always be passed between algorithms via StoreGate and not via public tools: data shared via tools is invisible to the scheduler. The scheduler has a number of ways to decide which algorithm to execute next:

  • Dataflow based decision: for a given event execute a consumer algorithm only after all producers of its input data have been executed
  • Control flow based decision: algorithms are executed within predefined sequences, either sequentially or concurrently
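The dataflow rule can be illustrated with a toy scheduler that derives an execution order purely from declared read/write keys. This sketches the idea only; it is not the real Athena scheduler, and the algorithm and key names are made up:

```python
def schedule(algs):
    """Order algorithms so every producer runs before its consumers.

    `algs` maps an algorithm name to (reads, writes): the StoreGate-like
    keys it consumes and produces.  Returns one valid execution order.
    """
    # Which algorithm writes each data key.
    producers = {key: name for name, (_, writes) in algs.items() for key in writes}
    order, done = [], set()

    def visit(name, stack=()):
        if name in done:
            return
        if name in stack:
            raise ValueError(f"dependency cycle at {name}")
        reads, _ = algs[name]
        for key in reads:
            if key in producers:  # keys read from the input file need no producer
                visit(producers[key], stack + (name,))
        done.add(name)
        order.append(name)

    for name in algs:
        visit(name)
    return order
```

For example, with a hit maker, a track finder reading the hits, and a vertex finder reading the tracks, the only valid order is hit maker, then track finder, then vertex finder, regardless of the order in which the algorithms were declared.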

In AthenaMT algorithms are classified according to whether they must be executed alone, can be cloned into several instances, or can be shared between threads:

  • Non-clonable (default): these derive from AthAlgorithm. Only one instance exists in the Algorithm Pool. The scheduler will block if such an algorithm needs to run on one event but is already in use in another.
  • Cloneable: several instances are available for the scheduler in the Algorithm Pool. They also derive from AthAlgorithm, but the ability to be cloned is explicitly enabled by the developer.
  • Re-entrant: these derive from AthReentrantAlgorithm. One instance is shared by all threads, and can be executed concurrently in multiple threads. Software in re-entrant algorithms must be thread-safe.

These slides from the development tutorial explain in more detail how AthenaMT operates.

Getting started

To run a simple serial Athena job, log into LXPLUS or another CVMFS-equipped Linux cluster and then:

export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'

(These lines can go into your login script)

On LXPLUS or an AlmaLinux9/RHEL9 cluster:

setupATLAS

On any other kind of Linux cluster:

setupATLAS -c el9

or

setupATLAS -c centos7

Then set up the latest release and finally run the job:

asetup Athena,main,latest
python -m AthExHelloWorld.HelloWorldConfig

This will run the job according to this configuration file and will print a number of messages. Studying these messages alongside the software that produces them can be instructive in understanding how a simple Athena job runs.

  • A good place to get more detailed information on a wide range of ATLAS software development topics is the software development tutorial. Go to the general entry point for ATLAS computing tutorials and follow the link to the development tutorial (internal to ATLAS)
  • The C++ coding guidelines should be followed if you are developing C++ for use in Athena
  • If you are interested in using Athena for physics analysis, see this simple guide and this comprehensive handbook (internal to ATLAS)
  • Links to the main applications that are run using Athena can be found from the main computing Twiki (internal to ATLAS)