Athena is based on the common Gaudi framework that is used by ATLAS, LHCb and FCC. Athena code is hosted in the CERN GitLab service, where the repository owned by the `atlas` user plays a special role as the repository from which all production releases are built. In the ATLAS development workflow this repository is also the merge target for developments from individual users or groups. The repository includes a number of different branches (a branch being "an independent line of development") that are actively committed to at any given time, and these may be synchronized with each other to varying degrees (either manually or via automated procedures) depending on their purpose. There is much more information about the development workflow in the ATLAS git development tutorial.
While the athena repository contains any and all code that could be built into an ATLAS software release, each release generally consists of only a consistent subset of the code base. Each particular build flavour is called a project and is steered from a particular subdirectory of the Projects directory. The Athena project is a complete build of almost everything in the repository. When a particular project is built, the build result encodes the project name; thus, independent of any release number, AthSimulation is built from different code than AthAnalysisBase.
The main projects are:
| Project | Purpose |
| --- | --- |
| Athena | Reconstruction and Derivation production* |
| AthGeneration | For event generation |
| AthSimulation | For full Geant4 simulation |
| AthAnalysisBase | Athena-based analysis |
| AnalysisBase | Non-Athena, ROOT-based analysis |
| DetCommon | For reading trigger configuration, e.g. when configuring L1 hardware |
Not all projects will exist, or be functional, in any particular branch of the Athena repository. It should also be noted that although the Athena project is capable of running most workflows, it may not represent the latest validated release for all purposes (in particular Event Generation).
* NB While both reconstruction and Derivation production use the Athena project, they may be based on different code branches. Reconstruction will normally be run from a branch operating under the Frozen Tier 0 policy, i.e. one in which changes to the job output in the AOD files are tightly controlled. Derivations, on the other hand, are expected to change more frequently, since they can quickly be reproduced from AOD, and so they run from the development branch, i.e. main (formerly master). Concretely this means, for example, that AOD files produced in 24.0 Athena releases will be used to produce DAOD files in 25.0 Athena releases.
Athena jobs are built around chains of algorithms. Algorithms process input data and produce output, both of which are stored in the framework’s transient store which is known as StoreGate. They are C++ classes which inherit either from AthAlgorithm or AthReentrantAlgorithm (the distinction is explained below).
All algorithms implement three main methods:
- `initialize`, which runs once at the start of the job, immediately after configuration;
- `execute`, which runs once per event (this is mandatory);
- `finalize`, which runs once at the end of the job.

Algorithms can be assisted by tools. These are similar to algorithms but have no `execute` method; instead the developer is free to write whatever methods are needed to perform the required tasks, and these methods can be run as many times per event as needed. Tools can be public or private: the former can be shared across multiple algorithms, whereas the latter can only be used by a single algorithm. For reasons related to multi-threaded running (see later), public tools are discouraged, and in particular they should never be used to share data between algorithms - this must always be done via StoreGate.
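To make this concrete, the following is a minimal sketch of an algorithm inheriting from AthAlgorithm; the class name and the event counter are purely illustrative, not part of the framework.

```cpp
// Minimal Athena algorithm sketch (illustrative class name).
#include "AthenaBaseComps/AthAlgorithm.h"

class HelloCountAlg : public AthAlgorithm {
public:
  // Inherit the standard (name, service locator) constructor
  using AthAlgorithm::AthAlgorithm;

  StatusCode initialize() override {
    ATH_MSG_INFO("initialize: runs once, immediately after configuration");
    return StatusCode::SUCCESS;
  }

  StatusCode execute() override {
    ++m_nEvents;  // runs once per event
    return StatusCode::SUCCESS;
  }

  StatusCode finalize() override {
    ATH_MSG_INFO("finalize: processed " << m_nEvents << " events");
    return StatusCode::SUCCESS;
  }

private:
  unsigned long m_nEvents = 0;
};
```

Note that mutable state like this counter is only safe in a plain AthAlgorithm; in the multi-threaded mode described below, state must be handled more carefully.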
Services are akin to tools but are global in scope; that is, the same instance of each service is seen by all tools and algorithms in a given job. Common examples of services that you’ll encounter include StoreGate, the messaging service for producing log files, random number generation, and database access.
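As an example, a component typically obtains a service through a ServiceHandle declared as a data member. The following sketch retrieves the histogramming service THistSvc; the class name and the property name are illustrative.

```cpp
// Sketch: accessing a shared service via a ServiceHandle.
#include "AthenaBaseComps/AthAlgorithm.h"
#include "GaudiKernel/ITHistSvc.h"
#include "GaudiKernel/ServiceHandle.h"

class BookHistAlg : public AthAlgorithm {
public:
  using AthAlgorithm::AthAlgorithm;

  StatusCode initialize() override {
    // Every component that retrieves "THistSvc" shares the same instance
    ATH_CHECK(m_histSvc.retrieve());
    return StatusCode::SUCCESS;
  }

  StatusCode execute() override { return StatusCode::SUCCESS; }

private:
  ServiceHandle<ITHistSvc> m_histSvc{this, "HistSvc", "THistSvc",
                                     "Handle to the histogramming service"};
};
```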
Although algorithms, tools and services are written in C++, Athena is configured via a configuration layer written in Python. Via this layer the developer instructs the framework what algorithms should be run, what data should be processed, what services should be used, and what should be written to disk. The configuration layer is based around the component accumulator, full details of which are available here and for which a tutorial is also available. This is why Athena jobs are run as Python processes. Complex production tasks such as event generation, simulation, reconstruction and DAOD production need a very large number of components and so they are wrapped up in job transformations which enable such tasks to be run with a single command.
The Athena software is hosted and managed in GitLab, and compilation and installation of the framework are configured with CMake. The software is organised into packages, which contain related algorithms and tools. Packages are only meaningful from an organisational point of view - once compiled, it is irrelevant which package a given algorithm or tool is in.
Athena can be run in three modes, depending on the resources available and the task being completed: serially, with a single process handling one event at a time; multi-process (AthenaMP), in which the event loop is forked into a number of independent worker processes; and multi-threaded (AthenaMT), described below.
In AthenaMT the configuration, initialization and finalization stages of a job are performed sequentially in the “main” thread. However, each `Algorithm::execute()` happens in its own thread. Multiple algorithms can work concurrently on the same event in different threads (intra-event parallelism) and multiple events can be processed concurrently (inter-event parallelism). Intel Threading Building Blocks (TBB) is used in AthenaMT, so thread management is not exposed to the developer. All the developer has to do is ensure their code is not thread-hostile and declare data dependencies correctly with read and write handles, as in the sketch below.
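For instance, an algorithm declares its inputs and outputs with handle keys, which the scheduler inspects to work out the data flow. The following is a sketch only; the class name, property names and container key are illustrative.

```cpp
// Sketch: declaring data dependencies for the AthenaMT scheduler.
#include "AthenaBaseComps/AthReentrantAlgorithm.h"
#include "GaudiKernel/EventContext.h"
#include "StoreGate/ReadHandle.h"
#include "StoreGate/ReadHandleKey.h"
#include "xAODEgamma/ElectronContainer.h"

class CountElectronsAlg : public AthReentrantAlgorithm {
public:
  using AthReentrantAlgorithm::AthReentrantAlgorithm;

  StatusCode initialize() override {
    // Initializing the key registers the dependency with the scheduler
    ATH_CHECK(m_electronsKey.initialize());
    return StatusCode::SUCCESS;
  }

  // Re-entrant: execute is const and receives the event context explicitly
  StatusCode execute(const EventContext& ctx) const override {
    SG::ReadHandle<xAOD::ElectronContainer> electrons(m_electronsKey, ctx);
    if (!electrons.isValid()) {
      ATH_MSG_ERROR("Could not retrieve " << m_electronsKey.key());
      return StatusCode::FAILURE;
    }
    ATH_MSG_DEBUG("Event has " << electrons->size() << " electrons");
    return StatusCode::SUCCESS;
  }

private:
  SG::ReadHandleKey<xAOD::ElectronContainer> m_electronsKey
    {this, "ElectronsKey", "Electrons", "Input electron container"};
};
```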
The scheduler is a central framework component that orchestrates the execution of AthenaMT jobs. Its main responsibility is to assign algorithms optimally to the available threads from the pool, with the ultimate goal of maximizing the overall event throughput. It determines the order in which the algorithms should be run and which algorithms can safely run alongside one another in concurrent threads; algorithms must declare their data dependencies to the scheduler for this purpose. This is also why data must always be passed between algorithms via StoreGate and not via public tools: data shared via tools is invisible to the scheduler. The scheduler has a number of strategies for deciding which algorithm to execute next.
In AthenaMT, algorithms are classified according to whether they must be executed alone, can be cloned into several instances, or can be shared between threads:

- Unclonable algorithms inherit from `AthAlgorithm`. Only one instance exists in the algorithm pool, and the scheduler will block if such an algorithm needs to run on one event while it is in use in another.
- Clonable algorithms also inherit from `AthAlgorithm`, but the ability to be cloned has been explicitly enabled by the developer, so separate instances can run on different events concurrently.
- Re-entrant algorithms inherit from `AthReentrantAlgorithm`. One instance is shared by all threads and can be executed concurrently in multiple threads; software in re-entrant algorithms must be thread-safe.

These slides from the development tutorial explain in more detail how AthenaMT operates.
In order for Athena to run smoothly it is important to ensure that your shell environment is completely clean, with no other environment variables set. This applies particularly to LCG and grid software, which may interfere with the environment set by `asetup` below. Check your login scripts to ensure that nothing is being set up by default. Failure to set up Athena in a clean shell environment may lead to unexpected crashes and warning/error messages.
To run a simple serial Athena job, log into LXPLUS or another CVMFS-equipped Linux cluster and then:
```
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
alias setupATLAS='source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh'
```

(These lines can go into your login script.)

On LXPLUS or an AlmaLinux9/RHEL9 cluster:

```
setupATLAS
```

On any other kind of Linux cluster:

```
setupATLAS -c el9
```

or

```
setupATLAS -c centos7
```

Then set up the latest release and finally run the job:

```
asetup Athena,main,latest
python -m AthExHelloWorld.HelloWorldConfig
```
This will run the job according to this configuration file and will print a number of messages. Studying these messages alongside the software that produces them can be instructive in understanding how a simple Athena job runs.
If you want to make changes to existing code, please follow the instructions in the git development tutorial.
If you need to create a new Athena package either for personal work or for development, you can find many straightforward examples of the structure of Athena packages in AthenaExamples; you can use these as templates. Here is a summary of the basic components of an Athena package:
- a `python` directory, which will contain python configuration scripts;
- a `src` directory, which will contain the C++ code. If the package is stand-alone, in other words will not be needed by any other software components in Athena, both the source (`.cxx`) and header (`.h`) files can be kept in this directory (i.e. we consider the headers private to this package);
- inside the `src` directory you should create another directory called `components`. It should contain a single file called `PackageName_entries.cxx`, listing all of the tools and algorithms in your package, as shown in this example and in the sketch after this list;
- if the headers do need to be seen by other packages, they should instead go in a directory at the same level as `src`, with the same name as the package itself - i.e. the headers should be public within Athena;
- a `CMakeLists.txt` file listing the package’s dependencies, and indicating what components should be installed. This must be at the same level as the `python` and other subdirectories. You can find comprehensive instructions on constructing these files here and a simple example here;
- a `README.md` file at the same level as the `CMakeLists.txt` file, explaining what the package does and, if it provides stand-alone run-able components, providing some basic instructions on how to run them. A minimum example can be found here but you are welcome to provide more detailed information as necessary.
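For instance, for a package containing one algorithm and one tool, the components file might look like the following sketch (the package and class names are illustrative):

```cpp
// src/components/MyPackage_entries.cxx
// Registers the package's components so that the framework can create
// them by name from the Python configuration.
#include "../MyAlg.h"
#include "../MyTool.h"

DECLARE_COMPONENT( MyAlg )
DECLARE_COMPONENT( MyTool )
```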
To assist in the creation of new packages and algorithms, a number of template generators are available under the `acmd` tool suite:

- `acmd.py gen-klass ...` creates template algorithms, tools, interfaces etc.
- `acmd.py cmake new-skeleton ...` and `acmd.py cmake new-pkg ...` create the full package structure including subdirectories
- `acmd.py cmake new-analysisalg ...` creates a template algorithm tailored to analysis, within an existing package.

All of these have a range of options - just add `--help` for more information. The `acmd` suite has a wide range of tools in addition to these - just use `acmd.py --help` for the full list of commands.