An Introduction to the CP Algorithm Text Configuration

The Text Configuration

There is a configuration file in your MyAnalysis/data directory called config.yaml. It is a plain-text file written in YAML, a human-readable data format, and it is used to create and configure the algorithms in an analysis. The file lists all of the options an algorithm needs in an organized way, using indentation (spaces, since YAML does not allow tabs for indentation), dashes, and new lines, so that the algorithm is configured as the user wants.
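
Because YAML does not accept tab characters as indentation, a stray tab inserted by an editor is a common reason for a configuration file failing to parse. The following is a minimal Python sketch of a check for this; the helper name find_tab_indented_lines and the two sample strings are made up for this illustration and are not part of the analysis framework.

```python
def find_tab_indented_lines(text):
    """Return the 1-based numbers of lines whose leading whitespace contains a tab."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Leading whitespace is whatever lstrip() would remove from the line.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            flagged.append(number)
    return flagged


# Two hypothetical snippets: one indented with spaces, one with a tab.
spaces_example = "CommonServices:\n    runSystematics: True\n"
tab_example = "CommonServices:\n\trunSystematics: True\n"

print(find_tab_indented_lines(spaces_example))  # []
print(find_tab_indented_lines(tab_example))     # [2]
```

To check a real file, the same function could be called on the text read from config.yaml.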

Let’s start by looking at one of the basic “overhead” commands necessary for any analysis, which is already in the file. Loading and running systematic variations is configured as a part of CommonServices.

# Common (global) services to run.
CommonServices:
    # Turn on/off systematics
    runSystematics: True

Among other functions, this block registers all the systematics to run on, so that they are propagated to subsequent algorithms without having to be configured one by one. If you do not need systematics, you can set the runSystematics option to False. Running systematics significantly increases both the processing time and the size of the output ntuple. For this tutorial we want to limit both, so set runSystematics to False. Alternatively, you can pass the --no-systematics option to CPRun.py. Note that even if you have a large number of algorithms for various object types with many different systematics, you only ever need one CommonServices block. The individual algorithms only run for the systematics that actually affect them.

If systematics are enabled, a regular expression (regex) can be added to restrict which systematics are used. The pattern below accepts only names that do not contain PseudoData, e.g.:

CommonServices:
    runSystematics: False
    filterSystematics: '^(?:(?!PseudoData).)*$'

For now, leave those systematics disabled.
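
To see what the filter pattern above selects, it can be tried on a few strings. The sketch below uses Python's re module purely for illustration, on the assumption that the framework's regex engine treats this pattern the same way. The systematic names are invented for the example and are not real systematics.

```python
import re

# Pattern copied from the configuration example above.
pattern = re.compile(r"^(?:(?!PseudoData).)*$")

# Hypothetical names, invented for this illustration only. The empty
# string is included just to show that the pattern also accepts it.
names = [
    "",
    "JET_Example__1up",
    "PseudoData_Example__1up",
    "Example_PseudoData__1down",
]

# The negative lookahead (?!PseudoData) is checked before every character,
# so a name is accepted only if "PseudoData" appears nowhere in it.
kept = [name for name in names if pattern.match(name)]
print(kept)  # ['', 'JET_Example__1up']
```

Both names containing PseudoData are rejected, whether the substring appears at the start or in the middle of the name.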

Throughout the tutorial, additional text blocks will be added to config.yaml in a format similar to the code example above.

The order of the text blocks in config.yaml is flexible. Blocks can be placed in any order, as long as every container that is referenced in more than one block has exactly the same name (including spelling) in all of them. A mismatch will cause a crash.

Installing the YAML configuration

Do you find it annoying to type ../source/MyAnalysis/data/config.yaml every time? Great! We can fix that. Typing the full path once is fine, but it becomes a hassle when you run your code repeatedly or have multiple configs. To make running your code easier, add the following line to your CMakeLists.txt file:

atlas_install_data(data/*.yaml)

This installs the YAML files from your data directory, so after recompiling you can use config.yaml instead of the full path.

Running your code

If you haven’t declared the path to the dataset yet, run the following lines:

cd ../run
export ALRB_TutorialData=/cvmfs/atlas.cern.ch/repo/tutorials/asg/cern-mar2025
export ALRB_Test_File=${ALRB_TutorialData}/mc20_13TeV.312276.aMcAtNloPy8EG_A14N30NLO_LQd_mu_ld_0p3_beta_0p5_2ndG_M1000.deriv.DAOD_PHYS.e7587_a907_r14861_p6117/DAOD_PHYS.37791038._000001.pool.root.1 #Make sure the path is up to date
echo $ALRB_Test_File > input.txt

This creates an input.txt file in your run directory. Then run:

CPRun.py -i input.txt -t config.yaml --no-systematics
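
If CPRun.py cannot open its input, the dataset path may be out of date. A quick way to check is to confirm that every file listed in input.txt exists. The Python sketch below does this, assuming input.txt holds one file path per line (as produced by the echo command above). The helper name missing_inputs is made up for this illustration.

```python
from pathlib import Path


def missing_inputs(list_file):
    """Return the paths listed in list_file (one per line) that do not exist."""
    lines = Path(list_file).read_text().splitlines()
    # Ignore blank lines and surrounding whitespace.
    paths = [line.strip() for line in lines if line.strip()]
    return [path for path in paths if not Path(path).exists()]


if __name__ == "__main__":
    list_path = Path("input.txt")
    if not list_path.exists():
        print("input.txt not found in the current directory")
    else:
        print("missing files:", missing_inputs(list_path))
```

An empty list means that all listed files were found.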

Test and commit your changes

Run your code to be sure that it is working properly.

If it is working correctly, you should see additional output that looks like:

Configuring CommonServices
    runSystematics: False
    filterSystematics: None
/*** AlgSequence/AlgSequence ***************************************************
| /*** PythonConfig AsgService/CP::SystematicsSvc/SystematicsSvc *****************
| \--- (End of PythonConfig AsgService/CP::SystematicsSvc/SystematicsSvc) --------
| /*** PythonConfig AsgService/CP::SelectionNameSvc/SelectionNameSvc *************
| \--- (End of PythonConfig AsgService/CP::SelectionNameSvc/SelectionNameSvc) ----
\--- (End of AlgSequence/AlgSequence) ------------------------------------------

At the end, you should see hist-output.root, output.root, and a workDir directory in your run directory. The hist-output.root file contains the histograms produced by your algorithms, while the output.root file contains their output ntuples. The workDir directory contains the temporary files created during the run.