The CP Algorithms Tutorial introduces only the most commonly-used algorithms in an analysis. These algorithms perform only basic and commonly-used ntupling functions with the xAOD data.
If we want the analysis to perform a function not readily available in the CP algorithms, then we must create a custom algorithm and add it to the config file.
The steering macro (i.e. ATestRun_eljob.py
) that was used in the conventional tutorial will not need to be edited to run the custom algorithm. Instead, only the text configuration file, along with many under-the-hood files pertaining to the block configuration and the new algorithm itself, will be edited to perform our desired functions.
Our example algorithm contains the number of electrons, jets, and muons per event, along with the dilepton mass and dijet mass (known as “mll” and “mjj,” respectively).
We will start by creating the algorithm, then by building the python block that “wraps” it, then by adding that block to the configuration file, then finishing finally by including that algorithm in the text configuration.
We want to create branches in our output that contains the number of electrons, the number of muons, the number of jets, the dilepton mass, and dijet mass. To do so, we must specify a custom algorithm, the implementation of which is spread out across multiple files.
Firstly, for the custom algorithm to be compiled, we must include a line in the CMakeLists.txt
file at the top level of the MyAnalysis
directory. Add the lines xAODMuon
, xAODEgamma
, xAODJet
, SystematicsHandlesLib
, and SelectionHelpersLib
to
the LINK_LIBRARIES
section of the file so that the electron, muon, and jet libraries, along with the systematics handles and selection helpers libraries, are linked to these objects during compilation.
The CP Algorithms are object-oriented, so algorithms are treated as classes. Classes are organized so that their members and attributes are declared in a header file, saved as .h
, and these attributes are defined in a source file, saved as .cxx
. (There are other files necessary for classes to be compiled, which we will describe here, too.) The header files of an algorithm are stored in a directory that shares the name of its parent directory - in this case, for MyAnalysis
, you will find MyxAODAnalysis.h
in the MyAnalysis
subdirectory. Also in this subdirectory is a file, MyAnalysisDict.h
, that specifies all header files for which a dictionary must be made. (The source files are in the root
subdirectory, so you will find MyxAODAnalysis.cxx
there.)
We start by defining the algorithm in MyxAODAnalysis.h
. First comes the pragma that directs the compiler to create the object. At the top of the file, add the following lines:
#ifndef MyAnalysis_MyxAODAnalysis_H
#define MyAnalysis_MyxAODAnalysis_H
#endif
The remaining contents of the header file will go between the #define
and #endif
lines of the file.
We need to include all relevant objects that are used in this file, so we add the following directives to the top of the file (after the #ifndef/#define
lines):
// Algorithms and systematics
#include <AnaAlgorithm/AnaAlgorithm.h>
#include <SystematicsHandles/SysReadHandle.h>
#include <SystematicsHandles/SysWriteHandle.h>
#include <SelectionHelpers/SysReadSelectionHandle.h>
#include <SystematicsHandles/SysReadDecorHandle.h>
#include <SystematicsHandles/SysWriteDecorHandle.h>
// Containers
#include <xAODEgamma/ElectronContainer.h>
#include <xAODMuon/MuonContainer.h>
#include <xAODJet/JetContainer.h>
#include <xAODMissingET/MissingETContainer.h>
#include <xAODEventInfo/EventInfo.h>
// Others
#include <AthContainers/ConstDataVector.h>
With our necessary inclusions, we must now declare the class. Do so by copying this declaration into the file:
class MyxAODAnalysis : public EL::AnaAlgorithm
{
};
Everything that populates the class will go in between these curly braces. Note that the : public EL::AnaAlgorithm
part of what you copied means that the MyxAODAnalysis
class is built on top of the EL::AnaAlgorithm
base class.
Now, we want to specify the public members of the algorithm. Every class needs a public constructor, which sets up the class upon its instantiation (similar to __init()__
in a python class) with its proper inputs, and a destructor, which uninstantiates the class and cleans up memory. Include the following lines in your code to create these attributes:
public:
// Algorithm constructor
MyxAODAnalysis (const std::string& name, ISvcLocator *pSvcLocator);
// Destructor
virtual ~MyxAODAnalysis ();
The other public attributes are three functions that are conventional for analysis algorithms: initialize()
, execute()
, and finalize()
. The initialize()
method runs when the analysis starts, execute()
runs as EventLoop loops through the events of the AOD file, and finalize()
runs as the analysis closes. Add these methods by copying the following lines in the header:
// These are the functions inherited from Algorithm
virtual StatusCode initialize () override;
virtual StatusCode execute () override;
virtual StatusCode finalize () override;
The rest of the class will contain objects we do not want the user to deal with directly, so these attributes will be private. These attributes include the systematics, the handles and selections for the electrons, muons, and jets, the EventInfo handle, and the decorators for the numbers of electrons, muons, and jets, the dilepton masses, and dijet masses. These attributes are assigned and specified through the algorithm block in python, to be completed in the next parts of the tutorial.
In the context of this code, a handle is a “proxy object,” used to define tools and containers without specifying exactly which tool or container to read in or out (which is left for when a full configuration of the analysis sequence is in place). For this analysis, four types of handles will be used: a SysListHandle
, a SysReadHandle
, a SysReadSelectionHandle
, and a SysWriteDecorHandle
.
The SysListHandle
houses the list of systematics errors in the analysis. The SysReadHandle
retrieves systematically-varied object collections, while the SysReadSelectionHandle
reads the input systematically-varied selection cut. Finally, the SysWriteDecorHandle
registers the new systematically-varied decoration applied in this algorithm. The workflow is then intuitive: retrieve the systematics, the objects, and their selections and variations per systematics, apply the algorithm and decorate the objects that are selected from the algorithm, then write those decorations out for future use.
Include the following lines in the header file beneath the public members:
private:
// Systematics
CP::SysListHandle m_systematicsList{this};
// Input containers needed for variable definitions
// Electrons
CP::SysReadHandle<xAOD::ElectronContainer> m_electronsHandle{
this, "electrons", "", "the electron container to use"};
CP::SysReadSelectionHandle m_electronSelection{
this, "electronSelection", "", "the selection on the input electrons"};
// Muons
CP::SysReadHandle<xAOD::MuonContainer> m_muonsHandle{
this, "muons", "", "the muon container to use"};
CP::SysReadSelectionHandle m_muonSelection{
this, "muonSelection", "", "the selection on the input muons"};
// Jets
CP::SysReadHandle<xAOD::JetContainer> m_jetsHandle{
this, "jets", "", "the jet container to use"};
CP::SysReadSelectionHandle m_jetSelection{
this, "jetSelection", "", "the selection on the input jets"};
// MET
//CP::SysReadHandle<xAOD::MissingETContainer> m_metHandle{
// this, "met", "", "the MET container to use"};
// EventInfo
CP::SysReadHandle<xAOD::EventInfo> m_eventInfoHandle{
this, "eventInfo", "EventInfo", "the EventInfo container to read selection decisions from"};
// Variables to be added to output
CP::SysWriteDecorHandle<int> m_muNDecor {
this, "mu_n", "mu_n_%SYS%", "Number of muons"
};
CP::SysWriteDecorHandle<int> m_elNDecor {
this, "el_n", "el_n_%SYS%", "Number of electrons"
};
CP::SysWriteDecorHandle<int> m_jetNDecor {
this, "jet_n", "jet_n_%SYS%", "Number of jets"
};
CP::SysWriteDecorHandle<float> m_mllDecor {
this, "mll", "mll_%SYS%", "Dilepton mass of the two leading leptons"
};
CP::SysWriteDecorHandle<float> m_mjjDecor {
this, "mjj", "mjj_%SYS%", "Dijet mass of the two leading jets"
};
SysReadHandles
and SysReadSelectionHandles
are created for the electrons, muons, and jets, and for their already-applied selections. A SysReadHandle
is created for the eventInfo
, which stores all selection information for each object, along with run numbers and event numbers. SysWriteDecorHandles
are created for each of the quantities calculated by the algorithm. Note that we are keeping commented code for if we ever want to use missing transverse energy in the algorithm. It is kept private, too.
Note: While we are still in the MyAnalysis
subdirectory, we must add the dictionary of this class to the MyAnalysisDict.h
file by including this line in it:
// This file includes all the header files that you need to create
// dictionaries for.
#include <MyAnalysis/MyxAODAnalysis.h>
In addition, there is a selection.xml
file in this directory that must include this library for the algorithm to compile. Include the following line in that file:
<!-- This file contains a list of all classes for which a dictionary
should be created. -->
<class name="MyxAODAnalysis" />
Now, let’s define all attributes declared in the class header. Firstly, we must include the header file for compilation. We specify this in the following line at the top of the MyxAODAnalysis.cxx
in the Root
directory:
#include <MyAnalysis/MyxAODAnalysis.h>
In Root/MyxAODAnalysis.cxx
, we must now explicitly define each method, starting with the constructor. We have no specific actions in the constructor to set, so we can take a lesson from the following comment left in the constructor:
MyxAODAnalysis :: MyxAODAnalysis (const std::string& name,
ISvcLocator *pSvcLocator)
: EL::AnaAlgorithm (name, pSvcLocator)
{
// Here you put any code for the base initialization of variables,
// e.g. initialize all pointers to 0. This is also where you
// declare all properties for your algorithm. Note that things like
// resetting statistics variables or booking histograms should
// rather go into the initialize() function.
}
Similarly, we have nothing to define in the destructor, thus we can define it as:
MyxAODAnalysis::~MyxAODAnalysis(){
// Here you can delete any pointers or clear any memory
// no longer needed after the algorithm has run.
}
Now, we will load/allocate memory for all relevant handles and objects (the private attributes of the class) to be used during the event looping (i.e. actual selection to be carried out in execute()
) in the initialize()
method. Do so by creating the initialize()
method:
StatusCode MyxAODAnalysis::initialize(){
// Here you do everything that needs to be done at the very
// beginning on each worker node, e.g. create histograms and output
// trees. This method gets called before any input files are
// connected.
ANA_MSG_INFO ("Initializing MyxAODAnalysis " << name());
// input handles
// Electrons
ANA_CHECK(m_electronsHandle.initialize(m_systematicsList));
ANA_CHECK(m_electronSelection.initialize(m_systematicsList, m_electronsHandle, SG::AllowEmpty));
// Muons
ANA_CHECK(m_muonsHandle.initialize(m_systematicsList));
ANA_CHECK(m_muonSelection.initialize(m_systematicsList, m_muonsHandle, SG::AllowEmpty));
// Jets
ANA_CHECK(m_jetsHandle.initialize(m_systematicsList));
ANA_CHECK(m_jetSelection.initialize(m_systematicsList, m_jetsHandle, SG::AllowEmpty));
// MissingET
//ANA_CHECK(m_metHandle.initialize(m_systematicsList));
// EventInfo
ANA_CHECK(m_eventInfoHandle.initialize(m_systematicsList));
// output handles
ANA_CHECK(m_mllDecor.initialize(m_systematicsList, m_eventInfoHandle));
ANA_CHECK(m_mjjDecor.initialize(m_systematicsList, m_eventInfoHandle));
ANA_CHECK(m_jetNDecor.initialize(m_systematicsList, m_eventInfoHandle));
ANA_CHECK(m_elNDecor.initialize(m_systematicsList, m_eventInfoHandle));
ANA_CHECK(m_muNDecor.initialize(m_systematicsList, m_eventInfoHandle));
// Systematic list
ANA_CHECK(m_systematicsList.initialize());
return StatusCode::SUCCESS;
}
The method uses
ANA_MSG_INFO()
for print statements andANA_CHECK()
when loading objects. Both methods are preferred when running an analysis algorithm because: 1. TheANA_MSG_XXXXX()
family is preferred to toggle print statements between info, debug, warning, and failure statuses, and
- Running
ANA_CHECK()
exits the code on failure upon a failed function execution.
The
initialize()
,execute()
, andfinalize()
methods all return aStatusCode
, the preferred flag over a standard boolean for the CP Algorithms (SUCCESS “maps” to a 0, as inreturn 0;
, FAILURE maps to a 1, as inexit 1;
).
In this method, each of the object handles are initialized to their respective object containers from the DAOD. If extra systematics are generated by these containers, then they are added to the systematics list.
The object selection handles are similarly initialized to their selections, linking to the systematics list and object containers in the process. The eventInfo
handle is initialized, too, as are the output handles, adding their decorations to the eventInfo
in the process.
Once everything is initialized, the systematics list is then set, so it is finalized by its own initialization method.
We will now populate the execute()
method to select and store the electron count, muon count, jet count, dilepton mass, and dijet mass. To do so, we retrieve the event info of the output storage, then loop over all systematics that may have been specified in the analysis, then load the event info for each event, and retrieve the electrons, muons, and jets.
We then declare vectors for the selected electrons, muons, and jets. When looping over all muons, we store up to two selected muons, all while increasing the count of the selected muons vector. We do the same for the electrons, while we simply store the count of the jets. We then count the number of objects in each of the three containers and calculate the mass of the two leptons and two jets (if there are two leptons and two jets). Finally, we set the output variables with the object counts and calculated masses before returning gracefully.
Copy the following lines of code to create the execute()
method to accomplish these tasks:
StatusCode MyxAODAnalysis::execute(){
// Here you do everything that needs to be done on every single
// events, e.g. read input variables, apply cuts, and fill
// histograms and trees. This is where most of your actual analysis
// code will go.
ANA_MSG_INFO ("Executing MyxAODAnalysis " << name());
// Retrieve the eventInfo object from the event store
const xAOD::EventInfo *eventInfo = nullptr;
ANA_CHECK (m_eventInfoHandle.retrieve (eventInfo, {}));
// Print out run and event number from retrieved object
ANA_MSG_INFO ("in execute, runNumber = " << eventInfo->runNumber() << ", eventNumber = " << eventInfo->eventNumber());
for (const auto &sys : m_systematicsList.systematicsVector()){
// retrieve the EventInfo
const xAOD::EventInfo *evtInfo = nullptr;
ANA_CHECK(m_eventInfoHandle.retrieve(evtInfo, sys));
// retrieve objects
const xAOD::ElectronContainer *electrons = nullptr;
ANA_CHECK(m_electronsHandle.retrieve(electrons, sys));
const xAOD::MuonContainer *muons = nullptr;
ANA_CHECK(m_muonsHandle.retrieve(muons, sys));
const xAOD::JetContainer *jets = nullptr;
ANA_CHECK(m_jetsHandle.retrieve(jets, sys));
//const xAOD::MissingETContainer *met = nullptr;
//ANA_CHECK(m_metHandle.retrieve(met, sys));
// apply object-wise selection
/* NOTE HERE: The vector that will house the leptons for mll calculation is the
recommended vector to use when not many operations will be made on the particle
quantities. The ConstDataVector used for the selected jets is recommended for
many operations to be carried out on the particles in the vector. Both types of
vectors are instantiated here for demonstrative purposes. */
std::vector<const xAOD::IParticle*> leptons;
ConstDataVector<xAOD::JetContainer> selectedJets(SG::VIEW_ELEMENTS);
// Muons
int nMu = 0;
for (const xAOD::Muon *mu : *muons){
// This if statement is just to loop over selected muons that will appear in the output ntuple
if (m_muonSelection.getBool(*mu, sys)){
leptons.push_back(mu);
nMu++;
}
}
// Electrons
int nEl = 0;
for (const xAOD::Electron *el : *electrons){
// This if statement is just to loop over selected electrons that will appear in the output ntuple
if (m_electronSelection.getBool(*el, sys)){
leptons.push_back(el);
nEl++;
}
}
// Jets
int nJet = 0;
for (const xAOD::Jet *jet : *jets){
// This if statement is just to loop over selected jets that will appear in the output ntuple
if (m_jetSelection.getBool(*jet, sys)){
selectedJets.push_back(jet);
nJet++;
}
}
// Calculate the mll of the leading two leptons (sort by highest pT, then calculate the mass)
float mll = 0.0;
if (nMu + nEl > 1){
std::sort(leptons.begin(), leptons.end(), [] (const xAOD::IParticle* lep1, const xAOD::IParticle* lep2) {return lep1->pt() > lep2->pt();});
mll = (leptons.at(0)->p4() + leptons.at(1)->p4()).M();
}
// get mjj
float mjj = 0.0;
if (nJet > 1){
// will crash if jet1 or jet2 were not set
mjj = (selectedJets.at(0)->p4() + selectedJets.at(1)->p4()).M();
}
// set variables
m_muNDecor.set(*evtInfo, nMu, sys);
m_elNDecor.set(*evtInfo, nEl, sys);
m_jetNDecor.set(*evtInfo, nJet, sys);
m_mllDecor.set(*evtInfo, mll, sys);
m_mjjDecor.set(*evtInfo, mjj, sys);
}
return StatusCode::SUCCESS;
}
Note that, again, we have included commented code for if we ever want to do something with the missing transverse energy. In addition, the algorithm will print out the run number and event number each time it is called on a new event for debugging purposes.
Now, we write the finalize()
method for this algorithm with a simple lesson in a comment, along with its proper return:
StatusCode MyxAODAnalysis::finalize(){
// This method is the mirror image of initialize(), meaning it gets
// called after the last event has been processed on the worker node
// and allows you to finish up any objects you created in
// initialize() before they are written to disk. This is actually
// fairly rare, since this happens separately for each worker node.
// Most of the time you want to do your post-processing on the
// submission node after all your histogram outputs have been
// merged.
return StatusCode::SUCCESS;
}
src/components
fileThere is one more task to be completed to define the algorithm fully. We need the C++ algorithm to be recognized in Python when calling it. We do so by adding it as a “component” to the software build. In the src/components
subdirectories, there is a file called MyAnalysis_entries.cxx
. Here, we must add the following lines for the proper include and to declare the algorithm for compilation:
#include <MyAnalysis/MyxAODAnalysis.h>
DECLARE_COMPONENT (MyxAODAnalysis)
With this file populated, the algorithm has been fully defined! We can now include it in our block and text configurations.
In the file MyxAODAnalysisConfig.py
, we will define a class to call the algorithm. This class is the configuration block, and it will be initialized with containers for the electrons, muons, and jets along with the algorithm that we just wrote. First, we must include imports standard for any CP algorithm at the top of the file:
# AnaAlgorithm import(s):
from AnalysisAlgorithmsConfig.ConfigBlock import ConfigBlock
from AnalysisAlgorithmsConfig.ConfigAccumulator import DataType
from AthenaConfiguration.Enums import LHCPeriod
Once these lines are included, add the following lines of code to this python file:
class MyxAODAnalysisConfig(ConfigBlock):
def __init__(self):
super(MyxAODAnalysisConfig, self).__init__('MyxAODAnalysisConfig')
self.addOption('electrons', '', type=str)
self.addOption('muons', '', type=str)
self.addOption('jets', '', type=str)
#self.addOption('met', '', type=str)
def makeAlgs(self, config):
alg = config.createAlgorithm('MyxAODAnalysis', 'MyxAODAnalysis')
alg.electrons, alg.electronSelection = config.readNameAndSelection(self.electrons)
alg.muons, alg.muonSelection = config.readNameAndSelection(self.muons)
alg.jets, alg.jetSelection = config.readNameAndSelection(self.jets)
#alg.met = config.readName(self.met)
config.addOutputVar('EventInfo', 'mll_%SYS%', 'mll')
config.addOutputVar('EventInfo', 'mjj_%SYS%', 'mjj')
config.addOutputVar('EventInfo', 'jet_n_%SYS%', 'jet_n')
config.addOutputVar('EventInfo', 'el_n_%SYS%', 'el_n')
config.addOutputVar('EventInfo', 'mu_n_%SYS%', 'mu_n')
The __init__()
function runs when the class is instantiated automatically, and it defines the member variables electrons
, muons
, and jets
, which are the electron container, muon container, and jet container, with their respective selection names, specified by the user. The makeAlgs()
function calls the algorithm with the three containers and selection info as its data. The final lines augment the output with new variables, the quantities we wish to enumerate and calculate.
Since the algorithm has been defined to calculate the desired quantities for each event already, and since the text configuration has been scheduled to run in MyAnalysisAlgorithms.py
, to include our new branches in the final ntuple, we must include two new blocks of code into our config text. The first goes at the top of your yaml
file:
# Added custom algorithms. Expects a list
AddConfigBlocks:
# Calls `import functionName from modulePath`
modulePath: 'MyAnalysis.MyxAODAnalysisConfig'
functionName: 'MyxAODAnalysisConfig'
# Name that will be used in YAML file
algName: 'MyxAODAnalysis'
# Will run before 'Output' algorithm
pos: 'Output'
This block of text calls a config block specifically designed to add additional algorithms not already part of the CP Algorithms code to a job. The modulePath
and functionName
point to the location of the algorithm in its python file and of the file in its directory. The algName
gives a unique name to the algorithm so that it can be tracked in the output logging when the job runs. The pos
specifies where in the list of algorithms the block will run, before which other block.
The second block of code goes before the Output:
block in your config text file. In it, we specify the inputs for the custom algorithm, those properties set in the __init()__
function in its class declaration:
# Custom algorithm added above.
MyxAODAnalysis:
# Pass continers to use when defining variables
muons: 'AnaMuons.medium'
electrons: 'AnaElectrons.loose'
jets: 'AnaJets'
Run the code to see this algorithm in effect. If all works as expected (i.e., logging messages with MyxAODAnalysis
are being printed), commit and push your code changes.