Create a Custom CP Algorithm

Last update: 07 Oct 2024 [History] [Edit]

Create and Add a Custom Algorithm to a CP Analysis Job

The CP Algorithms Tutorial introduces only the most commonly-used algorithms in an analysis. These algorithms perform only basic and commonly-used ntupling functions with the xAOD data.

If we want the analysis to perform a function not readily available in the CP algorithms, then we must create a custom algorithm and add it to the config file.

The steering macro (i.e. ATestRun_eljob.py) that was used in the conventional tutorial will not need to be edited to run the custom algorithm. Instead, only the text configuration file, along with many under-the-hood files pertaining to the block configuration and the new algorithm itself, will be edited to perform our desired functions.

Our example algorithm contains the number of electrons, jets, and muons per event, along with the dilepton mass and dijet mass (known as “mll” and “mjj,” respectively).

We will start by creating the algorithm, then by building the python block that “wraps” it, then by adding that block to the configuration file, then finishing finally by including that algorithm in the text configuration.

Create the Algorithm block to calculate number of electrons, muons, and jets, along with dilepton mass and dijet mass

We want to create branches in our output that contains the number of electrons, the number of muons, the number of jets, the dilepton mass, and dijet mass. To do so, we must specify a custom algorithm, the implementation of which is spread out across multiple files.

Configure the custom algorithm to calculate desired quantities

Firstly, for the custom algorithm to be compiled, we must include a line in the CMakeLists.txt file at the top level of the MyAnalysis directory. Add the lines xAODMuon, xAODEgamma, xAODJet, SystematicsHandlesLib, and SelectionHelpersLib to the LINK_LIBRARIES section of the file so that the electron, muon, and jet libraries, along with the systematics handles and selection helpers libraries, are linked to these objects during compilation.

The CP Algorithms are object-oriented, so algorithms are treated as classes. Classes are organized so that their members and attributes are declared in a header file, saved as .h, and these attributes are defined in a source file, saved as .cxx. (There are other files necessary for classes to be compiled, which we will describe here, too.) The header files of an algorithm are stored in a directory that shares the name of its parent directory - in this case, for MyAnalysis, you will find MyxAODAnalysis.h in the MyAnalysis subdirectory. Also in this subdirectory is a file, MyAnalysisDict.h, that specifies all header files for which a dictionary must be made. (The source files are in the root subdirectory, so you will find MyxAODAnalysis.cxx there.)

The header file and directory

We start by defining the algorithm in MyxAODAnalysis.h. First comes the pragma that directs the compiler to create the object. At the top of the file, add the following lines:

#ifndef MyAnalysis_MyxAODAnalysis_H
#define MyAnalysis_MyxAODAnalysis_H


#endif

The remaining contents of the header file will go between the #define and #endif lines of the file.

We need to include all relevant objects that are used in this file, so we add the following directives to the top of the file (after the #ifndef/#define lines):

// Algorithms and systematics
#include <AnaAlgorithm/AnaAlgorithm.h>
#include <SystematicsHandles/SysReadHandle.h>
#include <SystematicsHandles/SysWriteHandle.h>
#include <SelectionHelpers/SysReadSelectionHandle.h>
#include <SystematicsHandles/SysReadDecorHandle.h>
#include <SystematicsHandles/SysWriteDecorHandle.h>

// Containers
#include <xAODEgamma/ElectronContainer.h>
#include <xAODMuon/MuonContainer.h>
#include <xAODJet/JetContainer.h>
#include <xAODMissingET/MissingETContainer.h>
#include <xAODEventInfo/EventInfo.h>

// Others
#include <AthContainers/ConstDataVector.h>

With our necessary inclusions, we must now declare the class. Do so by copying this declaration into the file:

class MyxAODAnalysis : public EL::AnaAlgorithm
{

};

Everything that populates the class will go in between these curly braces. Note that the : public EL::AnaAlgorithm part of what you copied means that the MyxAODAnalysis class is built on top of the EL::AnaAlgorithm base class.

Now, we want to specify the public members of the algorithm. Every class needs a public constructor, which sets up the class upon its instantiation (similar to __init()__ in a python class) with its proper inputs, and a destructor, which uninstantiates the class and cleans up memory. Include the following lines in your code to create these attributes:

public:
  // Algorithm constructor
  MyxAODAnalysis (const std::string& name, ISvcLocator *pSvcLocator);

  // Destructor
  virtual ~MyxAODAnalysis ();

The other public attributes are three functions that are conventional for analysis algorithms: initialize(), execute(), and finalize(). The initialize() method runs when the analysis starts, execute() runs as EventLoop loops through the events of the AOD file, and finalize() runs as the analysis closes. Add these methods by copying the following lines in the header:

  // These are the functions inherited from Algorithm
  virtual StatusCode initialize () override;
  virtual StatusCode execute () override;
  virtual StatusCode finalize () override;

The rest of the class will contain objects we do not want the user to deal with directly, so these attributes will be private. These attributes include the systematics, the handles and selections for the electrons, muons, and jets, the EventInfo handle, and the decorators for the numbers of electrons, muons, and jets, the dilepton masses, and dijet masses. These attributes are assigned and specified through the algorithm block in python, to be completed in the next parts of the tutorial.

In the context of this code, a handle is a “proxy object,” used to define tools and containers without specifying exactly which tool or container to read in or out (which is left for when a full configuration of the analysis sequence is in place). For this analysis, four types of handles will be used: a SysListHandle, a SysReadHandle, a SysReadSelectionHandle, and a SysWriteDecorHandle.

The SysListHandle houses the list of systematics errors in the analysis. The SysReadHandle retrieves systematically-varied object collections, while the SysReadSelectionHandle reads the input systematically-varied selection cut. Finally, the SysWriteDecorHandle registers the new systematically-varied decoration applied in this algorithm. The workflow is then intuitive: retrieve the systematics, the objects, and their selections and variations per systematics, apply the algorithm and decorate the objects that are selected from the algorithm, then write those decorations out for future use.

Include the following lines in the header file beneath the public members:

private:
  // Systematics
  CP::SysListHandle m_systematicsList{this};

  // Input containers needed for variable definitions
  // Electrons
  CP::SysReadHandle<xAOD::ElectronContainer> m_electronsHandle{
      this, "electrons", "", "the electron container to use"};
  CP::SysReadSelectionHandle m_electronSelection{
      this, "electronSelection", "", "the selection on the input electrons"};
  // Muons
  CP::SysReadHandle<xAOD::MuonContainer> m_muonsHandle{
      this, "muons", "", "the muon container to use"};
  CP::SysReadSelectionHandle m_muonSelection{
      this, "muonSelection", "", "the selection on the input muons"};
  // Jets
  CP::SysReadHandle<xAOD::JetContainer> m_jetsHandle{
      this, "jets", "", "the jet container to use"};
  CP::SysReadSelectionHandle m_jetSelection{
      this, "jetSelection", "", "the selection on the input jets"};
  // MET
  //CP::SysReadHandle<xAOD::MissingETContainer> m_metHandle{
  //    this, "met", "", "the MET container to use"};

  // EventInfo
  CP::SysReadHandle<xAOD::EventInfo> m_eventInfoHandle{
      this, "eventInfo", "EventInfo", "the EventInfo container to read selection decisions from"};

  // Variables to be added to output
  CP::SysWriteDecorHandle<int> m_muNDecor {
    this, "mu_n", "mu_n_%SYS%", "Number of muons"
  };
  CP::SysWriteDecorHandle<int> m_elNDecor {
    this, "el_n", "el_n_%SYS%", "Number of electrons"
  };
  CP::SysWriteDecorHandle<int> m_jetNDecor {
    this, "jet_n", "jet_n_%SYS%", "Number of jets"
  };
  CP::SysWriteDecorHandle<float> m_mllDecor {
    this, "mll", "mll_%SYS%", "Dilepton mass of the two leading leptons"
  };
  CP::SysWriteDecorHandle<float> m_mjjDecor {
    this, "mjj", "mjj_%SYS%", "Dijet mass of the two leading jets"
  };

SysReadHandles and SysReadSelectionHandles are created for the electrons, muons, and jets, and for their already-applied selections. A SysReadHandle is created for the eventInfo, which stores all selection information for each object, along with run numbers and event numbers. SysWriteDecorHandles are created for each of the quantities calculated by the algorithm. Note that we are keeping commented code for if we ever want to use missing transverse energy in the algorithm. It is kept private, too.

Note: While we are still in the MyAnalysis subdirectory, we must add the dictionary of this class to the MyAnalysisDict.h file by including this line in it:

// This file includes all the header files that you need to create
// dictionaries for.

#include <MyAnalysis/MyxAODAnalysis.h>

In addition, there is a selection.xml file in this directory that must include this library for the algorithm to compile. Include the following line in that file:

  <!-- This file contains a list of all classes for which a dictionary
       should be created. -->

  <class name="MyxAODAnalysis" />

The Source file

Now, let’s define all attributes declared in the class header. Firstly, we must include the header file for compilation. We specify this in the following line at the top of the MyxAODAnalysis.cxx in the Root directory:

#include <MyAnalysis/MyxAODAnalysis.h>

In Root/MyxAODAnalysis.cxx, we must now explicitly define each method, starting with the constructor. We have no specific actions in the constructor to set, so we can take a lesson from the following comment left in the constructor:

MyxAODAnalysis :: MyxAODAnalysis (const std::string& name,
                                  ISvcLocator *pSvcLocator)
    : EL::AnaAlgorithm (name, pSvcLocator)
{
  // Here you put any code for the base initialization of variables,
  // e.g. initialize all pointers to 0. This is also where you
  // declare all properties for your algorithm. Note that things like
  // resetting statistics variables or booking histograms should
  // rather go into the initialize() function.
}

Similarly, we have nothing to define in the destructor, thus we can define it as:

MyxAODAnalysis::~MyxAODAnalysis(){
  // Here you can delete any pointers or clear any memory
  // no longer needed after the algorithm has run.
}

Now, we will load/allocate memory for all relevant handles and objects (the private attributes of the class) to be used during the event looping (i.e. actual selection to be carried out in execute()) in the initialize() method. Do so by creating the initialize() method:

StatusCode MyxAODAnalysis::initialize(){
  // Here you do everything that needs to be done at the very
  // beginning on each worker node, e.g. create histograms and output
  // trees. This method gets called before any input files are
  // connected.

  ANA_MSG_INFO ("Initializing MyxAODAnalysis " << name());

  // input handles
  // Electrons
  ANA_CHECK(m_electronsHandle.initialize(m_systematicsList));
  ANA_CHECK(m_electronSelection.initialize(m_systematicsList, m_electronsHandle, SG::AllowEmpty));
  // Muons
  ANA_CHECK(m_muonsHandle.initialize(m_systematicsList));
  ANA_CHECK(m_muonSelection.initialize(m_systematicsList, m_muonsHandle, SG::AllowEmpty));
  // Jets
  ANA_CHECK(m_jetsHandle.initialize(m_systematicsList));
  ANA_CHECK(m_jetSelection.initialize(m_systematicsList, m_jetsHandle, SG::AllowEmpty));
  // MissingET
  //ANA_CHECK(m_metHandle.initialize(m_systematicsList));
  // EventInfo
  ANA_CHECK(m_eventInfoHandle.initialize(m_systematicsList));

  // output handles
  ANA_CHECK(m_mllDecor.initialize(m_systematicsList, m_eventInfoHandle));
  ANA_CHECK(m_mjjDecor.initialize(m_systematicsList, m_eventInfoHandle));
  ANA_CHECK(m_jetNDecor.initialize(m_systematicsList, m_eventInfoHandle));
  ANA_CHECK(m_elNDecor.initialize(m_systematicsList, m_eventInfoHandle));
  ANA_CHECK(m_muNDecor.initialize(m_systematicsList, m_eventInfoHandle));

  // Systematic list
  ANA_CHECK(m_systematicsList.initialize());

  return StatusCode::SUCCESS;
}

tip The method uses ANA_MSG_INFO() for print statements and ANA_CHECK() when loading objects. Both methods are preferred when running an analysis algorithm because: 1. The ANA_MSG_XXXXX() family is preferred to toggle print statements between info, debug, warning, and failure statuses, and

  1. Running ANA_CHECK() exits the code on failure upon a failed function execution.

tip The initialize(), execute(), and finalize() methods all return a StatusCode, the preferred flag over a standard boolean for the CP Algorithms (SUCCESS “maps” to a 0, as in return 0;, FAILURE maps to a 1, as in exit 1;).

In this method, each of the object handles are initialized to their respective object containers from the DAOD. If extra systematics are generated by these containers, then they are added to the systematics list.

The object selection handles are similarly initialized to their selections, linking to the systematics list and object containers in the process. The eventInfo handle is initialized, too, as are the output handles, adding their decorations to the eventInfo in the process.

Once everything is initialized, the systematics list is then set, so it is finalized by its own initialization method.

We will now populate the execute() method to select and store the electron count, muon count, jet count, dilepton mass, and dijet mass. To do so, we retrieve the event info of the output storage, then loop over all systematics that may have been specified in the analysis, then load the event info for each event, and retrieve the electrons, muons, and jets.

We then declare vectors for the selected electrons, muons, and jets. When looping over all muons, we store up to two selected muons, all while increasing the count of the selected muons vector. We do the same for the electrons, while we simply store the count of the jets. We then count the number of objects in each of the three containers and calculate the mass of the two leptons and two jets (if there are two leptons and two jets). Finally, we set the output variables with the object counts and calculated masses before returning gracefully.

Copy the following lines of code to create the execute() method to accomplish these tasks:

StatusCode MyxAODAnalysis::execute(){
  // Here you do everything that needs to be done on every single
  // events, e.g. read input variables, apply cuts, and fill
  // histograms and trees. This is where most of your actual analysis
  // code will go.

  ANA_MSG_INFO ("Executing MyxAODAnalysis " << name());

  // Retrieve the eventInfo object from the event store
  const xAOD::EventInfo *eventInfo = nullptr;
  ANA_CHECK (m_eventInfoHandle.retrieve (eventInfo, {}));

  // Print out run and event number from retrieved object
  ANA_MSG_INFO ("in execute, runNumber = " << eventInfo->runNumber() << ", eventNumber = " << eventInfo->eventNumber());

  for (const auto &sys : m_systematicsList.systematicsVector()){

    // retrieve the EventInfo
    const xAOD::EventInfo *evtInfo = nullptr;
    ANA_CHECK(m_eventInfoHandle.retrieve(evtInfo, sys));

    // retrieve objects
    const xAOD::ElectronContainer *electrons = nullptr;
    ANA_CHECK(m_electronsHandle.retrieve(electrons, sys));
    const xAOD::MuonContainer *muons = nullptr;
    ANA_CHECK(m_muonsHandle.retrieve(muons, sys));
    const xAOD::JetContainer *jets = nullptr;
    ANA_CHECK(m_jetsHandle.retrieve(jets, sys));
    //const xAOD::MissingETContainer *met = nullptr;
    //ANA_CHECK(m_metHandle.retrieve(met, sys));

    // apply object-wise selection
    /* NOTE HERE: The vector that will house the leptons for mll calculation is the 
    recommended vector to use when not many operations will be made on the particle
    quantities. The ConstDataVector used for the selected jets is recommended for 
    many operations to be carried out on the particles in the vector. Both types of
    vectors are instantiated here for demonstrative purposes. */
    std::vector<const xAOD::IParticle*> leptons;
    ConstDataVector<xAOD::JetContainer> selectedJets(SG::VIEW_ELEMENTS);

    // Muons
    int nMu = 0;
    for (const xAOD::Muon *mu : *muons){
      // This if statement is just to loop over selected muons that will appear in the output ntuple
      if (m_muonSelection.getBool(*mu, sys)){
        leptons.push_back(mu);
        nMu++;
      }
    }

    // Electrons
    int nEl = 0;
    for (const xAOD::Electron *el : *electrons){
      // This if statement is just to loop over selected electrons that will appear in the output ntuple
      if (m_electronSelection.getBool(*el, sys)){
        leptons.push_back(el);
        nEl++;
      }
    }

    // Jets
    int nJet = 0;
    for (const xAOD::Jet *jet : *jets){
      // This if statement is just to loop over selected jets that will appear in the output ntuple
      if (m_jetSelection.getBool(*jet, sys)){
        selectedJets.push_back(jet);
        nJet++;
      }
    }

    // Calculate the mll of the leading two leptons (sort by highest pT, then calculate the mass)
    float mll = 0.0;
    if (nMu + nEl > 1){
      std::sort(leptons.begin(), leptons.end(), [] (const xAOD::IParticle* lep1, const xAOD::IParticle* lep2) {return lep1->pt() > lep2->pt();});
      mll = (leptons.at(0)->p4() + leptons.at(1)->p4()).M();
    } 

    // get mjj
    float mjj = 0.0;
    if (nJet > 1){
      // will crash if jet1 or jet2 were not set 
      mjj = (selectedJets.at(0)->p4() + selectedJets.at(1)->p4()).M();
    }

    // set variables
    m_muNDecor.set(*evtInfo, nMu, sys);
    m_elNDecor.set(*evtInfo, nEl, sys);
    m_jetNDecor.set(*evtInfo, nJet, sys);
    m_mllDecor.set(*evtInfo, mll, sys);
    m_mjjDecor.set(*evtInfo, mjj, sys);
  }

  return StatusCode::SUCCESS;
}

Note that, again, we have included commented code for if we ever want to do something with the missing transverse energy. In addition, the algorithm will print out the run number and event number each time it is called on a new event for debugging purposes.

Now, we write the finalize() method for this algorithm with a simple lesson in a comment, along with its proper return:

StatusCode MyxAODAnalysis::finalize(){
  // This method is the mirror image of initialize(), meaning it gets
  // called after the last event has been processed on the worker node
  // and allows you to finish up any objects you created in
  // initialize() before they are written to disk. This is actually
  // fairly rare, since this happens separately for each worker node.
  // Most of the time you want to do your post-processing on the
  // submission node after all your histogram outputs have been
  // merged.
  return StatusCode::SUCCESS;
}

The src/components file

There is one more task to be completed to define the algorithm fully. We need the C++ algorithm to be recognized in Python when calling it. We do so by adding it as a “component” to the software build. In the src/components subdirectories, there is a file called MyAnalysis_entries.cxx. Here, we must add the following lines for the proper include and to declare the algorithm for compilation:

#include <MyAnalysis/MyxAODAnalysis.h>

DECLARE_COMPONENT (MyxAODAnalysis)

With this file populated, the algorithm has been fully defined! We can now include it in our block and text configurations.

Create the configuration block that calls the algorithm

In the file MyxAODAnalysisConfig.py, we will define a class to call the algorithm. This class is the configuration block, and it will be initialized with containers for the electrons, muons, and jets along with the algorithm that we just wrote. First, we must include imports standard for any CP algorithm at the top of the file:

# AnaAlgorithm import(s):
from AnalysisAlgorithmsConfig.ConfigBlock import ConfigBlock
from AnalysisAlgorithmsConfig.ConfigAccumulator import DataType
from AthenaConfiguration.Enums import LHCPeriod

Once these lines are included, add the following lines of code to this python file:

class MyxAODAnalysisConfig(ConfigBlock):

  def __init__(self):
    super(MyxAODAnalysisConfig, self).__init__('MyxAODAnalysisConfig')
    self.addOption('electrons', '', type=str)
    self.addOption('muons', '', type=str)
    self.addOption('jets', '', type=str)
    #self.addOption('met', '', type=str)

  def makeAlgs(self, config):
    alg = config.createAlgorithm('MyxAODAnalysis', 'MyxAODAnalysis')
    alg.electrons, alg.electronSelection = config.readNameAndSelection(self.electrons)
    alg.muons, alg.muonSelection = config.readNameAndSelection(self.muons)
    alg.jets, alg.jetSelection = config.readNameAndSelection(self.jets)
    #alg.met = config.readName(self.met)

    config.addOutputVar('EventInfo', 'mll_%SYS%', 'mll')
    config.addOutputVar('EventInfo', 'mjj_%SYS%', 'mjj')
    config.addOutputVar('EventInfo', 'jet_n_%SYS%', 'jet_n')
    config.addOutputVar('EventInfo', 'el_n_%SYS%', 'el_n')
    config.addOutputVar('EventInfo', 'mu_n_%SYS%', 'mu_n')

The __init__() function runs when the class is instantiated automatically, and it defines the member variables electrons, muons, and jets, which are the electron container, muon container, and jet container, with their respective selection names, specified by the user. The makeAlgs() function calls the algorithm with the three containers and selection info as its data. The final lines augment the output with new variables, the quantities we wish to enumerate and calculate.

Adding the custom algorithm to the text configuration

Since the algorithm has been defined to calculate the desired quantities for each event already, and since the text configuration has been scheduled to run in MyAnalysisAlgorithms.py, to include our new branches in the final ntuple, we must include two new blocks of code into our config text. The first goes at the top of your yaml file:

# Added custom algorithms. Expects a list 
AddConfigBlocks:
    # Calls `import functionName from modulePath`
    modulePath: 'MyAnalysis.MyxAODAnalysisConfig'
    functionName: 'MyxAODAnalysisConfig'
    # Name that will be used in YAML file
    algName: 'MyxAODAnalysis'
    # Will run before 'Output' algorithm
    pos: 'Output'

This block of text calls a config block specifically designed to add additional algorithms not already part of the CP Algorithms code to a job. The modulePath and functionName point to the location of the algorithm in its python file and of the file in its directory. The algName gives a unique name to the algorithm so that it can be tracked in the output logging when the job runs. The pos specifies where in the list of algorithms the block will run, before which other block.

The second block of code goes before the Output: block in your config text file. In it, we specify the inputs for the custom algorithm, those properties set in the __init()__ function in its class declaration:

# Custom algorithm added above.
MyxAODAnalysis:
  # Pass continers to use when defining variables
  muons: 'AnaMuons.medium'
  electrons: 'AnaElectrons.loose'
  jets: 'AnaJets'

Run the code to see this algorithm in effect. If all works as expected (i.e., logging messages with MyxAODAnalysis are being printed), commit and push your code changes.