Create a Custom CP Algorithm

Last update: 11 Mar 2024 [History] [Edit]

Create and Add a Custom Algorithm to a CP Analysis Job

All sections in the software tutorial have dealt exclusively with configuring a .yaml file to set up and run the analysis. The text added to the file called for algorithms already implemented in the CP code to perform common ntupling functions.

If we want the analysis to perform a function not readily available in the CP algorithms, then we must create a custom algorithm and add it to the config file. For this example, we will use the additional configuration file, config_custom_algo.yaml, provided in the directory. (Of course, when it comes time to run the algorithm, the command changes to ATestRun_eljob.py ../source/MyAnalysis/data/config_custom_algo.yaml --submission-dir=submitDir.) Our example algorithm contains the number of electrons, jets, and muons per event, along with the dilepton mass and dijet mass (known as “mll” and “mjj,” respectively).

Schedule algorithm to calculate number of electrons, muons, and jets, along with dilepton mass and dijet mass

We want to create a branches in our output that contains the number of electrons, the number of muons, the number of jets, the dilepton mass, and dijet mass. To do so, we must specify a custom algorithm.

Let’s start with the python part of the algorithm. In the file MyxAODAnalysisConfig.py, we will define a class to call the algorithm. This class will be initialized with a muon container and will call the algorithm that stores the muon rest mass. Add the following lines of code to this python file:

  def __init__(self):
    super(MyxAODAnalysisConfig, self).__init__('MyxAODAnalysisConfig')
    self.addOption('electrons', '', type=str)
    self.addOption('muons', '', type=str)
    self.addOption('jets', '', type=str)
    #self.addOption('met', '', type=str)

  def makeAlgs(self, config):
    alg = config.createAlgorithm('MyxAODAnalysis', 'MyxAODAnalysis')
    alg.electrons, alg.electronSelection = config.readNameAndSelection(self.electrons)
    alg.muons, alg.muonSelection = config.readNameAndSelection(self.muons)
    alg.jets, alg.jetSelection = config.readNameAndSelection(self.jets)
    #alg.met = config.readName(self.met)

    config.addOutputVar('EventInfo', 'mll_%SYS%', 'mll')
    config.addOutputVar('EventInfo', 'mjj_%SYS%', 'mjj')
    config.addOutputVar('EventInfo', 'jet_n_%SYS%', 'jet_n')
    config.addOutputVar('EventInfo', 'el_n_%SYS%', 'el_n')
    config.addOutputVar('EventInfo', 'mu_n_%SYS%', 'mu_n')

The __init__() function runs when the class is instantiated automatically, and it defines the member varaibles electrons, muons, and jets, which are the electron container, muon container, and jet container, with their respective selection names, specified by the user. The makeAlgs() function calls the algorithm with the three containers and selection info as its data. The final lines augment the output with new variables, the quantities we wish to enumerate and calculate.

We must add this configuration to the makeSequence() function in MyAnalysisAlgorithms.py if we want the job to run the algorithm. Put the following lines in makeSequence() after the TextConfig() call:

    # add MyxAODAnalysis to config factory object
    from MyAnalysis.MyxAODAnalysisConfig import MyxAODAnalysisConfig
    config.addAlgConfigBlock(algName='MyxAODAnalysis',
        alg=MyxAODAnalysisConfig, pos='Output')

Configure the custom algorithm to calculate desired quantities

Firstly, for the custom algorithm to be compiled, we must include a line in the CMakeLists.txt file at the top level of the MyAnalysis directory. Add the lines xAODMuon, xAODEgamma, and xAODJet to the LINK_LIBRARIES section of the file so that the electron, muon, and jet libraries are linked to these objects during compilation.

The CP Algorithms are object-oriented, so algorithms are treated as classes. Classes are organized so that their members and attributes are declared in a header file, saved as .h, and these attributes are defined in a root file, saved as .cxx. (There are other files necessary for classes to be compiled, which we will describe here, too.) The header files of an algorithm are stored in a directory that shares the name of its parent directory - in this case, for MyAnalysis, you will find MyxAODAnalysis.h in the MyAnalysis subdirectory. Also in this subdirectory is a file, MyAnalysisDict.h, that specifies all header files for which a dictionary must be made. (The root files are in the root subdirectory, so you will find MyxAODAnalysis.cxx there.)

The header file and directory

Let us start by defining the algorithm in MyxAODAnalysis.h. We need to include all relevant objects that are used in this file, so we add the following directives to the top of the file, after the #IF_NDEF.../#DEFINE lines:

// Algorithms and systematics
#include <AnaAlgorithm/AnaAlgorithm.h>
#include <SystematicsHandles/SysReadHandle.h>
#include <SystematicsHandles/SysWriteHandle.h>
#include <SelectionHelpers/SysReadSelectionHandle.h>
#include <SystematicsHandles/SysReadDecorHandle.h>
#include <SystematicsHandles/SysWriteDecorHandle.h>

// Containers
#include <xAODEgamma/ElectronContainer.h>
#include <xAODMuon/MuonContainer.h>
#include <xAODJet/JetContainer.h>
#include <xAODMissingET/MissingETContainer.h>
#include <xAODEventInfo/EventInfo.h>

// Others
#include <xAODBase/IParticleContainer.h>
#include <TLorentzVector.h>
#include <AthContainers/ConstDataVector.h>

Now, we want to specify the public members of the algorithm. Every class needs a public constructor, which sets up the class upon its instantiation (similar to __init()__ in a python class) with its proper inputs, and a destructor, which uninstantiates the class and cleans up memory. Include the following lines in your code to create these attributes:

public:
  // Algorithm constructor
  MyxAODAnalysis (const std::string& name, ISvcLocator *pSvcLocator);

  // Destructor
  ~MyxAODAnalysis ();

The other public attributes are three functions that are conventional for analysis algorithms: initialize(), execute(), and finalize(). The initialize() method runs when the analysis starts, execute() runs as EventLoop loops through the events of the AOD file, and finalize() runs as the analysis closes. Add these methods by copying the following lines in the header:

  // These are the functions inherited from Algorithm
  virtual StatusCode initialize () override;
  virtual StatusCode execute () override;
  virtual StatusCode finalize () override;

The rest of the class will contain objects we do not want the user to deal with directly, so these attributes will be private. These attributes include the systematics, the handles and selections for the electrons, muons, and jets, the EventInfo handle, and the decorators for the numbers of electrons, muons, and jets, the dilepton masses, and dijet masses. Include the following lines in the header file beneath the public members:

private:
  // Systematics
  CP::SysListHandle m_systematicsList{this};

  // Input containers needed for variable definitions
  // Electrons
  CP::SysReadHandle<xAOD::ElectronContainer> m_electronsHandle{
      this, "electrons", "", "the electron container to use"};
  CP::SysReadSelectionHandle m_electronSelection{
      this, "electronSelection", "", "the selection on the input electrons"};
  // Muons
  CP::SysReadHandle<xAOD::MuonContainer> m_muonsHandle{
      this, "muons", "", "the muon container to use"};
  CP::SysReadSelectionHandle m_muonSelection{
      this, "muonSelection", "", "the selection on the input muons"};
  // Jets
  CP::SysReadHandle<xAOD::JetContainer> m_jetsHandle{
      this, "jets", "", "the jet container to use"};
  CP::SysReadSelectionHandle m_jetSelection{
      this, "jetSelection", "", "the selection on the input jets"};
  // MET
  //CP::SysReadHandle<xAOD::MissingETContainer> m_metHandle{
  //    this, "met", "", "the MET container to use"};

  // EventInfo
  CP::SysReadHandle<xAOD::EventInfo> m_eventInfoHandle{
      this, "eventInfo", "EventInfo", "the EventInfo container to read selection decisions from"};

  // Variables to be added to output
  CP::SysWriteDecorHandle<int> m_mu_n_decor {
    this, "mu_n", "mu_n_%SYS%", "Number of muons"
  };
  CP::SysWriteDecorHandle<int> m_el_n_decor {
    this, "el_n", "el_n_%SYS%", "Number of electrons"
  };
  CP::SysWriteDecorHandle<int> m_jet_n_decor {
    this, "jet_n", "jet_n_%SYS%", "Number of jets"
  };
  CP::SysWriteDecorHandle<float> m_mll_decor {
    this, "mll", "mll_%SYS%", "Dilepton mass of the two leading leptons"
  };
  CP::SysWriteDecorHandle<float> m_mjj_decor {
    this, "mjj", "mjj_%SYS%", "Dijet mass of the two leading jets"
  };

Note that we are keeping commented code for if we ever want to use missing transverse energy in the algorithm. It is kept private, too.

Note: While we are still in the MyAnalysis subdirectory, we must add the dictionary of this class to the MyAnalysisDict.h file by including this line in it:

// This file includes all the header files that you need to create
// dictionaries for.

#include <MyAnalysis/MyxAODAnalysis.h>

In addition, there is a selection.xml file in this directory that must include this library for the algorithm to compile. Include the following line in that file:

  <!-- This file contains a list of all classes for which a dictionary
       should be created. -->

  <class name="MyxAODAnalysis" />

The Root file

Now, let’s define all attributes declared in the class header. Firstly, we must include the header file for compilation. We specify this in the following line at the top of the MyxAODAnalysis.cxx in the root directory:

#include <MyAnalysis/MyxAODAnalysis.h>

In Root/MyxAODAnalysis.cxx, we must now explicitly define each method, starting with the constructor. We have no specific actions in the constructor to set, so we can take a lesson from the following comment left in the constructor:

MyxAODAnalysis :: MyxAODAnalysis (const std::string& name,
                                  ISvcLocator *pSvcLocator)
    : EL::AnaAlgorithm (name, pSvcLocator)
{
  // Here you put any code for the base initialization of variables,
  // e.g. initialize all pointers to 0. This is also where you
  // declare all properties for your algorithm. Note that things like
  // resetting statistics variables or booking histograms should
  // rather go into the initialize() function.
}

Now, we will load/allocate memory for all relevant handles and objects (the private attributes of the class) to be used during the event looping (i.e. actual selection to be carried out in execute()) in the initialize() method. Do so by adding the following lines to the initialize() method:

  // Here you do everything that needs to be done at the very
  // beginning on each worker node, e.g. create histograms and output
  // trees. This method gets called before any input files are
  // connected.

  ANA_MSG_INFO ("Initializing MyxAODAnalysis " << name());

  // input handles
  // Electrons
  ANA_CHECK(m_electronsHandle.initialize(m_systematicsList));
  ANA_CHECK(m_electronSelection.initialize(m_systematicsList, m_electronsHandle, SG::AllowEmpty));
  // Muons
  ANA_CHECK(m_muonsHandle.initialize(m_systematicsList));
  ANA_CHECK(m_muonSelection.initialize(m_systematicsList, m_muonsHandle, SG::AllowEmpty));
  // Jets
  ANA_CHECK(m_jetsHandle.initialize(m_systematicsList));
  ANA_CHECK(m_jetSelection.initialize(m_systematicsList, m_jetsHandle, SG::AllowEmpty));
  // MissingET
  //ANA_CHECK(m_metHandle.initialize(m_systematicsList));
  // EventInfo
  ANA_CHECK(m_eventInfoHandle.initialize(m_systematicsList));

  // output handles
  ANA_CHECK(m_mll_decor.initialize(m_systematicsList, m_eventInfoHandle));
  ANA_CHECK(m_mjj_decor.initialize(m_systematicsList, m_eventInfoHandle));
  ANA_CHECK(m_jet_n_decor.initialize(m_systematicsList, m_eventInfoHandle));
  ANA_CHECK(m_el_n_decor.initialize(m_systematicsList, m_eventInfoHandle));
  ANA_CHECK(m_mu_n_decor.initialize(m_systematicsList, m_eventInfoHandle));

  // Systematic list
  ANA_CHECK(m_systematicsList.initialize());

  return StatusCode::SUCCESS;

tip The method uses ANA_MSG_INFO() for print statements and ANA_CHECK() when loading objects. Both methods are preferred when running an analysis algorithm because: 1. The ANA_MSG_XXXXX() family is preferred to toggle print statements between info, debug, warning, and failure statuses, and

  1. Running ANA_CHECK() exits the code on failure upon a failed function execution.

tip The initialize(), execute(), and finalize() methods all return a StatusCode, the preferred flag over a standard boolean for the CP Algorithms (SUCCESS “maps” to a 0, as in return 0;, FAILURE maps to a 1, as in exit 1;).

We will now populate the execute() method to select and store the electron count, muon count, jet count, dilepton mass, and dijet mass. To do so, we retrieve the event info of the output storage, then loop over all systematics that may have been specified in the analysis, then load the event info for each event, and retrieve the electrons, muons, and jets.

We then declare vectors for the selected electrons, muons, and jets. When looping over all muons, we store up to two selected muons, all while increasing the count of the selected muons vector. We do the same for the electrons, while we simply store the count of the jets. We then count the number of objects in each of the three containers and calculate the mass of the two leptons and two jets (if there are two leptons and two jets). Finally, we set the output variables with the object counts and calculated masses before returning gracefully.

Copy the following lines of code into the execute() method to accomplish these tasks:

  // Here you do everything that needs to be done on every single
  // events, e.g. read input variables, apply cuts, and fill
  // histograms and trees. This is where most of your actual analysis
  // code will go.

  ANA_MSG_INFO ("Executing MyxAODAnalysis " << name());

  // Retrieve the eventInfo object from the event store
  const xAOD::EventInfo *eventInfo = nullptr;
  ANA_CHECK (evtStore()->retrieve (eventInfo, "EventInfo"));

  // Print out run and event number from retrieved object
  ANA_MSG_INFO ("in execute, runNumber = " << eventInfo->runNumber() << ", eventNumber = " << eventInfo->eventNumber());

  for (const auto &sys : m_systematicsList.systematicsVector())
  {
    // retrieve the EventInfo
    const xAOD::EventInfo *evtInfo = nullptr;
    ANA_CHECK(m_eventInfoHandle.retrieve(evtInfo, sys));

    // retrieve objects
    const xAOD::ElectronContainer *electrons = nullptr;
    ANA_CHECK(m_electronsHandle.retrieve(electrons, sys));
    const xAOD::MuonContainer *muons = nullptr;
    ANA_CHECK(m_muonsHandle.retrieve(muons, sys));
    const xAOD::JetContainer *jets = nullptr;
    ANA_CHECK(m_jetsHandle.retrieve(jets, sys));
    //const xAOD::MissingETContainer *met = nullptr;
    //ANA_CHECK(m_metHandle.retrieve(met, sys));

    // apply object-wise selection
    ConstDataVector<xAOD::ElectronContainer> selected_electrons(SG::VIEW_ELEMENTS);
    ConstDataVector<xAOD::MuonContainer> selected_muons(SG::VIEW_ELEMENTS);
    ConstDataVector<xAOD::JetContainer> selected_jets(SG::VIEW_ELEMENTS);

    // declare pointers for place holder particles
    const xAOD::IParticle *lep1 = 0, *lep2 = 0;

    // apply object selection
    // Muons
    for (const xAOD::Muon *mu : *muons)
    {
      if (m_muonSelection.getBool(*mu, sys)){
        selected_muons.push_back(mu);
        if (!lep1 || mu->pt() > lep1->pt()){
          lep2 = lep1;
          lep1 = mu;
        } else if (!lep2 || mu->pt() > lep2->pt()){
          lep2 = mu;
        }
      }
    }

    // Electrons
    for (const xAOD::Electron *el : *electrons)
    {
      if (m_electronSelection.getBool(*el, sys)){
        selected_electrons.push_back(el);
        if (!lep1 || el->pt() > lep1->pt()){
          lep2 = lep1;
          lep1 = el;
        } else if (!lep2 || el->pt() > lep2->pt()){
          lep2 = el;
        }
      }
    }

    // Jets
    for (const xAOD::Jet *jet : *jets)
    {
      if (m_jetSelection.getBool(*jet, sys)){
        selected_jets.push_back(jet);
      }
    }

    // count objects
    int mu_n = selected_muons.size();
    int el_n = selected_electrons.size();
    int jet_n = selected_jets.size();

    // get mll
    float mll = 0.0;
    if (mu_n + el_n > 1)
      // will crash if lep1 or lep2 were not set 
      mll = (lep1->p4() + lep2->p4()).M();

    // get mjj
    float mjj = 0.0;
    if (jet_n > 1)
      // will crash if jet1 or jet2 were not set 
      mjj = (selected_jets.at(0)->p4() + selected_jets.at(1)->p4()).M();

    // set variables
    m_mu_n_decor.set(*evtInfo, mu_n, sys);
    m_el_n_decor.set(*evtInfo, el_n, sys);
    m_jet_n_decor.set(*evtInfo, jet_n, sys);
    m_mll_decor.set(*evtInfo, mll, sys);
    m_mjj_decor.set(*evtInfo, mjj, sys);
  }

  return StatusCode::SUCCESS;

Note that, again, we have included commented code for if we ever want to do something with the missing transverse energy. In addition, the algorithm will print out the run number and event number each time it is called on a new event for debugging purposes.

Now, we populate the finalize() method for this algorithm with a simple lesson in a comment, along with its proper return:

  // This method is the mirror image of initialize(), meaning it gets
  // called after the last event has been processed on the worker node
  // and allows you to finish up any objects you created in
  // initialize() before they are written to disk. This is actually
  // fairly rare, since this happens separately for each worker node.
  // Most of the time you want to do your post-processing on the
  // submission node after all your histogram outputs have been
  // merged.
  return StatusCode::SUCCESS;

The src/components file

There is one more task to be completed to define the algorithm fully. We need the C++ algorithm to be recognized in Python when calling it. We do so by adding it as a “component” to the software build. In the src/components subdirectories, there is a file called MyAnalysis_entries.cxx. Here, we must add the following lines for the proper include and to declare the algorithm for compilation:

#include <MyAnalysis/MyxAODAnalysis.h>

DECLARE_COMPONENT (MyxAODAnalysis)

With this file populated, the algorithm has been fully defined! We can now include it in our text configuration.

Adding the custom algorithm to the text configuration

Since the algorithm has been defined to store the muon rest mass for each event already, and since the algorithm has been scheduled to run in MyAnalysisAlgorithms.py, to include the muon mass branch in the final ntuple, we must include two new blocks of code. The first goes at the tope of config_custom_algo.yaml:

# Added custom algorithms. Expects a list 
AddConfigBlocks:
    # Calls `import functionName from modulePath`
    modulePath: 'MyAnalysis.MyxAODAnalysisConfig'
    functionName: 'MyxAODAnalysisConfig'
    # Name that will be used in YAML file
    algName: 'MyxAODAnalysis'
    # Will run before 'Output' algorithm
    pos: 'Output'

This block of text calls a config block specifically designed to add additional algorithms not already part of the CP Algorithms code to a job. The modulePath and functionName point to the location of the algorithm in its python file and of the file in its directory. The algName gives a unique name to the algorithm so that it can be tracked in the output logging when the job runs. The pos specifies where in the list of algorithms the block will run, before which other block.

The second block of code goes before the Output: block in the config_custom_algo.yaml file. In it, we specify the inputs for the custom algorithm, those properties set in the __init()__ function in its class declaration:

# Custom algorithm added above.
MyxAODAnalysis:
  # Pass continers to use when defining variables
  muons: 'AnaMuons.medium'
  electrons: 'AnaElectrons.loose'
  jets: 'AnaJets'

Rerun the code to see this algorithm in effect. If all works as expected (i.e., logging messages with MyxAODAnalysis are being printed), commit and push your code changes.