Creating and Filling Trees/NTuples

Last update: 21 Jul 2021 [History] [Edit]

Beside writing histograms, creating TTree (ntuple) outputs is the most common thing for an analysis algorithm. In this section we will show you how to create a TTree in your analysis algorithm, and write some simple variables to it. If you are not familiar with the interface of TTree, you should probably first look at the following pages and examples:

Note that the infrastructure allows you to create as many trees and output files as you want in your analysis. Keep this in mind especially when dealing with systematic variations.

The EL::AnaAlgorithm class provides a very similar interface for handling trees as for handling histograms. If you have not done so yet, you should probably read the section about handling histograms first.

warning These instructions for writing trees only work with analysis releases >=21.2.35.

CMake Configuration

In the exercise we will be filling information into our output tree from xAOD::EventInfo and xAOD::JetContainer. If you have not started using such objects yet, make sure that you now update your package’s CMakeLists.txt file to link against the libraries providing these classes.

  LINK_LIBRARIES [...] xAODEventInfo xAODJet

, where [...] is any other package dependencies you may already have included.

C++ Code

In order to write variables into an output TTree, you need to declare them as member variables in your algorithm class. For this exercise add the following members to your algorithm header (MyxAODAnalysis.h):

#include <TTree.h>
#include <vector>
...
  ~MyxAODAnalysis ();
...
  /// output variables for the current event
  /// \{
  unsigned int m_runNumber = 0; ///< Run number
  unsigned long long m_eventNumber = 0; ///< Event number
  /// Jet 4-momentum variables
  std::vector<float> *m_jetEta = nullptr;
  std::vector<float> *m_jetPhi = nullptr;
  std::vector<float> *m_jetPt = nullptr;
  std::vector<float> *m_jetE = nullptr;
  /// \}

The setup of the tree happens in the initialize() function of your algorithm. To set the tree up, add the following to the initialisation function:

  ANA_CHECK (book (TTree ("analysis", "My analysis ntuple")));
  TTree* mytree = tree ("analysis");
  mytree->Branch ("RunNumber", &m_runNumber);
  mytree->Branch ("EventNumber", &m_eventNumber);
  m_jetEta = new std::vector<float>();
  mytree->Branch ("JetEta", &m_jetEta);
  m_jetPhi = new std::vector<float>();
  mytree->Branch ("JetPhi", &m_jetPhi);
  m_jetPt = new std::vector<float>();
  mytree->Branch ("JetPt", &m_jetPt);
  m_jetE = new std::vector<float>();
  mytree->Branch ("JetE", &m_jetE);

And this shows us one of the most inconvenient features of writing a TTree. When you write a “primitive” and an “object” variable, you have to use an ever so slightly different formalism. For primitive variables, like the ones we will be getting from xAOD::EventInfo, we need to provide the TTree::Branch function with a pointer to the primitive variable. While for the vector/object variables we need to provide TTree::Branch with a pointer to the pointer that holds the variable. Which unfortunately makes the memory management of object variables very error prone.

To make sure that we don’t leak the memory belonging to these variables, please add the following into the destructor of your algorithm:

MyxAODAnalysis :: ~MyxAODAnalysis () {
  delete m_jetEta;
  delete m_jetPhi;
  delete m_jetPt;
  delete m_jetE;
}

There are of course possibilities for using smart pointers to make the code a little safer, but that would include some even more tricky pieces of code, and hence we leave that to the reader to experiment with.

Finally, once the creation and deletion of the tree/variables is taken care of, let’s fill them in the execute() function of the algorithm like:

#include <xAODEventInfo/EventInfo.h>
#include <xAODJet/JetContainer.h>
...
  // Read/fill the EventInfo variables:
  const xAOD::EventInfo* ei = nullptr;
  ANA_CHECK (evtStore()->retrieve (ei, "EventInfo"));
  m_runNumber = ei->runNumber ();
  m_eventNumber = ei->eventNumber ();
  // Read/fill the jet variables:
  const xAOD::JetContainer* jets = nullptr;
  ANA_CHECK (evtStore()->retrieve (jets, "AntiKt4EMPFlowJets"));
  m_jetEta->clear();
  m_jetPhi->clear();
  m_jetPt->clear();
  m_jetE->clear();
  for (const xAOD::Jet* jet : *jets) {
    m_jetEta->push_back (jet->eta ());
    m_jetPhi->push_back (jet->phi ());
    m_jetPt-> push_back (jet->pt ());
    m_jetE->  push_back (jet->e ());
  }
  // Fill the event into the tree:
  tree ("analysis")->Fill ();

This concludes the updates in the C++ code, you should be able to (re-)compile your code with all these changes included.

Job Configuration

As stated earlier, trees are handled very similarly to histograms. The difference with trees is that both EventLoop and Athena needs to be told explicitly where the tree should be written.

Athena Job Configuration

The formalism is exactly the same as for histograms. You need to tell the THistSvc service which file to write for the ANALYSIS stream. You can just add the following to your jobOptions to do this (see main athena tutorial for more info):

jps.AthenaCommonFlags.HistOutputs = ["ANALYSIS:MyxAODAnalysis.outputs.root"]
svcMgr.THistSvc.MaxFileSize=-1 #speeds up jobs that output lots of histograms

Just as for histograms, you can assign a different stream name to your algorithm, and create the output file with a different stream name. Allowing you to create multiple files from multiple algorithms.

EventLoop Job Configuration

Contrary to histogram handling, EventLoop does not assign trees automatically to an output file. Just like for Athena, you have to tell EventLoop explicitly that you want to create a “stream”, and write trees into it.

You do this by adding the following into your job submission script:

job.outputAdd (ROOT.EL.OutputStream ('ANALYSIS'))

Just as shown in the histogram section, the most reliable way of accessing the created file/tree is using SH::SampleHandler. Like:

sh_hist = ROOT.SH.SampleHandler()
sh_hist.load (options.submission_dir + '/output-ANALYSIS')

But often it may be more practical to just find the created file directly. For instance if you run your code using EL::DirectDriver, you can find your output file under submitDir/data-ANALYSIS.