Creating and Filling Histograms

Last update: 04 Dec 2024

In a typical analysis workflow, you will want to process the information in an xAOD and write outputs to histograms or a TTree (known as an ntuple) for further processing. In this section, you will be shown how to create and fill histograms using this framework. You will be shown how to make an ntuple in a later section.

There are several ways to manage histograms, but here we will show a way that allows you to manage histograms in the same way whether you are running your algorithm in AnalysisBase or AthAnalysis. Using a histogram is typically a two-step process: first you book the histogram, specifying its name, binning, and so on; then you fill it. As an example, let’s use the runNumber and the eventNumber, the latter of which you can think of as an event index. Later in the tutorial, you will be asked to produce histograms for objects such as electrons and muons.

First, add the following include to MyxAODAnalysis.h:

#include <TH1.h>

Tip: Ordinarily, it would be better to put this in MyxAODAnalysis.cxx, but we will need it in the header file later, so we may as well put it there now. Always try to #include only what’s needed, and only where it’s needed: if you only reference a class in your source file, the include can go there.

Booking should happen in the initialize() function of your algorithm, in the source code (MyxAODAnalysis.cxx). In exceptional cases, for example when the creation of the histogram depends on what sort of sample you are processing, you may want to delay booking until the first event is processed. Either way, you add:

  // Book run number histogram
  ANA_CHECK (book (TH1F ("runNumber", "runNumber", 10, 284495, 284505)));

  // Book event number histogram
  ANA_CHECK (book (TH1F ("eventNumber", "eventNumber", 100, 394.445e6, 394.455e6)));

Tip: Note that the binning defined here is very specific to the input file. This is simply meant as an example of how to book and fill histograms. In a real analysis, you likely wouldn’t be filling eventNumber or runNumber histograms and therefore wouldn’t need to fine-tune the binning like this.

Whenever possible, you should book histograms in initialize(), because it is called before any events are processed. This ensures that the histograms are created even if the job processes no events.
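To see what the binning arguments in the booking calls above mean, it can help to work through the bin arithmetic by hand. The sketch below is plain C++ and is not ROOT code: findBin and binWidth are hypothetical helpers that follow the fixed-width convention used by ROOT axes (bin 0 is underflow, bins 1 to nbins are in range, bin nbins + 1 is overflow, and the upper edge is exclusive).

```cpp
#include <cassert>

// Hypothetical stand-in for a fixed-width bin lookup; this is NOT ROOT code.
// Returns 0 for underflow, 1..nbins for in-range values, nbins + 1 for overflow.
int findBin (int nbins, double xlow, double xup, double x)
{
  assert (nbins > 0 && xup > xlow);
  if (x < xlow) return 0;           // below the axis range: underflow bin
  if (x >= xup) return nbins + 1;   // upper edge is exclusive: overflow bin
  return 1 + static_cast<int> (nbins * (x - xlow) / (xup - xlow));
}

// Width of each bin for the same booking arguments.
double binWidth (int nbins, double xlow, double xup)
{
  assert (nbins > 0 && xup > xlow);
  return (xup - xlow) / nbins;
}
```

With the runNumber booking (10 bins from 284495 to 284505), each bin is one run number wide. With the eventNumber booking (100 bins over a range of 10000), each bin covers 100 event numbers. Values outside the booked range are not lost, but they land in the underflow or overflow bin, which is why the binning has to match the input file.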

Now, you need to access the run number and event number, which are stored in the EventInfo. Let’s store them as member variables before filling the histograms. First, define two private variables in MyxAODAnalysis.h:

  // Event identifier variables
  unsigned int m_runNumber = 0; ///< Run number
  unsigned long long m_eventNumber = 0; ///< Event number

In the last section we learned to retrieve the EventInfo object. Now that we have that object, we can set the member variables that we just defined in the execute() method of the source code:

  // Access run number and event number and store them in the member variables
  m_runNumber = eventInfo->runNumber ();
  m_eventNumber = eventInfo->eventNumber ();

Next, still inside execute(), let’s fill the histograms with the run number and event number we’ve just retrieved:

  // Fill run number and event number histograms
  hist ("runNumber")->Fill (m_runNumber);
  hist ("eventNumber")->Fill (m_eventNumber);
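The book-then-fill pattern above relies on looking histograms up by name. The following is a minimal, self-contained sketch of that idea in plain C++, assuming a toy histogram type. ToyHist and ToyHistStore are hypothetical names invented for illustration; this is not how the framework's book() and hist() are implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a 1D histogram with fixed-width bins.
struct ToyHist
{
  int nbins;
  double xlow, xup;
  std::vector<long> counts; // index 0 = underflow, index nbins + 1 = overflow

  ToyHist (int n, double lo, double hi)
    : nbins (n), xlow (lo), xup (hi), counts (n + 2, 0)
  {
    assert (n > 0 && hi > lo);
  }

  // Increment the bin that contains x.
  void fill (double x)
  {
    int bin = 0; // underflow unless x is at or above the lower edge
    if (x >= xup) bin = nbins + 1;
    else if (x >= xlow) bin = 1 + static_cast<int> (nbins * (x - xlow) / (xup - xlow));
    ++counts[bin];
  }

  // Total number of fills, including underflow and overflow.
  long entries () const
  {
    long total = 0;
    for (std::size_t i = 0; i < counts.size (); ++i) total += counts[i];
    return total;
  }
};

// Hypothetical registry: book once by name, look up by name when filling.
class ToyHistStore
{
public:
  // Returns false if a histogram with this name was already booked.
  bool book (const std::string& name, int nbins, double xlow, double xup)
  {
    return m_hists.emplace (name, ToyHist (nbins, xlow, xup)).second;
  }

  // Throws std::out_of_range if the histogram was never booked.
  ToyHist& hist (const std::string& name) { return m_hists.at (name); }

private:
  std::map<std::string, ToyHist> m_hists;
};
```

In this toy, calling store.book ("runNumber", 10, 284495, 284505) once and then store.hist ("runNumber").fill (284500) for each event mirrors the initialize()/execute() split. The toy throws when asked for a name that was never booked. How the real hist() handles that case is not covered here, so the safe habit is to use exactly the name you passed to book().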

That is it on the C++ side. Now you can compile and run your code (refer back to the earlier sections if you need a reminder of how).

Running and locating output histograms

What you need to do to run your updated code depends on whether you are using EventLoop or Athena. The differences are described below.

Outputs in EventLoop

EventLoop automatically creates a file for your output histograms, so you don’t need to modify your job submission script at all to run your histogram-writing algorithm.

Once your job is finished, you can find the histogram(s) inside the unique directory you created at job submission (submitDir). There is a separate file for each sample you submitted (hist-label.root), so in our case we have only one: submitDir/hist-dataset.root. Please note that the second part of the histogram file name corresponds directly to the name of the sample in SampleHandler, while the first part (hist-) is set by EventLoop and cannot be changed.

Tip: Remember that you set the sample in your job configuration:

# Use SampleHandler to get the sample from the defined location
sample = ROOT.SH.SampleLocal("dataset")

Tip: Note that there is another output file in the hist directory called hist/dataset.root. This is not the file that contains your output histograms. We’ll come back to that file later.
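The naming rule described above (the fixed hist- prefix, followed by the sample name and the .root extension, inside the submission directory) can be written down as a small helper. This is only a sketch of the rule as stated in this section: histFilePath is a hypothetical function and is not part of EventLoop or SampleHandler.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of EventLoop): builds the expected path of
// the histogram output file for one sample, following the rule
// <submitDir>/hist-<sampleName>.root described in the text.
std::string histFilePath (const std::string& submitDir, const std::string& sampleName)
{
  assert (!submitDir.empty () && !sampleName.empty ());
  return submitDir + "/hist-" + sampleName + ".root";
}
```

For the sample named "dataset" in the job configuration, histFilePath ("submitDir", "dataset") gives submitDir/hist-dataset.root, which is the file to open when checking your histograms.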

Outputs in Athena

You need to tell the THistSvc service which file to write for the ANALYSIS stream. The following lines in your jobOptions do this (see the main Athena tutorial for more information):

jps.AthenaCommonFlags.HistOutputs = ["ANALYSIS:MyxAODAnalysis.outputs.root"]
svcMgr.THistSvc.MaxFileSize=-1 #speeds up jobs that output lots of histograms

Tip: Trees and histograms follow the same formalism for output. In both cases, you can assign a different stream name to your algorithm and declare an output file for that stream name, which allows you to create multiple files from multiple algorithms.

Check, commit and push your changes

Check both histograms to make sure they were created and filled correctly.

If you are happy that the histograms are being filled properly, commit and push your changes.