Creating and Filling Histograms

Last update: 23 Jun 2022

Usually you will want to store the information you extract in histograms, instead of printing it to the screen as we do in this tutorial. We mostly print to the screen because that makes it easier to see what you are doing and involves fewer steps, but for anything halfway serious you will have to either fill histograms or create n-tuples/mini-xAODs.

There are several ways to manage histograms; here we show one that works the same way whether you run your algorithm in EventLoop or in Athena. Using a histogram is typically a two-step process: first you book the histogram, specifying its name, binning, etc., and then you fill it. As an example, let's loop over all the jets and store their transverse momentum (pT) in a histogram.
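To make the two steps concrete, here is a minimal, self-contained C++ sketch. FixedBinHist is a hypothetical stand-in written for illustration, not ROOT's TH1F; it only assumes the usual ROOT binning convention (bin 0 is underflow, bins 1 to nbins cover [xlow, xhigh), bin nbins+1 is overflow). Booking fixes the name and binning once; filling then only increments the matching bin.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a ROOT TH1F, used only to illustrate the
// "book, then fill" pattern; it is not the ROOT class.
class FixedBinHist {
public:
  // Step 1 (booking): name, title and binning are fixed once, up front.
  FixedBinHist(std::string name, std::string title,
               int nbins, double xlow, double xhigh)
    : m_name(std::move(name)), m_title(std::move(title)),
      m_nbins(nbins), m_xlow(xlow), m_xhigh(xhigh) {
    if (nbins <= 0 || !(xhigh > xlow))
      throw std::invalid_argument("FixedBinHist: need nbins > 0 and xhigh > xlow");
    // nbins regular bins plus one underflow and one overflow bin
    m_counts.assign(static_cast<std::size_t>(nbins) + 2, 0.0);
  }

  // ROOT-style bin index: 0 = underflow, 1..nbins = regular, nbins+1 = overflow.
  int findBin(double x) const {
    if (x < m_xlow) return 0;
    if (x >= m_xhigh) return m_nbins + 1;
    const int bin = 1 + static_cast<int>(m_nbins * (x - m_xlow) / (m_xhigh - m_xlow));
    return bin > m_nbins ? m_nbins : bin;  // guard against rounding at the upper edge
  }

  // Step 2 (filling): add a weight (default 1) to the bin containing x.
  void fill(double x, double weight = 1.0) {
    m_counts[static_cast<std::size_t>(findBin(x))] += weight;
  }

  double binContent(int bin) const { return m_counts.at(static_cast<std::size_t>(bin)); }
  const std::string& name() const { return m_name; }
  const std::string& title() const { return m_title; }

private:
  std::string m_name;
  std::string m_title;
  int m_nbins;
  double m_xlow;
  double m_xhigh;
  std::vector<double> m_counts;
};
```

For example, with 100 bins from 0 to 500 (the binning used for h_jetPt in this tutorial), a 45 GeV jet lands in bin 10, and values outside the range are kept in the underflow/overflow bins rather than dropped.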

First, if you haven't done so already, add the jet xAOD EDM package to the build of your package/library by adding the following to the atlas_add_library call in your package's CMakeLists.txt file:

   LINK_LIBRARIES [...] xAODJet

where [...] is any other package dependencies you may already have included.

Then, make sure you add the following include to MyxAODAnalysis.h:

#include <TH1.h>

Since we will be accessing the jet container, we also need to include its header in our source file, MyxAODAnalysis.cxx:

#include <xAODJet/JetContainer.h>

Booking should happen in the initialize() function of your algorithm. In exceptional cases, for example when the histogram definition depends on what sort of sample you are processing, you may want to delay it until the first event is processed. Either way, you call:

  ANA_CHECK (book (TH1F ("h_jetPt", "h_jetPt", 100, 0, 500))); // jet pt [GeV]

Prefer the initialize() function where you can: it is called before any events are processed, so the histogram gets created even if no events are processed at all.
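The reasoning about where to book can be illustrated without ROOT or EventLoop. Everything in this sketch (ToyHist, ToyHistRegistry, ToyAlgorithm and the extra histogram names) is hypothetical and only mimics the book()/hist() split used in this tutorial; the real framework classes differ in detail. It shows that a histogram booked in initialize() exists even if execute() never runs, whereas a first-event booking only happens once an event has been seen.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical, heavily simplified histogram: it only records its binning
// and how many times it was filled, which is enough to illustrate booking.
struct ToyHist {
  int nbins;
  double xlow;
  double xhigh;
  long entries = 0;
  void fill(double /*x*/) { ++entries; }
};

// Hypothetical registry sketching the book()/hist() split: each histogram
// is created once under a unique name and looked up by that name later.
class ToyHistRegistry {
public:
  // Returns false instead of silently replacing an already booked histogram.
  bool book(const std::string& name, int nbins, double xlow, double xhigh) {
    return m_hists.emplace(name, ToyHist{nbins, xlow, xhigh}).second;
  }
  // Throws std::out_of_range if the name was never booked.
  ToyHist& hist(const std::string& name) { return m_hists.at(name); }
  std::size_t size() const { return m_hists.size(); }

private:
  std::map<std::string, ToyHist> m_hists;
};

// Hypothetical algorithm mirroring the initialize()/execute() split.
class ToyAlgorithm {
public:
  // Preferred: booked before any event, so it exists even for empty inputs.
  void initialize() { m_registry.book("h_jetPt", 100, 0.0, 500.0); }

  void execute(double jetPtGeV, bool isData) {
    // Exceptional case: a sample-dependent histogram booked on the first event.
    if (!m_bookedOnFirstEvent) {
      m_registry.book(isData ? "h_dataOnly" : "h_mcOnly", 10, 0.0, 1.0);
      m_bookedOnFirstEvent = true;
    }
    m_registry.hist("h_jetPt").fill(jetPtGeV);
  }

  ToyHistRegistry& registry() { return m_registry; }

private:
  ToyHistRegistry m_registry;
  bool m_bookedOnFirstEvent = false;
};
```

Looking histograms up by name keeps booking and filling in separate functions, at the cost of a lookup per fill; a misspelled name then shows up as an error rather than as an empty histogram.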

And then inside execute() let’s retrieve the jets and fill the histogram:

  // retrieve the jet container from the event store
  const xAOD::JetContainer* jets = nullptr;
  ANA_CHECK (evtStore()->retrieve (jets, "AntiKt4EMPFlowJets"));
  // loop over the jets and fill their pT, converted from MeV to GeV
  for (const xAOD::Jet* jet : *jets) {
    hist ("h_jetPt")->Fill (jet->pt() * 0.001); // GeV
  } // end for loop over jets
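The factor 0.001 in the loop converts from MeV, the unit xAOD uses for momenta, to GeV, the unit the histogram was booked in. The sketch below is hypothetical (MockJet and the helper functions are illustrative, not xAOD classes) and shows why the conversion matters: without it, typical jets fall outside the 0 to 500 range and would only populate the overflow bin.

```cpp
#include <cassert>
#include <vector>

// Hypothetical mock jet; like xAOD, it stores pT in MeV.
struct MockJet {
  double ptMeV;
};

constexpr double kMeVToGeV = 0.001;    // the factor used in jet->pt() * 0.001
constexpr double kHistLowGeV = 0.0;    // lower edge booked for h_jetPt
constexpr double kHistHighGeV = 500.0; // upper edge booked for h_jetPt

// Converts each jet pT to GeV, as the execute() loop does before Fill().
std::vector<double> jetPtGeV(const std::vector<MockJet>& jets) {
  std::vector<double> values;
  values.reserve(jets.size());
  for (const MockJet& jet : jets) values.push_back(jet.ptMeV * kMeVToGeV);
  return values;
}

// Counts values inside the booked range [low, high); anything else would
// end up in the underflow/overflow bins instead of the visible histogram.
int countInRange(const std::vector<double>& values) {
  int n = 0;
  for (double v : values)
    if (v >= kHistLowGeV && v < kHistHighGeV) ++n;
  return n;
}
```

For jets of 45, 120 and 800 GeV, the converted values give two entries inside the range and one overflow; the unconverted MeV values would all overflow.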

That's it on the C++ side. Now compile your code.

Setting up the jobs

To run your updated code, you need to do slightly different things depending on whether you are using EventLoop or Athena. These differences are described below.

Running the job (EventLoop)

cd ../run
ATestRun_eljob.py --submission-dir=submitDir

Outputs in EventLoop

EventLoop automatically creates a file for your output histograms, so you don't need to modify your job submission script at all to run your histogram-writing algorithm.

Once your job has finished, you can find the histogram(s) inside the unique directory you specified at job submission (submitDir). There is a separate file for each sample you submitted (hist-label.root), so in our case there is only one: submitDir/hist-mc16_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.deriv.DAOD_PHYS.e6337_s3126_r10201_p4172.root. Note that the second part of the histogram file name corresponds directly to the name of the sample in SampleHandler, while the first part (hist-) is set by EventLoop and cannot be changed.
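The naming rule can be written down as a tiny helper. This is a hypothetical sketch of the pattern described here (eventLoopHistFile and eventLoopHistFiles are illustrative names, not EventLoop functions); EventLoop builds these paths itself, so you never need such code in a real job.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper reproducing the naming scheme:
//   <submission dir>/hist-<SampleHandler sample name>.root
// The "hist-" prefix is fixed by EventLoop; only the sample name varies.
std::string eventLoopHistFile(const std::string& submitDir,
                              const std::string& sampleName) {
  return submitDir + "/hist-" + sampleName + ".root";
}

// One output file per submitted sample.
std::vector<std::string> eventLoopHistFiles(const std::string& submitDir,
                                            const std::vector<std::string>& samples) {
  std::vector<std::string> files;
  files.reserve(samples.size());
  for (const std::string& sample : samples)
    files.push_back(eventLoopHistFile(submitDir, sample));
  return files;
}
```

Since the sample name is taken verbatim, renaming a sample in SampleHandler is the only way to change the part of the file name after hist-.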

Outputs in Athena

For Athena there is one more step: you have to tell your job where your algorithm should write the histogram(s). You do this by adding the following to your jobOptions file (see the main Athena tutorial for more information):

jps.AthenaCommonFlags.HistOutputs = ["ANALYSIS:MyxAODAnalysis.outputs.root"]
svcMgr.THistSvc.MaxFileSize = -1  # speeds up jobs that output lots of histograms

This is needed because every Athena algorithm creates its histograms through the THistSvc, which can have any number of files/streams open at one time. The first line above tells the service to (re-)create a file called MyxAODAnalysis.outputs.root and assign it to the ANALYSIS stream, which is the stream all analysis algorithms write to by default.
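Conceptually, each HistOutputs entry maps a stream name to a file. The following hypothetical C++ sketch (parseHistOutputs is an illustrative name) splits entries of the form STREAM:file at the first colon and builds that lookup table; it only illustrates the mapping and is not how THistSvc actually parses its configuration.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: turn entries written as "STREAM:file.root" (the form
// used for HistOutputs in the jobOptions) into a stream -> file table.
std::map<std::string, std::string>
parseHistOutputs(const std::vector<std::string>& entries) {
  std::map<std::string, std::string> streams;
  for (const std::string& entry : entries) {
    const std::string::size_type colon = entry.find(':');
    // Reject entries without a colon, with an empty stream or an empty file.
    if (colon == std::string::npos || colon == 0 || colon + 1 == entry.size())
      throw std::invalid_argument("expected STREAM:file, got '" + entry + "'");
    const std::string stream = entry.substr(0, colon);
    // Each stream may point at exactly one file.
    if (!streams.emplace(stream, entry.substr(colon + 1)).second)
      throw std::invalid_argument("stream '" + stream + "' defined twice");
  }
  return streams;
}
```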

Keep in mind, though, that different algorithms can be assigned to different files/streams, although during analysis it is not recommended to create too many separate files. If you do want to tell your algorithm which stream to write its histograms into, you can do so like this:

alg.RootStreamName = 'MY_STREAM_01'

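Then, of course, you also have to make sure that the MY_STREAM_01 stream is set up, for example by adding a corresponding "MY_STREAM_01:<file name>" entry to the HistOutputs list, in the same way as for ANALYSIS.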
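How an algorithm's stream name resolves to an output file can be sketched as a simple lookup. outputFileFor and the file name mine.root are hypothetical; the sketch only assumes what the text states: algorithms write to ANALYSIS by default, and a stream that was never set up has no file to write to.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical sketch: which file would an algorithm write to, given its
// stream name and a stream -> file table? An unconfigured stream yields
// no file, which is why MY_STREAM_01 must be set up explicitly.
std::optional<std::string>
outputFileFor(const std::map<std::string, std::string>& streams,
              const std::string& rootStreamName = "ANALYSIS") {
  const auto it = streams.find(rootStreamName);
  if (it == streams.end()) return std::nullopt;  // stream was never set up
  return it->second;
}
```

With only the ANALYSIS entry configured, an algorithm assigned to MY_STREAM_01 finds no file; once a MY_STREAM_01 entry is added, the lookup succeeds.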


⭐️ Bonus Exercise

Make a new histogram that stores the eta of each jet. Now do phi, mass, and energy.