Histograms for Systematics

Last update: 22 Nov 2024 [History] [Edit]

This page is currently outdated

The common CP algorithms provide systematic variations of every analysis object. In a typical analysis workflow, it is necessary to save information about the events with each systematic variation. However, this can be a significant bookkeeping challenge and the most common approach of creating separate trees for each systematic variation is very space inefficient.

tip There is an optimal approach to handling systematic variations in ntuples that results in files that are 1-2 orders of magnitude smaller than the common approach. This makes use of the TreeMaker, AsgxAODNTupleMaker and TreeFiller algorithms. However, this approach is beyond the scope of this tutorial. An example implementation can be found here.

Instead of dealing with ntuple bookkeeping, this tutorial will show you how to produce systematic variations of histograms. In this section we’re just going to look at some kinematics before systematics are applied; we’ll apply the systematics and look at the variations in the next section.

Plot Sum Of Muon PT

As mentioned earlier, the CP algorithms put their outputs in the event store, and those outputs can then be accessed as if they come directly from the input file. To begin with, we won’t deal with systematics just yet.

Let’s make a histogram of the sum pT of all muons in an event. First, we will create the histogram itself. This will be done in a slightly different way than was introduced in Making Histograms. This is because we will modify it in the next section when creating systematic variations.

First declare it in your class definition inside the header file (MyxAODAnalysis.h), in the private: section:

TH1 *m_sumPtHist {nullptr};

Next we will create the histogram. This is done in execute(), not initialize() so that we can pick up the systematics in the next section (systematics are not available during initialize()). So add these lines at the beginning of execute():

  // Check if the histogram is undefined — if so, define it!
  if (m_sumPtHist == nullptr)
  {
    // Give the histogram a sensible name
    std::string name = "sumPtHist";
    // Book the histogram (register it with the framework)
    ANA_CHECK (book (TH1F (name.c_str(), "pt", 20, 0, 200e3)));
    // Keep a member pointer to the histogram for quick access
    m_sumPtHist = hist (name);
  } // End of check if the histogram was undefined

Now let’s retrieve the list of muons, calculate the sum of pT, and fill it into the histogram:

  // Initialize a pointer to a container of muons - empty; the event store will fill it
  const xAOD::MuonContainer *systMuons = nullptr;
  // Get the muons we are interested in from the event store. The "no systematics" version.
  ANA_CHECK (evtStore()->retrieve (systMuons, "AnaMuons_NOSYS"));
  // Holder for the sum of the pT of the muons
  float sumPt = 0;
  // Loop over the muons - since it's a container, we can use an easy loop syntax
  // All we do in the loop is add the muon's pT to the sum of pT
  for (const xAOD::Muon *mu : *systMuons)
    sumPt += mu->pt();
  // Fill the histogram with the result from this event
  m_sumPtHist->Fill (sumPt);

Remember that we have to teach the compiler what muons are. We do that by adding a #include statement at the top of the source file:

#include <xAODMuon/xAODMuonContainer.h>

tip You’ll see ATLAS code use both #include "Something.h" and #include <Something.h>. In most cases in our setups they are identical. Formally, The version with quotes checks the current working directory before looking in the normal include paths, while the version with angle brackets looks in the include paths first. For system includes (like iostream or cmath, we prefer angle brackets. If you didn’t understand that, then you probably don’t need to worry about the differences. When in doubt, follow the style of the code that you’re working on.

We also need to change the key LINK_LIBRARIES line in our CMakeLists.txt file:

  LINK_LIBRARIES AnaAlgorithmLib xAODEventInfo xAODMuon)

Now compile and run it and see if you get the newly defined histogram and if it makes sense.

tip Did you forget some of the commands from earlier in the tutorial, like how to setup or run? You can type history on the command line and see what you’ve run in the past.

tip Did you get an error like Branch "AnaMuons" not available on input?

Yes, help me with my error! Order is very important when we run a job. It's very likely what happened is that your analysis code is running before the AnaMuons container has been made. In your python configuration, make sure that the line:
algSeq.addSelfToJob( job )
Comes before the lines:
# Add our algorithm to the job
job.algsAdd( alg )

Some things to note here:

  • The name of the muon container is based on what you specified when you configured your muon analysis sequence. You can change this if you want, as long as you do it everywhere.

  • There is no specific ordering to the muons in the container, or more specifically they have the same ordering as the input container, i.e., they are ordered in pT before momentum corrections are applied. It’s possible to change this with a little work, but it’s better to get used to the containers not being ordered (as that allows certain optimizations).

  • We are using const containers and objects, as we do not intend to modify them at all. This will become more important in the next section.

⭐️ Bonus Exercise

Try to create separate histograms containing only the pT of the highest momentum muon and second highest momentum muon (if there is one). We often call these the “leading” and “sub-leading” muons. See if you can overlay the two histograms in ROOT so that you can easily compare them.

⭐️ Bonus Exercise 2

We didn’t use any weights when filling the histogram. That’s the right thing to do if we are running on data, but for MC simulation you will almost always want some weights. For this exercise, we’ll start with the muon scale factors, which have to do with how well modeled the muon reconstruction and isolation are (for example). First, we need to include those scale factors in your job. Remove this line from the Muons block of your config.yaml file:

        noEffSF: True

Now you can look in the output ROOT file submitDir/data-ANALYSIS/dataset.root and see all of the scale factors that are available. You should see mu_isol_effSF_medium_NOSYS (the isolation scale factor), mu_reco_effSF_medium_NOSYS (the reconstruction scale factor), and mu_TTVA_effSF_medium_NOSYS (the track-to-vertex association scale factor).

In your source code, we now have to access those scale factors and apply them in our histograms. Since the scale factors are per muon, you’ll need to multiply all the important ones together to get the total scale factor for our sumPtHist histogram. Of course, you can use just the scale factor for the leading muon in your histogram of the pT of the leading muon.

To get those scale factors, we need to use “accessors”. You might remember learning about “dynamic” variables — variables that can be attached to objects but are not a part of the class. That means in order to access those variables, we can’t just use a regular-looking class function like muon->isolation_scale_factor(). Instead, we need to use accessors (there are also “decorators” that add those dynamic variables, which we call “decorations”).

For the accessors, we will need to know the names of the decorations. Here our logfile can help us. Look at the printout of your job, and scroll up to the dump of the configuration. Look for the AsgxAODNTupleMakerAlg/NTupleMakeranalysis section, and you will see a mapping from variable to branch name. See if you can understand how that mapping works for a few variables. The ones we’re interested in look like: OutMuons_%SYS%.muon_reco_effSF_medium_%SYS% -> mu_reco_effSF_medium_%SYS%.

Add in the execute method of your source code accessors for each of the scale factors:

  static const SG::AuxElement::ConstAccessor<float> isol_sf_acc("muon_isol_effSF_medium_NOSYS");
  static const SG::AuxElement::ConstAccessor<float> reco_sf_acc("muon_reco_effSF_medium_NOSYS");;
  static const SG::AuxElement::ConstAccessor<float> ttva_sf_acc("muon_TTVA_effSF_medium_NOSYS");;

tip Accessors are almost always const and static to save time and memory during code execution.

tip For these accessors, we didn’t need any new #include statements. That’s because the accessors are already included by other headers that we include.

Now let’s change our histogram creation to calculate the total scale factor for the event and to apply it when filling the histogram:

  float sumPt = 0;
  float total_sf = 1.;
  for (const xAOD::Muon *mu : *systMuons)
  {
    float isol_sf = isol_sf_acc( *mu );
    if (isol_sf>0.) total_sf *= isol_sf;
    float reco_sf = reco_sf_acc( *mu );
    if (reco_sf>0.) total_sf *= reco_sf;
    float ttva_sf = ttva_sf_acc( *mu );
    if (ttva_sf>0.) total_sf *= ttva_sf;
    sumPt += mu->pt();
  }
  m_sumPtHist->Fill (sumPt , total_sf );

tip Here we are just checking if the scale factors are greater than zero. You will find that different groups use different standards when a tool should not be applied (for example when a muon is out of the momentum range where scale factors have been provided). Some use scale factors of -1 to indicate that the scale factor should not be used. In a real analysis, we would be checking momentum and rapidity for all these muons and making sure we had valid scale factors; for now we will just skip the “bad” ones.

Now recompile your code, run, and see if you can spot any differences in the histogram. If you have time, there are a few other things you can try:

  • Make a histogram of the total scale factor for the event. How close to 1 is it normally? Can you figure out which scale factor is furthest from 1 on average?
  • Make histograms with and without the scale factors applied in your code. How big are the differences?

⭐️ Bonus Exercise 3

Oh no, I can’t remember the names of my containers! Don’t worry, it happens to all of us. The event store knows what containers it holds, and you can ask it to give you some hints.

Try adding to your execute() function something like:

  std::vector<std::string> vkeys;
  evtStore()->keys<xAOD::MuonContainer>(vkeys);
  ANA_MSG_INFO( "Keys for xAOD::MuonContainers in the event store:" );
  for (const std::string & akey:vkeys){
    ANA_MSG_INFO( "   " << akey );
  }
  ANA_MSG_INFO( "All done!" );

This code asks the event store for all of the xAOD::MuonContainer keys that it knows about. Right now, you should only see three:

AnalysisAlg              INFO    Keys for xAOD::MuonContainers in the event store:
AnalysisAlg              INFO       AnaMuons_NOSYS
AnalysisAlg              INFO       AnaMuons_STEP1_NOSYS
AnalysisAlg              INFO       OutMuons_NOSYS
AnalysisAlg              INFO    All done!

If you like, you can try changing the collection type to look at keys for electrons, or the EventInfo, or any other object you’d like. Don’t forget to add the appropriate header file includes and adjust the CMakeLists.txt file if you decide to try with a different object type!