Systematics (advanced)

Last update: 13 Oct 2018

At some point in your analysis you will have to deal with systematics. There is no way around it, even if only to verify that you are statistics limited and therefore don’t have to worry about systematic uncertainties.

However, systematics is also quite literally an advanced topic: you need to have an analysis before you can evaluate its systematics. So if this is your first time through the tutorial and you haven’t added any CP tools to your algorithm, skip this section for now and move on to using some CP tools in your analysis. Once you have that, come back here and learn how to evaluate your systematics.

Systematics evaluation works like this: you evaluate your analysis at different points in nuisance parameter space and pass the results to your statistics program, which combines them into an overall result with an associated systematic uncertainty. How that combination happens will not be covered here; instead we’ll focus on how to prepare the intermediate results you need as inputs for the statistics program.

One classical example of a systematic is the calorimeter energy scale: we are fairly certain of what it is, but our assumed energy scale is almost guaranteed to be slightly off, since our calibration has limited accuracy. And if the energy scale is slightly off, the energies of measured objects will be off as well, affecting our result. To characterize this we introduce a nuisance parameter, which represents the energy scale and which we can vary to see the effect it has on the result. We also have an external constraint on that parameter, i.e. we calibrated the calorimeter and know how consistent a given nuisance parameter value is with that calibration.

For our purposes a nuisance parameter is an experimental parameter we don’t know for certain and which is not part of our result, but which affects our result. By convention it is scaled so that the external constraint can be represented by a unit Gaussian, i.e. 0 corresponds to the nominal value and +/-1 correspond to +/-1 sigma variations.

In our code we represent nuisance parameter values via SystematicVariation objects, which contain both the name of the nuisance parameter and the value. An actual point in nuisance parameter space is represented by a SystematicSet object, which can contain multiple SystematicVariation objects (though in most cases it is empty or contains only one of them).

To incorporate systematics into an analysis it is typically sufficient to reconfigure the CP tools for the different points in nuisance parameter space. To that end, CP tools affected by systematics implement the ISystematicsTool interface. It is typically best to collect all your systematics tools into a single vector and loop over them. For the rest of this discussion we’ll assume you have such a vector defined in your algorithm class:

std::vector<CP::ISystematicsTool*> m_systematicsTools;

Getting the List of Systematic Variations

In general you will have to generate a list of all the nuisance parameter points you want to evaluate. Before doing that you need the list of all the nuisance parameters themselves. For that ISystematicsTool has two functions: recommendedSystematics() and affectingSystematics(). By and large you want to stick with the recommended systematics, which are the systematics the CP group actually recommends you use. Affecting systematics are all the systematics that actually affect the tool, which can also include cross checks, overly pessimistic systematics, etc.

To get the recommended systematics do:

CP::SystematicSet recommendedSystematics;
for (auto tool : m_systematicsTools)
  recommendedSystematics.insert (tool->recommendedSystematics());

Now you should convert this to a list of nuisance parameter points to evaluate. The simplest thing you can do (and which is sufficient for many analyses) is to do a +/-1 sigma variation for each systematic. There is a specific tool for just doing that:

std::vector<CP::SystematicSet> systematicsList
  = CP::make_systematics_vector(recommendedSystematics); 

Though for practical use you probably want to make this a member variable in your algorithm:

std::vector<CP::SystematicSet> m_systematicsList;

As to how you actually set that variable, you are in a bit of a bind:

  • If you are just using the make_systematics_vector on the recommended systematics you are probably fine just doing this inside initialize() of your algorithm.
  • Otherwise you should make your job write out the list of systematics after you have configured all your systematics tools, review that list, generate the list of nuisance parameter points on your local machine, and add them to your job configuration.

warning Please note that you need to get the list of systematics after you configure your tools. Some tools have multiple possible systematics configurations which are selected using configuration parameters. So if you do this before configuring your tools you get the wrong answer.

Please note that this is really just the most basic way of generating your nuisance parameter points. For some more advanced features take a look at the MakeSystematicsVector class. Two of the more popular options are:

  • Do +/- 5 sigma variations for small systematics. A small systematic in this context is a systematic for which the statistical noise in evaluating the systematic is larger than the systematic itself. By evaluating them at the 5 sigma point and scaling them down to 1 sigma the size of statistical fluctuations gets reduced accordingly.
  • Combine multiple systematics into a “toy” systematics. This can reduce the number of nuisance parameter points to evaluate, but it can not be done for systematics that you constrain/profile in your analysis.

Note that ideally you would pass the list of nuisance parameters into your statistics tool and it would give you the best list of nuisance parameters back. However, currently (09 Jul 17) none of our tools supports that.

Applying Systematic Variations

Actually applying your systematics is very simple. Let’s assume that your execute function currently looks like this:

StatusCode execute ()
{
  // do something
  return StatusCode::SUCCESS;
}

Then with systematics it will look like this:

StatusCode execute ()
{
  for (auto& systematic : m_systematicsList)
  {
    for (auto tool : m_systematicsTools)
      tool->applySystematicVariation (systematic); // check the returned status in real code

    // do something
  }
  return StatusCode::SUCCESS;
}

Or almost: you will also have to make sure that your outputs use unique names for each systematic. E.g. if you fill a histogram like this:

  hist("h_met")->Fill (met);

it may then look like this:

  hist("h_met_" + systematic.name())->Fill (met);

tip This is not the most efficient way of handling this, as it involves a fair amount of string operations. Ideally hist() would take the SystematicVariation object as a second argument to avoid that. Though, unless you have to fill a lot of histograms this is probably just fine.

You may have noticed that we just reconfigure and rerun all the tools for all the systematics. This is the simplest approach, as it ensures that we have a consistent view of the event even if multiple CP tools are affected by the same systematic. It is also perfectly safe, since CP tools will just ignore any systematics that don’t affect them. However, it is not the most efficient approach, as we rerun all the tools even when they are not affected by the current systematic. So there is some room for optimization (and some analysis frameworks do this), but it also introduces some potential for mistakes, and unless a lot of people will run your code it is probably not worth it.