EventLoop Legacy documentation

Last update: 06 Nov 2019

WARNING: This section may no longer be up-to-date.

Accessing the input data

One of the design goals of EventLoop is that it should equally well support whatever way the user chooses to access the input data. However, apart from MultiDraw formulas, it is typically not possible to use multiple methods within the same job, i.e. you have to pick one and stick with it for all algorithms. A quick overview of methods:

  • basic access using TTree and entry number: the most basic way of reading data. Doesn’t require extra software, but only allows one algorithm per job and is often slower and/or more fragile than other methods.
  • xAOD EDM: the official ASG solution for reading data in Run 2.
  • MultiDraw formulas: allow access through TTree::Draw-like formulas. Can be used together with any other access method.

If your favorite way of reading data is not on this list, shoot me a mail and I’ll try to work with you on putting it in. If you are undecided on which way to use, give the xAOD EDM a try; it is designed to give you the best possible performance in most situations without the need to fine-tune it for your analysis.

Basic access using TTree and entry number

This is the most basic way of reading in the data and will be familiar to people who have used ROOT before. You have to connect the branches to your variables in changeInput:

    EL::StatusCode MyAlgorithm :: changeInput (bool firstFile)
    {
      // var1 and var2 are members of MyAlgorithm (protected with //! in the header)
      TTree *tree = wk()->tree();
      tree->SetBranchStatus ("*", 0);      // disable all branches...
      tree->SetBranchStatus ("var1", 1);   // ...then re-enable the ones you read
      tree->SetBranchAddress ("var1", &var1);
      tree->SetBranchStatus ("var2", 1);
      tree->SetBranchAddress ("var2", &var2);
      // repeat for all variables you use
      return EL::StatusCode::SUCCESS;
    }

Then you have to read in the variables in execute:

    EL::StatusCode MyAlgorithm :: execute ()
    {
      wk()->tree()->GetEntry (wk()->treeEntry());
      // actual event processing
      return EL::StatusCode::SUCCESS;
    }

The SetBranchStatus statements are technically not necessary, but if you use them to enable only the branches you need, you can often gain a substantial amount of speed. If you don’t quite know how this technique works, check out the official TTree documentation at http://root.cern.ch/root/html/TTree.

Warning: If you use this technique, you almost certainly cannot use more than one algorithm per job. The issue is that SetBranchAddress ties the branches of your TTree to the member variables of one particular algorithm object. A second algorithm would attempt to tie the same TTree to its own variables as well, which almost certainly will not work.
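To see why a second algorithm breaks this scheme, here is a small plain C++ model that needs no ROOT. MockTree and MockAlgorithm are hypothetical stand-ins, not ROOT or EventLoop classes; the only behavior borrowed from ROOT is that each branch keeps a single destination address, so the last SetBranchAddress call wins.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a TTree: each branch remembers exactly one
// destination address, the way SetBranchAddress keeps one address per branch.
struct MockTree {
  std::map<std::string, float*> addresses;
  std::map<std::string, float> currentValues;  // values of the current entry

  void setBranchAddress(const std::string& name, float* address) {
    addresses[name] = address;  // a second call silently replaces the first
  }

  // Copy the current entry into whatever addresses are bound right now.
  void getEntry() {
    for (auto& entry : addresses)
      if (entry.second) *entry.second = currentValues[entry.first];
  }
};

// Hypothetical algorithm owning its own copy of var1.
struct MockAlgorithm {
  float var1 = -1.0f;
  void changeInput(MockTree& tree) { tree.setBranchAddress("var1", &var1); }
};
```

If two algorithms both call changeInput on the same tree, only the second one ever receives data; the first keeps its stale value, which is exactly the silent failure the warning above is about.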

Access the Data Through xAOD EDM

When setting up your job object, you need to tell it that you are using an xAOD before you add your algorithms:

      EL::Job job;
      job.useXAOD();
      job.algsAdd (alg);

Then add a pointer to the xAOD::TEvent to your algorithm class:

      private:
        /// description: the event we are reading from
        xAOD::TEvent *m_event; //!

And in your initialize method, set that member:

    EL::StatusCode MyAlgorithm :: initialize ()
    {
      m_event = wk()->xaodEvent();
      // further initialization stuff
      return EL::StatusCode::SUCCESS;
    }

The xAOD classes have two modes for reading from a file: branch-wise and class-wise. In branch-wise mode, the first time you access a variable from the xAOD in an event, the corresponding branch is read. Class-wise mode works the same way, except that if you read one of the “core” variables of an object, all of the “core” variables are read. This is primarily needed for making shallow copies work. The expectation is that xAODs made for analysis will not really have any “core” variables for objects, so class-wise mode should not incur a performance penalty for a “typical” analysis workflow. The mode can be selected using one of the following:

job.options()->setString (EL::Job::optXaodAccessMode, EL::Job::optXaodAccessMode_branch);
job.options()->setString (EL::Job::optXaodAccessMode, EL::Job::optXaodAccessMode_class);

If neither is selected, EventLoop will select one for you. Currently that is class-mode, but this may change at some point in the future.
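The difference between the two modes can be illustrated with a toy model that only records which branches get read. Everything here (AccessMode, MockEventReader, the variable names) is hypothetical and is not the xAOD API; it simply encodes the rule described above that, in class mode, touching one “core” variable reads all of them.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model of lazy reading; not the real xAOD classes.
enum class AccessMode { Branch, Class };

struct MockEventReader {
  AccessMode mode;
  std::vector<std::string> coreVariables;  // "core" variables of one object
  std::set<std::string> branchesRead;      // branches read in this event

  // Record which branches an access to `variable` would cause to be read.
  void access(const std::string& variable) {
    const bool isCore = std::find(coreVariables.begin(), coreVariables.end(),
                                  variable) != coreVariables.end();
    if (mode == AccessMode::Class && isCore) {
      // class mode: reading one core variable pulls in all of them
      branchesRead.insert(coreVariables.begin(), coreVariables.end());
    } else {
      // branch mode (or a non-core variable): read only what was asked for
      branchesRead.insert(variable);
    }
  }
};
```

With no “core” variables at all, both modes read exactly the same branches, which is why class mode is expected to cost nothing for typical analysis xAODs.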

Turn off the xAOD Summary Access Report

At the end of the job, the xAOD classes send a report to a central server with information on how the xAODs were accessed, allowing us to optimize future versions of the xAOD. Generally this is desirable and should not cause any problems. However, sometimes it does, and in those cases you can turn off the reporting like this:

job.options()->setDouble (EL::Job::optXAODSummaryReport, 0);

Access the Data Through MultiDraw

The details of how to write an algorithm that uses MultiDraw formulas are described on the MultiDraw TWiki, so please check there. Here are only a couple of quick notes:

  • Due to its implementation, MultiDraw formulas can be combined with any other way of accessing the data. This means that you can keep reading the data in the usual way and only use MultiDraw formulas when you want to pass in an expression through configuration instead of hardcoding it in your source code.
  • Although it is called a formula service, you can also use it to pass in simple variable names, e.g. the name of the variable containing your MC weight. This is typically a lot simpler than trying to parse text fields in C++ yourself.
  • While it is not required, it is recommended that you check whether the formula evaluated properly.
  • Reading data through MultiDraw is inherently slower than reading the data directly and evaluating a compiled C++ expression. The advantage of MultiDraw is run-time flexibility, but it comes at a price in speed.
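The second and third notes (passing a plain variable name through configuration, and checking that it evaluated) can be sketched without MultiDraw itself. The Event type and evaluateByName function below are hypothetical illustrations, not the MultiDraw API; the per-event lookup by name also hints at why run-time evaluation costs more than a compiled member access.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical event record: values are looked up by name at run time,
// roughly the kind of work a formula service has to do for every event.
using Event = std::map<std::string, double>;

// Evaluate a variable whose name came from configuration. An empty optional
// means "did not evaluate properly", which the caller should check.
std::optional<double> evaluateByName(const Event& event, const std::string& name) {
  auto it = event.find(name);
  if (it == event.end()) return std::nullopt;
  return it->second;
}
```

A misspelled name in the configuration then shows up as a failed evaluation instead of a silently wrong weight.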

Accessing Meta-Data

Many n-tuple files have in-file meta-data. The format of this meta-data varies wildly, but one thing all formats have in common is that you have to read the meta-data directly from the file and then combine it with the event data yourself. The most convenient way to access the input file is through the inputFile() method:

TFile *file = wk()->inputFile();

Processing Per-File Meta-Data

Some meta-data needs to be processed for each individual input file, even for files containing no events, e.g. the list of luminosity blocks a file contains. Any such processing can be done in the fileExecute() function of your algorithm:

EL::StatusCode MyAlg :: fileExecute ()
{
  // Here you do everything that needs to be done exactly once for every
  // single file, e.g. collect a list of all lumi-blocks processed
  return EL::StatusCode::SUCCESS;
}
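The calling pattern described above (fileExecute once per file, even for files without events, and execute once per event) can be modeled with a tiny hypothetical driver loop. MockFile, LumiCollector and runJob are illustrative names only, not EventLoop classes.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical input file: per-file meta-data plus an event count.
struct MockFile {
  std::vector<int> lumiBlocks;  // per-file meta-data
  int nEvents;
};

// Hypothetical algorithm that collects lumi-blocks in fileExecute().
struct LumiCollector {
  std::set<int> lumiBlocks;
  int eventsSeen = 0;
  void fileExecute(const MockFile& file) {
    lumiBlocks.insert(file.lumiBlocks.begin(), file.lumiBlocks.end());
  }
  void execute() { ++eventsSeen; }
};

// Hypothetical driver loop: fileExecute runs once per file, even when the
// file has no events; execute runs once per event.
void runJob(LumiCollector& alg, const std::vector<MockFile>& files) {
  for (const auto& file : files) {
    alg.fileExecute(file);
    for (int i = 0; i < file.nEvents; ++i) alg.execute();
  }
}
```

Doing the same bookkeeping in execute would miss the lumi-blocks of empty files, which is the reason fileExecute exists.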

Accessing the Trigger Configuration Tree

You can access the trigger configuration tree inside EventLoop via:

TTree *trigConfTree = wk()->triggerConfig();

Or alternatively, you can use the more manual way:

TTree *trigConfTree = dynamic_cast<TTree*>(wk()->inputFile()->Get("physicsMeta/TrigConfTree"));

Creating output n-tuples

Creating output n-tuples is a little more complicated than just plain histograms, because of their (potential) size. The strategy taken by the EventLoop package is to store them directly on the storage element of your local batch system. Only if no such element exists are the n-tuples copied back to the submission node. This approach has the following advantages:

  • Since the outputs are n-tuples, you probably want to run over them in another event loop, probably using the same batch system. In most cases the submission node is not designed to serve data to dozens if not hundreds of worker-node processes.
  • Copying large files over the network back to the submission node can be slow, particularly if you are running on the grid.
  • The submission node will often not have enough disk space to store large quantities of n-tuples, which would make it very awkward to run over larger datasets.

The easiest way to create output n-tuples is to use the NTupleSvc from EventLoopAlgs. For that, you need to have the EventLoopAlgs package checked out as described in the introduction, and your package needs to depend on EventLoopAlgs (update your CMakeLists.txt accordingly).

You will also need the following includes inside your code:

#include <EventLoopAlgs/NTupleSvc.h>
#include <EventLoopAlgs/AlgSelect.h>

As a first step create an n-tuple service object and add it to your job. This has to happen before you add any algorithms that use that particular service:

EL::OutputStream output ("output");
job.outputAdd (output);
EL::NTupleSvc *ntuple = new EL::NTupleSvc ("output");
// configure ntuple object
job.algsAdd (ntuple);

The string “output” is just an arbitrary label that identifies this particular output; the same label is passed to the NTupleSvc constructor. If you create multiple outputs in the same job, each of them needs a different label. In the future, some drivers may allow you to specify further options as part of the OutputStream object.
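The rule that every output needs its own label behaves like a map keyed by label. The OutputRegistry class below is a hypothetical sketch of that bookkeeping, not EventLoop’s implementation, and the file names in the usage are made up.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical registry mirroring the rule that every output label in a job
// must be unique. Not the EventLoop API.
class OutputRegistry {
public:
  // Register a label; returns false if the label is already taken.
  bool add(const std::string& label, const std::string& fileName) {
    return outputs_.emplace(label, fileName).second;
  }
  // Look up the file registered under a label; empty string if unknown.
  std::string fileFor(const std::string& label) const {
    auto it = outputs_.find(label);
    return it == outputs_.end() ? std::string() : it->second;
  }
private:
  std::map<std::string, std::string> outputs_;
};
```

Because lookups (like getOutputFile further below) go through the label, reusing a label would make two outputs indistinguishable.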

You can request certain variables to be copied over directly from the input tree. You can also specify branches using expressions. Do this before you add the service to the job:

ntuple->copyBranch ("RunNumber");
ntuple->copyBranch ("EventNumber");

To select the events you want to use, you can use a selection algorithm:

EL::AlgSelect *select = new EL::AlgSelect ("output");
select->addCut ("el_n>=0");
job.algsAdd (select);

If you specify multiple cuts you might also want to create a histogram containing the cut flow:

select->histName ("cut_flow");

You can also directly access and manipulate the NTupleSvc from inside your algorithm:

EL::NTupleSvc *ntuple = EL::getNTupleSvc (wk(), "output");

If you want you can add a new branch (you should do this in initialize):

ntuple->tree()->Branch ("myvar", &myvar, "myvar/F");

Or you can manually select the events you want to keep (you should do this in execute):

ntuple->setFilterPassed ();

A couple of notes:

  • If you are not quite familiar with how to create a TTree or how the TTree::Branch statements work, please check out the TTree documentation.
  • While in most cases you will store a single TTree in each file, you can store whatever you want. You can store multiple trees, or other kinds of objects.
  • If you want, you can always copy your files back to the submission node (or another machine) manually. If you don’t know how, contact me and I’ll try to figure it out with you.
  • There is discussion of offering a merge option for n-tuple files. This is meant to allow you to replace a large number of small n-tuples with a small number of large n-tuples, which are faster to process. If you are interested in this option, contact me and I’ll see if I can get it implemented for you.

xAOD outputs

If you want to store xAOD objects in the OutputStream, you should create it like this:

EL::OutputStream output ("output", "xAOD");

This will cause the files to be merged using xAODMerge on the grid. AnalysisBase-2.1.35 or later is needed to use this option. If you do not store any xAOD meta data, you can instead give the option “xAODNoMeta” which will use a faster merging option.

Writing N-Tuples Directly to XRootD

If you are using the DirectDriver or BatchDriver, you can also write your output n-tuples directly onto an xrootd server. For that to work, you just have to make a slight modification to how you declare the OutputStream:

EL::OutputStream output ("label");
output.output (new SH::DiskOutputXRD ("root://myserver/dir/"));
job.outputAdd (output);

Manually Creating an N-tuple

If you don’t want to use the n-tuple service, you can also create the n-tuple manually. The first step is to configure your job to create an output stream for it. This can be done at the same time as creating the algorithms, but I recommend you do it in the setupJob method of your algorithm; that way it gets configured automatically whenever the algorithm is used. The actual syntax for this is:

EL::StatusCode MyAlgorithm :: setupJob (EL::Job& job)
{
  EL::OutputStream out ("outputLabel");
  job.outputAdd (out);
  return EL::StatusCode::SUCCESS;
}

On each worker node EventLoop will create an output file for you. You can access it through the output label. Traditionally you would do that in the initialize member function, and create a new output TTree there:

EL::StatusCode MyAlgorithm :: initialize ()
{
  // tree, var1 and var2 are members of MyAlgorithm (protected with //!)
  TFile *file = wk()->getOutputFile ("outputLabel");
  tree = new TTree ("tree", "output tree");
  tree->SetDirectory (file);
  tree->Branch ("var1", &var1, "var1/F");
  tree->Branch ("var2", &var2, "var2/I");
  // further branch statements and other configuration
  return EL::StatusCode::SUCCESS;
}

Then in the execute function you have to fill the output variable and call TTree::Fill for the events you want to save:

EL::StatusCode MyAlgorithm :: execute ()
{
  // do other event processing stuff and fill output variables
  tree->Fill ();
  return EL::StatusCode::SUCCESS;
}

Other outputs

You should avoid having your algorithm create any kind of output without making it known to EventLoop; otherwise it might be lost, e.g. when running on the grid. Always use OutputStreams, as in the following example:

Solution for TPileupReweighting

To create the reweighting files, do the following:

In the setupJob method, create an output stream:

job.outputAdd(EL::OutputStream("outFile"));

In the finalize method:

my_PileupTool->WriteToFile(wk()->getOutputFile("outFile"));

Moving the Submission Directory

Sometimes you feel the need to move the EventLoop submission directory, typically because you changed your mind on where things should be stored. To do that, you first need to wait until all jobs from that submission have finished; otherwise the results will be undefined and probably bad. Then move the directory to where you want it, and call updateLocation on the new location:

      mv submitDir newDir
      root -l -b -q "$ROOTCOREBIN/user_scripts/EventLoop/updateLocation.C(\"newDir\")"

Or you can also call it from inside root:

      EL::Driver::updateLocation ("newDir")

Converting existing code to EventLoop

In all likelihood you already have an existing analysis setup. In this section I will try to give you some advice on how to convert your existing code. Of course every situation is different, so you may or may not find that this advice works in your situation. However, you should feel free to contact me for further advice, or suggestions on how to improve this section.

Please note that you should either make a backup of your analysis code or work on a copy of it. Things can go wrong in the conversion, and you don’t want to be stuck with a wrecked analysis. Also, if you haven’t done so already, please convert your analysis to compile with CMake.

Converting MakeClass based code

This section is for you if you started your analysis by calling MakeClass on your n-tuple. Unfortunately, this kind of code is somewhat more difficult to convert, since it is not organized as an algorithm. However, in most cases it should still be quite feasible to convert it into an algorithm without too much effort. For the rest of this section I assume that your class is named MyClass.

First perform a couple of fixes in the header file. Derive your class from the Algorithm class, i.e.:

#include <EventLoop/Algorithm.h>
class MyClass : public EL::Algorithm {

Also add a couple more entries to the class:

  // these are the functions from Algorithm
  virtual EL::StatusCode setupJob (EL::Job& job);
  virtual EL::StatusCode changeInput (bool firstFile);
  virtual EL::StatusCode initialize ();
  virtual EL::StatusCode execute ();
  virtual EL::StatusCode finalize ();

  // this is needed to distribute the algorithm to the workers
  ClassDef(MyClass, 1);

And comment out / remove the Loop function, because we will have to split it up:

   //virtual void     Loop();

If there are any std::vector variables, make sure you add a //! at the end of the line to protect them from CINT (otherwise you will experience random crashes), e.g.:

  std::vector<float> *el_pt; //!

In the constructor you have to comment out / remove everything that relates to opening a file, i.e. it should look something like this:

MyClass::MyClass(TTree *tree)
{
// if parameter tree is not specified (or zero), connect the file
// used to generate this class and read the Tree.
//   if (tree == 0) {
//      TFile *f = (TFile*)gROOT->GetListOfFiles()->FindObject("src-eventloop/EventLoop/data/test_ntuple.root");
//      if (!f) {
//         f = new TFile("src-eventloop/EventLoop/data/test_ntuple.root");
//      }
//      tree = (TTree*)gDirectory->Get("physics");
//
//   }
//   Init(tree);
}

And you have to fix the destructor to leave the input tree alone (comment out the delete statement):

MyClass::~MyClass()
{
   if (!fChain) return;
   //delete fChain->GetCurrentFile();
}

Now for the hard part: in the source file, you need to split up the Loop function into several functions. When you start out, your Loop function will look something like this:

void MyClass::Loop()
{
   if (fChain == 0) return;

   // code segment 1: your initialization code sits here

   Long64_t nentries = fChain->GetEntriesFast();

   Long64_t nbytes = 0, nb = 0;
   for (Long64_t jentry=0; jentry<nentries;jentry++) {
      Long64_t ientry = LoadTree(jentry);
      if (ientry < 0) break;
      nb = fChain->GetEntry(jentry);   nbytes += nb;
      // if (Cut(ientry) < 0) continue;

      // code segment 2: your per-event code sits here
   }

   // code segment 3: your post-processing code sits here
}

First of all add

#include <EventLoop/StatusCode.h>
#include <EventLoop/Worker.h>

which is needed so you can override the Algorithm functions and access the data on the worker node. And add a changeInput function that takes care of connecting to the tree whenever the file changes:

EL::StatusCode MyClass :: changeInput (bool firstFile)
{
  Init (wk()->tree());
  return EL::StatusCode::SUCCESS;
}

First let us take care of code segment 1. Suppose it looks like this:

  TFile *outputFile = new TFile ("output.root", "RECREATE");
  TH1 *hist = new TH1F ("hist", "hist", 10, 0, 1);

Any code for creating output files should just be removed; EventLoop will take care of that for you. Any variables defined in this code segment probably have to become members of the class itself. Please note that any variables you put into the header file have to be protected with a //!. The only exception to this are variables that contain configuration parameters. In this case, that means putting this statement into your header file:

  TH1 *hist; //!

Then the code itself has to be put into a newly created initialize method. Any histograms you create have to be added to the output list as well:

EL::StatusCode MyClass :: initialize ()
{
  hist = new TH1F ("hist", "hist", 10, 0, 1);
  wk()->addOutput (hist);
  return EL::StatusCode::SUCCESS;
}

Please make sure that you don’t redefine any variables you have moved to the header file, i.e. don’t write TH1 *hist inside initialize. Your code will compile, but it won’t work and will most likely crash. If you are creating an output n-tuple, please look at the section on how to create an output n-tuple.
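The reason redefining a member “compiles but doesn’t work” is variable shadowing: the local declaration hides the member, which stays empty. The sketch below shows this with a hypothetical Histogram type standing in for TH1; std::unique_ptr is used only to keep the example leak-free and is not how EventLoop manages histograms (there, addOutput hands the histogram over to EventLoop). GCC and Clang can flag this mistake with -Wshadow.

```cpp
#include <cassert>
#include <memory>

struct Histogram { int entries = 0; };  // hypothetical stand-in for TH1

struct BrokenAlgorithm {
  std::unique_ptr<Histogram> hist;  // the member that execute() would use
  void initialize() {
    // BUG: this declares a *new* local variable that shadows the member.
    // It compiles, but the member is still empty afterwards.
    std::unique_ptr<Histogram> hist = std::make_unique<Histogram>();
  }
};

struct FixedAlgorithm {
  std::unique_ptr<Histogram> hist;
  void initialize() { hist = std::make_unique<Histogram>(); }  // sets the member
};
```

In the broken version, the first access to hist in execute would dereference a null pointer, which is the crash mentioned above.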

Code segment 2 is essentially what will go into your execute method. We just get rid of the for loop altogether (handled by EventLoop), and can use a simplified version of the GetEntry call:

EL::StatusCode MyClass :: execute ()
{
  wk()->tree()->GetEntry (wk()->treeEntry());

  // put code segment 2 right here
  return EL::StatusCode::SUCCESS;
}

If you call GetEntry on the branches instead of the tree, you can do the same here.
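Conceptually, the conversion moves the event loop out of your class and into the framework: code segment 1 becomes initialize, the body of the for loop becomes execute, and whatever remains of code segment 3 becomes finalize. The following self-contained sketch shows that inversion of control with a hypothetical AlgorithmBase and runLoop; these are not the real EL::Algorithm or driver classes, and the double passed to execute merely stands in for the current entry.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal interface mirroring the split of Loop() into
// initialize / execute / finalize. Not the real EL::Algorithm.
struct AlgorithmBase {
  virtual ~AlgorithmBase() = default;
  virtual void initialize() {}
  virtual void execute(double value) = 0;  // value stands in for the current entry
  virtual void finalize() {}
};

// The loop that MakeClass code wrote by hand now lives in the "driver".
void runLoop(AlgorithmBase& alg, const std::vector<double>& entries) {
  alg.initialize();                                  // code segment 1
  for (double entry : entries) alg.execute(entry);   // code segment 2
  alg.finalize();                                    // remainder of segment 3
}

// Hypothetical converted class: counts entries above a threshold.
struct MyClassSketch : AlgorithmBase {
  int nSelected = 0;
  bool finalized = false;
  void initialize() override { nSelected = 0; }
  void execute(double value) override { if (value > 0.5) ++nSelected; }
  void finalize() override { finalized = true; }
};
```

Because the loop is owned by the driver, the same class can later run locally or on a batch system without changes.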

Code segment 3 is somewhat tricky. It may contain some code that should go into a finalize function, but most likely you have to move it into your steering macro. If you don’t create a finalize function here, either create an empty one, or remove it from the header file. Actually even better than moving the code into your steering macro, move it into a separate macro that reads the output file. That way you can change the macro and re-run it without re-running the entire event loop. Either way, when adapting this code, you have to read the histogram in from the output file before you can use it. E.g. the code

  hist->Draw ();

would change into

  TFile *file = new TFile ("jobDir/hist-sample.root", "READ");
  TH1 *hist = (TH1*) file->Get ("hist");
  hist->Draw ();

Now let’s change the steering code to call EventLoop instead, e.g. let’s say it looks like this now:

  TChain chain ("physics");
  // initialize chain

  MyClass t (&chain);
  t.Loop();

Then it would change to:

  TChain chain ("physics");
  // initialize chain

  EL::Job job;
  SH::SampleHandler sh;
  sh.add (SH::makeFromTChain ("sample", chain));
  job.sampleHandler (sh);
  job.algsAdd (new MyClass);
  EL::DirectDriver driver;
  driver.submit (job, "jobDir");

That’s it. If you did everything right, you should now have some analysis code that runs (locally) and does what it did before. However, now you can swap out the driver and run on your local batch system if you want to do so. If you have the time, you might want to clean up your code a little more.

Special Precautions for std::vector Variables (and Other Object Types)

If you are reading in std::vector variables from a TTree you may find that your code now inexplicably crashes. One possible reason is that in your code the member variables you are reading into are not properly initialized. Double check your code and change the corresponding lines from:

std::vector<float> *jets_selected_pt; //!

to

std::vector<float> *jets_selected_pt = nullptr; //!

Alternatively, initialize them in the constructor, but personally I prefer doing so in the header file, as it makes it easier to check that you indeed initialized all members properly.
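One way to picture why the nullptr matters: when an object branch is bound through the address of a pointer, the TTree documentation describes that a null pointer makes ROOT allocate the object for you, while a non-null pointer is used as-is. The model below is a simplified, hypothetical illustration of that rule (MockObjectBranch and MyAlgorithmSketch are not ROOT classes); an uninitialized member would take the “non-null” path and be treated as a valid address even though it points at garbage.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical model of binding an object branch through a pointer-to-pointer:
// a null pointer makes the "tree" allocate the vector itself; a non-null
// pointer is trusted as-is (with an uninitialized member, that is garbage).
struct MockObjectBranch {
  std::unique_ptr<std::vector<float>> owned;
  void bind(std::vector<float>** address) {
    if (*address == nullptr) {
      owned = std::make_unique<std::vector<float>>();
      *address = owned.get();
    }
  }
};

// Hypothetical algorithm using in-class initialization, as recommended above.
struct MyAlgorithmSketch {
  std::vector<float>* jets_selected_pt = nullptr;
};
```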