WARNING: This section may no longer be up-to-date
One of the design goals of EventLoop is that it should support equally
well whatever way the user chooses to access the input data.
However, it is typically not possible to use multiple methods within the
same job, i.e. you have to pick one and stick with it for all
algorithms. A quick overview of methods:
If your favourite way of reading data is not on this list, shoot me a mail and I’ll try to work with you on putting it in. If you are undecided on which way to use, give the xAOD EDM a try; it is designed to give you the best possible performance in most situations without the need to fine-tune it for your analysis.
This is the most basic way of reading in the data and will be familiar
to people who have used ROOT before. You have to connect the branches to
your variables in changeInput:
EL::StatusCode MyAlgorithm :: changeInput (bool firstFile)
{
  TTree *tree = wk()->tree();
  tree->SetBranchStatus ("*", 0);
  tree->SetBranchStatus ("var1", 1);
  tree->SetBranchAddress ("var1", &var1);
  tree->SetBranchStatus ("var2", 1);
  tree->SetBranchAddress ("var2", &var2);
  // repeat for all variables you use
  return EL::StatusCode::SUCCESS;
}
Then you have to read in the variables in execute:
EL::StatusCode MyAlgorithm :: execute ()
{
  wk()->tree()->GetEntry (wk()->treeEntry());
  // actual event processing
  return EL::StatusCode::SUCCESS;
}
The SetBranchStatus statements are technically not necessary, but if
you use them and connect only to the branches you need you can often
gain a substantial amount of speed. If you don’t quite know how this
technique works, check out the official TTree documentation at
https://root.cern.ch/doc/master/classTTree.html.
Warning: If you use this technique, you can almost certainly not use more than one algorithm per job. The issue is that you tie your TTree object to the algorithm object. So using a second algorithm would attempt to tie your TTree to multiple algorithms, which almost certainly will not work.
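To see why a second algorithm gets in the way, it helps to remember that a branch holds a single destination address, so a later SetBranchAddress call replaces the earlier one. The sketch below imitates only that behaviour with stand-in types: MockTree and MockAlg are hypothetical names invented for this illustration and are not the ROOT or EventLoop classes.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a tree: each branch remembers exactly ONE
// destination address, and a later call replaces the earlier one.
class MockTree {
public:
  void SetBranchAddress(const std::string& name, float* address) {
    m_address[name] = address;
  }
  // Pretend to read an entry: copy the value to the registered address.
  void GetEntry(const std::string& name, float value) {
    auto it = m_address.find(name);
    if (it != m_address.end() && it->second != nullptr) *(it->second) = value;
  }
private:
  std::map<std::string, float*> m_address;
};

// Stand-in algorithm that connects its own member to branch "var1".
struct MockAlg {
  float var1 = -1.f;
  void changeInput(MockTree& tree) { tree.SetBranchAddress("var1", &var1); }
};

// Returns true if the first algorithm still receives the value that was read.
bool firstAlgSeesValue() {
  MockTree tree;
  MockAlg first, second;
  first.changeInput(tree);
  second.changeInput(tree);    // silently takes over the branch
  tree.GetEntry("var1", 42.f);
  return first.var1 == 42.f;   // false: only `second` was filled
}
```

In this model the first algorithm keeps its initial value of -1 and never notices that it lost the branch, which is the kind of silent failure the warning above is about.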
When setting up your job object, you need to tell it that you are using an xAOD before you add your algorithms:
Job job;
job.useXAOD();
job.algsAdd (alg);
Then add a pointer to the xAOD::TEvent to your algorithm class:
/// description: the event we are reading from
private:
xAOD::TEvent *m_event; //!
And in your initialize method, set that member:
EL::StatusCode MyAlgorithm :: initialize ()
{
  m_event = wk()->xaodEvent();
  // further initialization stuff
  return EL::StatusCode::SUCCESS;
}
The xAOD classes have two mechanisms to read from a file, branch wise or class wise. In branch wise mode the first time you access a variable from the xAOD in an event, it reads the corresponding branch. For class wise mode it works the same, except that if you read one of the “core” variables from an object, it reads all of the “core” variables. This is primarily needed for making shallow copies work. The expectation is that xAODs made for analysis will not really have any “core” variables for objects, so it should not incur a performance penalty for a “typical” analysis workflow. The mode can be selected using one of the following:
job.options()->setString (EL::Job::optXaodAccessMode, EL::Job::optXaodAccessMode_branch);
job.options()->setString (EL::Job::optXaodAccessMode, EL::Job::optXaodAccessMode_class);
If neither is selected, EventLoop will select one for you. Currently that is class-mode, but this may change at some point in the future.
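The difference between the two modes can be pictured with a small model. This is only a sketch of the lazy-reading idea described above: MockObject, AccessMode and the variable names "pt", "eta", "phi" are assumptions made up for the illustration, not the xAOD classes or their real list of core variables.

```cpp
#include <cstddef>
#include <set>
#include <string>

enum class AccessMode { Branch, Class };

// Hypothetical model of one object with three "core" variables.
class MockObject {
public:
  explicit MockObject(AccessMode mode) : m_mode(mode) {}

  // Called when user code touches a variable for the first time in an event.
  void access(const std::string& name) {
    const bool isCore = m_core.count(name) > 0;
    if (m_mode == AccessMode::Class && isCore) {
      // class-wise: touching one core variable reads all core variables
      m_loaded.insert(m_core.begin(), m_core.end());
    } else {
      // branch-wise (or a non-core variable): read only what was asked for
      m_loaded.insert(name);
    }
  }
  std::size_t branchesRead() const { return m_loaded.size(); }

private:
  AccessMode m_mode;
  std::set<std::string> m_core{"pt", "eta", "phi"};
  std::set<std::string> m_loaded;
};

// Number of branches read after touching a single core variable.
std::size_t readsAfterTouchingPt(AccessMode mode) {
  MockObject obj(mode);
  obj.access("pt");
  return obj.branchesRead();
}
```

In this model, touching one core variable costs one read in branch mode and three in class mode; if an object has no core variables, the two modes behave the same.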
At the end of the job the xAOD classes send a report to a central server that contains information on how the xAODs were accessed, allowing us to optimize future versions of the xAOD. Generally this is desirable and should cause no problems. However, sometimes it does, and in those cases you can turn off the reporting like this:
job.options()->setDouble (EL::Job::optXAODSummaryReport, 0);
The details of how to write an algorithm that uses MultiDraw formulas are described on the MultiDraw TWiki, so the reader should check there. Here are only a couple of quick notes:
Many n-tuple files have in-file meta-data. The format of
this meta-data varies wildly, but one thing in common is that you have
to read them directly from the file and then combine them with the
event-data yourself. The most convenient way to access the input file
is through the inputFile() method:
TFile *file = wk()->inputFile();
There is some meta-data that you need to process for each individual input file, even those containing no events, e.g. the list of luminosity-blocks contained. Any such processing can be done in the fileExecute() function of your algorithm:
EL::StatusCode MyAlg :: fileExecute ()
{
// Here you do everything that needs to be done exactly once for every
// single file, e.g. collect a list of all lumi-blocks processed
return EL::StatusCode::SUCCESS;
}
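The reason for a separate per-file hook becomes clearer with a toy event loop. The following is a sketch under stated assumptions: MockFile, LumiCollector and runOver are hypothetical stand-ins, and the lumi-block numbers are plain integers rather than whatever format your n-tuple stores.

```cpp
#include <set>
#include <vector>

// Hypothetical description of an input file: its in-file lumi-block list
// and the number of events it contains (possibly zero).
struct MockFile {
  std::vector<int> lumiBlocks;
  int nEvents;
};

struct LumiCollector {
  std::set<int> lumiBlocks;  // all lumi-blocks seen, duplicates removed
  int eventsSeen = 0;

  // called once per file, even if the file holds no events
  void fileExecute(const MockFile& file) {
    lumiBlocks.insert(file.lumiBlocks.begin(), file.lumiBlocks.end());
  }
  // called once per event
  void execute() { ++eventsSeen; }
};

// Minimal stand-in for the event loop itself.
LumiCollector runOver(const std::vector<MockFile>& files) {
  LumiCollector alg;
  for (const MockFile& file : files) {
    alg.fileExecute(file);
    for (int i = 0; i < file.nEvents; ++i) alg.execute();
  }
  return alg;
}
```

Had the lumi-blocks been collected in execute instead, a file with zero events would contribute nothing and its lumi-blocks would be missing from the final list.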
You can access the trigger configuration tree inside EventLoop via:
TTree *trigConfTree = wk()->triggerConfig();
Or alternatively, you can use the more manual way:
TTree *trigConfTree = dynamic_cast<TTree*>(wk()->inputFile()->Get("physicsMeta/TrigConfTree"));
Creating output n-tuples is a little more complicated than just plain histograms, because of their (potential) size. The strategy taken by the EventLoop package is to store them directly on the storage element of your local batch system. Only if no such element exists are the n-tuples copied back to the submission node. This approach has the following advantages:
The easiest way to do that is to use the NTupleSvc in EventLoopAlgs.
For that you need to make sure that you have the EventLoopAlgs package
checked out as described in the introduction, and that your package
depends on EventLoopAlgs (update your CMakeLists.txt).
You will also need the following includes inside your code:
#include <EventLoopAlgs/NTupleSvc.h>
#include <EventLoopAlgs/AlgSelect.h>
As a first step create an n-tuple service object and add it to your job. This has to happen before you add any algorithms that use that particular service:
EL::OutputStream output ("output");
job.outputAdd (output);
EL::NTupleSvc *ntuple = new EL::NTupleSvc ("output");
// configure ntuple object
job.algsAdd (ntuple);
The string passed to the OutputStream and NTupleSvc constructors
("output" in this example) is just an arbitrary label that identifies
this particular output. If you create multiple outputs in the same job,
each of them needs to be given a different label. In the future, some
drivers may allow you to specify further options as part of the
OutputStream object.
You can request certain variables to be copied over directly from the input tree. You can also specify branches using expressions. Do this before you add the service to the job:
ntuple->copyBranch ("RunNumber");
ntuple->copyBranch ("EventNumber");
To select the events you want to use, you can use a selection algorithm:
EL::AlgSelect *select = new EL::AlgSelect ("output");
select->addCut ("el_n>=0");
job.algsAdd (select);
If you specify multiple cuts you might also want to create a histogram containing the cut flow:
select->histName ("cut_flow");
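For readers who have not met the term, a cut flow counts how many events survive each successive cut. The sketch below shows only that concept in plain C++; the Event struct, the Cut alias and the cutFlow function are hypothetical, and the actual binning of the histogram produced by AlgSelect is not described here.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical event record used only for this illustration.
struct Event {
  int el_n;
  float met;
};

using Cut = std::function<bool(const Event&)>;

// Returns one count per cut: entry i is the number of events that passed
// cuts 0..i. An event stops at the first cut it fails.
std::vector<int> cutFlow(const std::vector<Event>& events,
                         const std::vector<Cut>& cuts) {
  std::vector<int> passed(cuts.size(), 0);
  for (const Event& event : events) {
    for (std::size_t i = 0; i < cuts.size(); ++i) {
      if (!cuts[i](event)) break;
      ++passed[i];
    }
  }
  return passed;
}
```

The counts can never increase from one cut to the next, which makes the cut flow a quick sanity check on the order and tightness of your selection.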
You can also directly access and manipulate the NTupleSvc from inside
your algorithm:
EL::NTupleSvc *ntuple = EL::getNTupleSvc (wk(), "output");
If you want, you can add a new branch (you should do this in
initialize):
ntuple->tree()->Branch ("myvar", &myvar, "myvar/F");
Or you can manually select the events you want to keep (you should do
this in execute):
ntuple->setFilterPassed ();
A couple of notes:
If you are not familiar with TTree or how the TTree::Branch statements
work, please check out the TTree documentation.
If you want to store xAOD objects in the OutputStream, you should create it like this:
EL::OutputStream output ("output", "xAOD");
This will cause the files to be merged using xAODMerge on the grid. AnalysisBase-2.1.35 or later is needed to use this option. If you do not store any xAOD meta data, you can instead give the option “xAODNoMeta” which will use a faster merging option.
If you are using the DirectDriver or BatchDriver, you can also write your output n-tuples directly onto an xrootd server. For that to work, you just have to make a slight modification to how you declare the OutputStream:
EL::OutputStream output ("label");
output.output (new SH::DiskOutputXRD ("root://myserver/dir/"));
job.outputAdd (output);
If you don’t want to use the n-tuple service, you can also create the
n-tuple manually. The first step is to configure your job to create an
output stream. This can be done at the same time as creating the
algorithms, but I recommend you do it in the setupJob method of your
algorithm. That way it gets automatically configured whenever the
algorithm gets used. The actual syntax for this is:
EL::StatusCode MyAlgorithm :: setupJob (EL::Job& job)
{
  EL::OutputStream out ("outputLabel");
  job.outputAdd (out);
  return EL::StatusCode::SUCCESS;
}
On each worker node EventLoop will create an output file for you. You
can access it through the output label. Traditionally you would do that
in the initialize member function, and create a new output TTree there:
EL::StatusCode MyAlgorithm :: initialize ()
{
  TFile *file = wk()->getOutputFile ("outputLabel");
  tree = new TTree ("tree", "output tree");
  tree->SetDirectory (file);
  tree->Branch ("var1", &var1, "var1/F");
  tree->Branch ("var2", &var2, "var2/I");
  // further branch statements and other configuration
  return EL::StatusCode::SUCCESS;
}
Then in the execute function you have to fill the output variables and
call TTree::Fill for the events you want to save:
EL::StatusCode MyAlgorithm :: execute ()
{
  // do other event processing stuff and fill output variables
  tree->Fill ();
  return EL::StatusCode::SUCCESS;
}
You should avoid having your algorithm create any kind of output without making it known to EventLoop, otherwise it might be lost, e.g. when running on the grid. Always use the OutputStreams; see below for an example:
TPileupReweighting
To create the reweighting files, do the following:
In the setupJob method, create an output stream:
job.outputAdd(EL::OutputStream("outFile"));
In the finalize method:
my_PileupTool->WriteToFile(wk()->getOutputFile("outFile"));
Sometimes you feel the need to move the EventLoop submission directory, typically because you changed your mind on where things should be stored. To do that, you first need to wait until all your jobs from that submission have finished, otherwise the results will be undefined and probably bad. Then move the directory to where you want it, and call updateLocation on the new location:
mv submitDir newDir
root -l -b -q "$ROOTCOREBIN/user_scripts/EventLoop/updateLocation.C (\"newDir\")"
Or you can also call it from inside root:
EL::Driver::updateLocation ("newDir")
In all likelihood you already have an existing analysis setup. In this section I will try to give you some advice on how to convert your existing code. Of course every situation is different, so you may or may not find that this advice works in your situation. However, you should feel free to contact me for further advice, or suggestions on how to improve this section.
Please note that you should either make a backup of your analysis code or work on a copy of the code. There are things that can go wrong in the conversion and you don’t want to be stuck with a wrecked analysis. Also, if you haven’t done so already, please convert your analysis for compilation with cmake.
This section is for you if you started your analysis by calling
MakeClass on your n-tuple. Unfortunately this kind of code is somewhat
more difficult to convert, since you don’t have your code organized as
an algorithm. However, in most cases it should still be quite feasible
to convert it into an algorithm without too much effort. For the rest of
this section I assume that your class is named MyClass.
First perform a couple of fixes in the header file. Derive your class from the Algorithm class, i.e.:
#include <EventLoop/Algorithm.h>
class MyClass : public EL::Algorithm {
Also add a couple more entries to the class:
// these are the functions from Algorithm
virtual EL::StatusCode setupJob (EL::Job& job);
virtual EL::StatusCode changeInput (bool firstFile);
virtual EL::StatusCode initialize ();
virtual EL::StatusCode execute ();
virtual EL::StatusCode finalize ();
// this is needed to distribute the algorithm to the workers
ClassDef(MyClass, 1);
And comment out / remove the Loop function, because we will have to
split it up:
//virtual void Loop();
If there are any std::vector variables, make sure you add a //! at the
end to protect them from CINT (otherwise you will experience random
crashes), e.g.:
std::vector<float> *el_pt; //!
In the constructor you have to comment out / remove everything that relates to opening a file, i.e. it should look something like this:
MyClass::MyClass(TTree *tree)
{
// if parameter tree is not specified (or zero), connect the file
// used to generate this class and read the Tree.
// if (tree == 0) {
// TFile *f = (TFile*)gROOT->GetListOfFiles()->FindObject("src-eventloop/EventLoop/data/test_ntuple.root");
// if (!f) {
// f = new TFile("src-eventloop/EventLoop/data/test_ntuple.root");
// }
// tree = (TTree*)gDirectory->Get("physics");
//
// }
// Init(tree);
}
And you have to fix the destructor to leave the input tree alone (comment out the delete statement):
MyClass::~MyClass()
{
if (!fChain) return;
//delete fChain->GetCurrentFile();
}
Now for the hard part: in the source file, you need to split up the
Loop function into several functions. When you start out, your Loop
function will look something like this:
void MyClass::Loop()
{
if (fChain == 0) return;
// code segment 1: your initialization code sits here
Long64_t nentries = fChain->GetEntriesFast();
Long64_t nbytes = 0, nb = 0;
for (Long64_t jentry=0; jentry<nentries;jentry++) {
Long64_t ientry = LoadTree(jentry);
if (ientry < 0) break;
nb = fChain->GetEntry(jentry); nbytes += nb;
// if (Cut(ientry) < 0) continue;
// code segment 2: your per-event code sits here
}
// code segment 3: your post-processing code sits here
}
First of all add
#include <EventLoop/StatusCode.h>
#include <EventLoop/Worker.h>
which are needed so you can override the Algorithm functions and access
the data on the worker node. And add a changeInput function that takes
care of connecting to the tree whenever the file changes:
EL::StatusCode MyClass :: changeInput (bool firstFile)
{
  Init (wk()->tree());
  return EL::StatusCode::SUCCESS;
}
First let us take care of code segment 1. Suppose it looks like this:
TFile *outputFile = new TFile ("output.root", "RECREATE");
TH1 *hist = new TH1F ("hist", "hist", 10, 0, 1);
Any code for creating output files should just be removed; EventLoop
will take care of that for you. Any variables defined in this code
segment probably have to go into the class itself. Please note that any
variables you put into the header file will have to be protected with a
//!. The only exception to this are variables that contain
configuration parameters. In this case that means put this statement
into your header file:
TH1 *hist; //!
Then the code itself has to be put into a newly created initialize
method. Any histograms you create have to be added to the output list as
well:
EL::StatusCode MyClass :: initialize ()
{
  hist = new TH1F ("hist", "hist", 10, 0, 1);
  wk()->addOutput (hist);
  return EL::StatusCode::SUCCESS;
}
Please make sure that you don’t redefine any variables you have moved
to the header file, i.e. don’t write TH1 *hist. Your code will
compile, but it won’t work and will most likely crash. If you are
creating an output n-tuple, please look at the section on how to create
an output n-tuple.
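The redefinition problem is ordinary C++ variable shadowing, and it can be reproduced without ROOT. In the sketch below MockHist, BrokenAlg and FixedAlg are hypothetical stand-ins for the histogram and for your converted class.

```cpp
// Hypothetical stand-in for a histogram class.
struct MockHist {
  int entries = 0;
};

struct BrokenAlg {
  MockHist* hist = nullptr;  // member declared in the header
  void initialize() {
    // BUG: this declares a NEW local variable that hides the member
    MockHist* hist = new MockHist;
    delete hist;  // only to avoid a leak in this demo
  }
};

struct FixedAlg {
  MockHist* hist = nullptr;
  void initialize() { hist = new MockHist; }  // assigns the member
  ~FixedAlg() { delete hist; }
};

bool brokenMemberIsSet() {
  BrokenAlg alg;
  alg.initialize();
  return alg.hist != nullptr;  // false: the member was never touched
}

bool fixedMemberIsSet() {
  FixedAlg alg;
  alg.initialize();
  return alg.hist != nullptr;  // true
}
```

In the broken variant the member is still a null pointer after initialize, so the first use of it in execute would dereference null. Compiling with a shadowing warning enabled (for example -Wshadow with GCC or Clang) flags this mistake.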
Code segment 2 is essentially what will go into your execute method.
We just get rid of the for loop altogether (handled by EventLoop), and
can use a simplified version of the GetEntry call:
EL::StatusCode MyClass :: execute ()
{
  wk()->tree()->GetEntry (wk()->treeEntry());
  // put code segment 2 right here
  return EL::StatusCode::SUCCESS;
}
If you call GetEntry on the branches instead of the tree, you can do
the same here.
Code segment 3 is somewhat tricky. It may contain some code that should
go into a finalize function, but most likely you have to move it into
your steering macro. If you don’t need a finalize function here,
either create an empty one, or remove it from the header file. Actually,
even better than moving the code into your steering macro is to move it
into a separate macro that reads the output file. That way you can
change the macro and re-run it without re-running the entire event loop.
Either way, when adapting this code, you have to read the histogram in
from the output file before you can use it. E.g. the code
hist->Draw ();
would change into
TFile *file = new TFile ("jobDir/hist-sample.root", "READ");
TH1 *hist = (TH1*) file->Get ("hist");
hist->Draw ();
Now let’s change the steering code to call EventLoop instead, e.g. let’s say it looks like this now:
TChain chain ("physics");
// initialize chain
MyClass t (&chain);
t.Loop();
Then it would change to:
TChain chain ("physics");
// initialize chain
EL::Job job;
SH::SampleHandler sh;
sh.add (SH::makeFromTChain ("sample", chain));
job.sampleHandler (sh);
job.algsAdd (new MyClass);
EL::DirectDriver driver;
driver.submit (job, "jobDir");
That’s it. If you did everything right, you should now have some analysis code that runs (locally) and does what it did before. However, now you can swap out the driver and run on your local batch system if you want to do so. If you have the time, you might want to clean up your code a little more.
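The overall shape of the conversion can be summarised in a ROOT-free toy example. This is a sketch under simplifying assumptions: the functions loopStyle and runSplit and the class SplitAlg are hypothetical, events are reduced to one number each, and the member functions do not have the signatures of EL::Algorithm (the real execute takes no arguments and returns an EL::StatusCode).

```cpp
#include <vector>

// Before: everything lives in one Loop-style function.
double loopStyle(const std::vector<double>& events) {
  double sum = 0;                 // code segment 1: initialization
  for (double value : events) {   // the explicit event loop
    sum += value;                 // code segment 2: per-event code
  }
  return events.empty() ? 0 : sum / events.size();  // code segment 3
}

// After: the same logic split into separate member functions.
class SplitAlg {
public:
  void initialize() { m_sum = 0; m_count = 0; }              // segment 1
  void execute(double value) { m_sum += value; ++m_count; }  // segment 2
  double finalize() const {                                  // segment 3
    return m_count ? m_sum / m_count : 0;
  }
private:
  double m_sum = 0;  // locals of the loop function become members
  int m_count = 0;
};

// Stand-in for the driver, which now owns the loop.
double runSplit(const std::vector<double>& events) {
  SplitAlg alg;
  alg.initialize();
  for (double value : events) alg.execute(value);
  return alg.finalize();
}
```

Both versions give the same answer for the same input. What changes is that the loop now belongs to the caller, which is what allows a different driver to run the same algorithm elsewhere.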
If you are reading in std::vector variables from a TTree you may find that your code now inexplicably crashes. One possible reason is that in your code the member variables you are reading into are not properly initialized. Double check your code and change the corresponding lines from:
std::vector<float> *jets_selected_pt; //!
to
std::vector<float> *jets_selected_pt = nullptr; //!
Or alternatively initialize them in the constructor, but personally I prefer doing so in the header file, as it is easier to check that you indeed initialized all members properly.
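Why the initial value matters can be shown with a simplified model of a reader that is handed the address of a pointer. The function mockBind below is a hypothetical stand-in, not the ROOT implementation; it only imitates the rule "allocate if the pointer is null, otherwise trust it".

```cpp
#include <cstddef>
#include <vector>

// Simplified model of a reader given the address of a pointer: allocate a
// vector if the pointer is null, otherwise assume it already points at a
// valid vector and write into that one.
void mockBind(std::vector<float>** address) {
  if (*address == nullptr) *address = new std::vector<float>;
  // For an uninitialized (garbage) pointer this line would be
  // undefined behaviour, typically a crash.
  (*address)->assign(3, 1.5f);
}

struct SafeAlg {
  std::vector<float>* jets_selected_pt = nullptr;  // in-class initializer
  ~SafeAlg() { delete jets_selected_pt; }
};

// Binds the member and returns the number of elements written.
std::size_t fillAndCount() {
  SafeAlg alg;
  mockBind(&alg.jets_selected_pt);
  return alg.jets_selected_pt->size();
}
```

With the in-class initializer the pointer is guaranteed to be null before it is bound, so the reader takes the safe "allocate" path every time.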