Creating and running our steering macro

Last update: 09 Jul 2020

To actually run this EventLoop algorithm we need some steering code. This can be a ROOT macro in either C++ or Python, or some compiled C++ code. For this tutorial we will focus on writing a Python macro, as that is required to include the common CP algorithms, but the other options are equally valid (if you don’t intend to use the common CP algorithms).

tip This is only needed when working in EventLoop; inside Athena this will not be used. As such, if you know that you will never work in EventLoop you can leave this out (or add it later). However, if you expect never to run in EventLoop, you may be better off using the native Athena algorithms instead of the AnaAlgorithm class.

We will use another ASG tool called SampleHandler, which allows for easy sample management. In this example we will create and configure a SampleHandler object. We will specify the path to the main directory, under which there could be several subdirectories (typically representing datasets) and, within those, the individual input files. Here we will tell SampleHandler that we are only interested in one input xAOD file (specified by its exact name, though wildcards are also accepted, allowing several inputs to be used). More information and options for using SampleHandler to ‘find’ your data can be found on the dedicated SampleHandler wiki.
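As a rough illustration of the shell-style wildcard matching that a file pattern performs, here is a sketch of the matching semantics using Python’s fnmatch module (this is not the SampleHandler API itself, and the file names below are just examples):

```python
import fnmatch

# Hypothetical list of files that might sit under the input directory
files = [
    'AOD.11182705._000001.pool.root.1',
    'AOD.11182705._000002.pool.root.1',
    'DAOD_PHYS.22222222._000001.pool.root.1',
]

# An exact name selects a single file
exact = fnmatch.filter( files, 'AOD.11182705._000001.pool.root.1' )

# A wildcard pattern selects several inputs at once
wildcard = fnmatch.filter( files, 'AOD.11182705._*.pool.root.1' )

print( len( exact ) )     # 1
print( len( wildcard ) )  # 2
```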

You can put your steering code almost anywhere, but it is a good idea to keep it in your source area (which is under version control), ideally even in your package (typically in the share directory). However, for simplicity within this tutorial we will just place it directly into the source directory.

Writing a Python macro

Create a file called source/MyAnalysis/share/ATestRun_eljob.py, and make it executable (chmod +x source/MyAnalysis/share/ATestRun_eljob.py). Fill it with the following:

#!/usr/bin/env python

# Read the submission directory as a command line argument. You can
# extend the list of arguments with your private ones later on.
import optparse
parser = optparse.OptionParser()
parser.add_option( '-s', '--submission-dir', dest = 'submission_dir',
                   action = 'store', type = 'string', default = 'submitDir',
                   help = 'Submission directory for EventLoop' )
( options, args ) = parser.parse_args()

# Set up (Py)ROOT.
import ROOT
ROOT.xAOD.Init().ignore()

# Set up the sample handler object. See comments from the C++ macro
# for the details about these lines.
import os
sh = ROOT.SH.SampleHandler()
sh.setMetaString( 'nc_tree', 'CollectionTree' )
inputFilePath = os.getenv( 'ALRB_TutorialData' ) + '/r9315/'
ROOT.SH.ScanDir().filePattern( 'AOD.11182705._000001.pool.root.1' ).scan( sh, inputFilePath )
sh.printContent()

# Create an EventLoop job.
job = ROOT.EL.Job()
job.sampleHandler( sh )
job.options().setDouble( ROOT.EL.Job.optMaxEvents, 500 )
job.options().setString( ROOT.EL.Job.optSubmitDirMode, 'unique-link')

# Create the algorithm's configuration.
from AnaAlgorithm.DualUseConfig import createAlgorithm
alg = createAlgorithm ( 'MyxAODAnalysis', 'AnalysisAlg' )

# later on we'll add some configuration options for our algorithm that go here

# Add our algorithm to the job
job.algsAdd( alg )

# Run the job using the direct driver.
driver = ROOT.EL.DirectDriver()
driver.submit( job, options.submission_dir )

Read over the comments carefully to understand what is happening. Notice that we will only run over the first 500 events (for testing purposes). Obviously if you were doing a real analysis you would want to remove that statement to run over all events in a sample.

tip The way of creating an algorithm we are showing you above is the dual-use way, i.e. it is the same in EventLoop and Athena. An alternative EventLoop-only way of creating the algorithm is to use:

from AnaAlgorithm.AnaAlgorithmConfig import AnaAlgorithmConfig
alg = AnaAlgorithmConfig( 'MyxAODAnalysis/AnalysisAlg' )

To make sure that the new file gets installed and can be found, we need to recompile (we need to call cmake explicitly since we created a new file):

cd ../build/
cmake ../source/
make

To execute the job using this script, go to your run directory, and simply execute:

ATestRun_eljob.py --submission-dir=submitDir
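The --submission-dir handling comes from the optparse block at the top of the macro; here is a minimal standalone sketch of the same parsing (outside ROOT, passing an explicit argument list instead of sys.argv):

```python
import optparse

parser = optparse.OptionParser()
parser.add_option( '-s', '--submission-dir', dest = 'submission_dir',
                   action = 'store', type = 'string', default = 'submitDir',
                   help = 'Submission directory for EventLoop' )

# Passing the option overrides the default
( options, args ) = parser.parse_args( [ '--submission-dir=myRun' ] )
print( options.submission_dir )  # myRun

# With no arguments the default 'submitDir' is used
( options, args ) = parser.parse_args( [] )
print( options.submission_dir )  # submitDir
```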

tip

If your algorithm does not run, make sure that you have defined the environment variable ALRB_TutorialData, as explained here.
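A common cause of this failure: os.getenv returns None when the variable is unset, so the string concatenation building inputFilePath raises a TypeError. A small defensive helper you could add to the macro (a hypothetical addition, not part of the tutorial code):

```python
import os

def tutorial_input_path():
    """Return the tutorial input path, failing with a clear message
    if ALRB_TutorialData is not set."""
    base = os.getenv( 'ALRB_TutorialData' )
    if base is None:
        raise RuntimeError( 'ALRB_TutorialData is not set; '
                            'please follow the tutorial setup instructions' )
    return base + '/r9315/'
```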

tip

Note that submitDir is the directory where the output of your job is stored. We set the mode for the directory to “unique-link”, which means that EventLoop will attach the date and time to that name to make it unique, and then create a link that points to the latest directory created. That way it is guaranteed that outputs from your job don’t get overwritten when you re-run your job, while at the same time making it easy for you to find the latest result.

For test runs this is generally a good setup; for actual production runs you may want to put some more thought into how you organize your output directories. If you want to avoid appending a unique suffix to your directory name you can replace “unique-link” above with “no-clobber”, which will just take submitDir as the actual directory name, and fail if the directory already exists.
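The “unique-link” behavior can be pictured in plain Python. This is a sketch of the idea only (a timestamped directory plus a symlink pointing at the latest one), not EventLoop’s actual implementation:

```python
import os
import tempfile
import time

def submit_unique_link( base_name ):
    """Create a uniquely named output directory and point a symlink
    named base_name at it, mimicking the 'unique-link' idea."""
    unique = '%s-%s' % ( base_name, time.strftime( '%Y-%m-%d-%H%M%S' ) )
    os.makedirs( unique )
    # re-point the link at the latest directory created
    if os.path.islink( base_name ):
        os.remove( base_name )
    os.symlink( unique, base_name )
    return unique

# demonstrate in a scratch area
os.chdir( tempfile.mkdtemp() )
latest = submit_unique_link( 'submitDir' )
print( os.readlink( 'submitDir' ) == latest )  # True
```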

tip While you are in principle free to put your submitDir anywhere, avoid putting it into the source directory, as that is usually version controlled and you risk your data files being added to the repository (which is bad). Also avoid putting it into the build directory, as you often want to keep the contents of submitDir around, while the build directory should only contain files you don’t mind losing. Putting it inside the run directory is a reasonable choice if you have enough space there, but if it ends up containing large files you may need to put it onto a separate data disk. If you run in batch you may also need to put it inside a directory that is accessible from the worker nodes.

Writing a C++ macro (optional)

Please note, it is strongly recommended that the Python steering script be used for the configuration of your algorithm. It is not currently guaranteed that the following instructions will work with more recent releases of AnalysisBase and AthAnalysis.

Create a new file called share/ATestRun_eljob.cxx, containing the following:

void ATestRun_eljob (const std::string& submitDir)
{
  // Set up the job for xAOD access:
  xAOD::Init().ignore();

  // create a new sample handler to describe the data files we use
  SH::SampleHandler sh;

  // scan for datasets in the given directory
  // this works if you are on lxplus, otherwise you'd want to copy over files
  // to your local machine and use a local path.  if you do so, make sure
  // that you copy all subdirectories and point this to the directory
  // containing all the files, not the subdirectories.

  // use SampleHandler to scan all of the subdirectories of a directory for particular MC single file:
  const char* inputFilePath = gSystem->ExpandPathName ("$ALRB_TutorialData/r9315/");
  SH::ScanDir().filePattern("AOD.11182705._000001.pool.root.1").scan(sh,inputFilePath);


  // set the name of the tree in our files
  // in the xAOD the TTree containing the EDM containers is "CollectionTree"
  sh.setMetaString ("nc_tree", "CollectionTree");

  // further sample handler configuration may go here

  // print out the samples we found
  sh.print ();

  // this is the basic description of our job
  EL::Job job;
  job.sampleHandler (sh); // use SampleHandler in this job
  job.options()->setDouble (EL::Job::optMaxEvents, 500); // for testing purposes, limit to run over the first 500 events only!
  job.options()->setString( EL::Job::optSubmitDirMode, "unique-link");

  // add our algorithm to the job
  EL::AnaAlgorithmConfig alg;
  alg.setType ("MyxAODAnalysis");

  // set the name of the algorithm (this is the name used with
  // messages)
  alg.setName ("AnalysisAlg");

  // later on we'll add some configuration options for our algorithm that go here

  job.algsAdd (alg);

  // make the driver we want to use:
  // this one works by running the algorithm directly:
  EL::DirectDriver driver;
  // we can use other drivers to run things on the Grid, with PROOF, etc.

  // process the job using the driver
  driver.submit (job, submitDir);
}

Read over the comments carefully to understand what is happening. Notice that we will only run over the first 500 events (for testing purposes). Obviously if you were doing a real analysis you would want to remove that statement to run over all events in a sample.

tip If you are used to C++ you may wonder why the code above does not have any include statements. This is because it is run in interpreted ROOT (as opposed to compiled ROOT). In older releases it was OK to add include statements to the macro, but with newer releases there seem to be problems, so we removed them from the steering macro. If instead you want to compile the macro you will need to add include statements, but there is little to be gained by compiling your macro, so we don’t cover that in this tutorial.

OK, now the big moment has come. Within your run directory execute your ATestRun_eljob.cxx macro with ROOT. Note that you should always first call the load_packages.C macro to ensure that ROOT is set up with all extra ATLAS functionality:

cd ../run
root -b -q '$ROOTCOREDIR/scripts/load_packages.C' '../source/MyAnalysis/share/ATestRun_eljob.cxx ("submitDir")'

tip Note that you do not need to use load_packages.C when using Python/PyROOT.

tip

If your algorithm does not run, make sure that you have defined the environment variable ALRB_TutorialData, as explained here.