To actually run this EventLoop algorithm we need some steering code. This can be a ROOT macro in either C++ or Python, or some compiled C++ code. For this tutorial we will focus on writing a Python macro, as that will be required to include the common CP algorithms, but the other options are equally valid (if you don’t intend to use the common CP algorithms).
This is only needed when working in EventLoop; inside Athena it will not be used. As such, if you know that you will never work in EventLoop you can leave this out (or add it later). However, if you expect never to run in EventLoop you may be better off using the native Athena algorithms instead of the AnaAlgorithm class.
We will use another ASG tool called SampleHandler, which allows for easy sample management. In this example we will create and configure a SampleHandler object. We will specify the path to the main directory, under which there could be several subdirectories (typically representing datasets) and within those the individual input files. Here we will tell SampleHandler we are only interested in one input xAOD file (specified by its exact name, but wildcards are accepted to pick up several inputs). More information and options for using SampleHandler to ‘find’ your data can be found on the dedicated SampleHandler wiki.
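As a minimal sketch of the wildcard option (the pattern below is illustrative and not part of the tutorial data; adapt it to your own file names), the scan call would look like:
# Sketch only: pick up every file in the directory matching the wildcard
ROOT.SH.ScanDir().filePattern( 'DAOD_PHYS.*.pool.root.*' ).scan( sh, inputFilePath )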
You can really put your steering code anywhere, but it is probably a good idea to keep it in your source area (which is in version control), probably even in your package (typically in the share directory). That is what we will do in this tutorial.
Create a file called source/MyAnalysis/share/ATestRun_eljob.py, and make it executable (chmod +x source/MyAnalysis/share/ATestRun_eljob.py). Fill it with the following:
#!/usr/bin/env python
# Read the submission directory as a command line argument. You can
# extend the list of arguments with your private ones later on.
import optparse
parser = optparse.OptionParser()
parser.add_option( '-s', '--submission-dir', dest = 'submission_dir',
action = 'store', type = 'string', default = 'submitDir',
help = 'Submission directory for EventLoop' )
( options, args ) = parser.parse_args()
# Set up (Py)ROOT.
import ROOT
ROOT.xAOD.Init().ignore()
# Set up the SampleHandler object. See the description in the text
# above for the details about these lines.
import os
sh = ROOT.SH.SampleHandler()
sh.setMetaString( 'nc_tree', 'CollectionTree' )
inputFilePath = os.getenv( 'ALRB_TutorialData' ) + '/mc21_13p6TeV.601229.PhPy8EG_A14_ttbar_hdamp258p75_SingleLep.deriv.DAOD_PHYS.e8357_s3802_r13508_p5057/'
ROOT.SH.ScanDir().filePattern( 'DAOD_PHYS.28625583._000007.pool.root.1' ).scan( sh, inputFilePath )
sh.printContent()
# Create an EventLoop job.
job = ROOT.EL.Job()
job.sampleHandler( sh )
job.options().setDouble( ROOT.EL.Job.optMaxEvents, 500 )
job.options().setString( ROOT.EL.Job.optSubmitDirMode, 'unique-link')
# Create the algorithm's configuration.
from AnaAlgorithm.DualUseConfig import createAlgorithm
alg = createAlgorithm( 'MyxAODAnalysis', 'AnalysisAlg' )
# later on we'll add some configuration options for our algorithm that go here
# Add our algorithm to the job
job.algsAdd( alg )
# Run the job using the direct driver.
driver = ROOT.EL.DirectDriver()
driver.submit( job, options.submission_dir )
Read over the comments carefully to understand what is happening. Notice that we will only run over the first 500 events (for testing purposes). Obviously if you were doing a real analysis you would want to remove that statement to run over all events in a sample.
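The same options mechanism covers other settings too. As a sketch, assuming your EventLoop release provides EL::Job::optSkipEvents (a standard option, but worth checking in your release), you could also skip the first events of each sample:
# Sketch: skip the first 100 events of each sample before processing
# (assumes optSkipEvents is available in your EventLoop release)
job.options().setDouble( ROOT.EL.Job.optSkipEvents, 100 )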
The way of creating the algorithm shown above is the dual-use way, i.e. it is the same in EventLoop and Athena. An alternative EventLoop-only way of creating the algorithm is to use:
from AnaAlgorithm.AnaAlgorithmConfig import AnaAlgorithmConfig
alg = AnaAlgorithmConfig( 'MyxAODAnalysis/AnalysisAlg' )
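With either approach, algorithm properties can then be set by plain attribute assignment on the configuration object. As a sketch (MyProperty is a purely hypothetical property name; this only works if your algorithm declares it):
# Hypothetical property, shown only to illustrate the syntax; it requires
# a matching declareProperty( 'MyProperty', ... ) in the algorithm itself
alg.MyProperty = 42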
Add the following lines to MyAnalysis/CMakeLists.txt to enable the use of your macro:
# Install files from the package:
atlas_install_scripts( share/*_eljob.py )
To make sure that the newly created file gets installed and can be found, we need to recompile (we need to call cmake explicitly since we created a new file):
cd ../build/
cmake ../source/
make
Don’t forget to run source x86_64-*/setup.sh afterwards.
To execute the job using this script, go to your run directory and simply execute your macro:
cd ../run
ATestRun_eljob.py --submission-dir=submitDir
If your algorithm does not run, make sure that you have defined the environment variable ALRB_TutorialData, as explained here.
If it still doesn’t run, there is sometimes an issue with the shebang line. You can work around this and run directly with Python: python ../build/x86_64-centos7-gcc8-opt/bin/ATestRun_eljob.py --submission-dir=submitDir
Note that submitDir is the directory where the output of your job is stored. We set the mode for the directory to “unique-link”, which means that EventLoop will attach the date and time to that name to make it unique, and then create a link that points to the latest directory created. That way it is guaranteed that outputs from your job don’t get overwritten when you re-run it, while at the same time making it easy for you to find the latest result.
For test runs this is generally a good setup; for actual production runs you may want to put some more thought into how you organize your output directories. If you want to avoid appending a unique suffix to your directory name you can switch “unique-link” above to “no-clobber”, which will take submitDir as the actual directory name, and fail if the directory already exists.
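That is, the corresponding line in the steering macro would become:
# Fail if submitDir already exists, instead of creating a uniquely named one
job.options().setString( ROOT.EL.Job.optSubmitDirMode, 'no-clobber' )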
While you are in principle free to put your submitDir wherever you like, avoid putting it into the source directory, as that is usually version controlled and you risk your data files being added to the repository (which is bad). Also avoid putting it into the build directory, as you often want to keep the contents of submitDir around, while the build directory should only contain files you don’t mind losing. Putting it inside the run directory is a reasonable choice if you have enough space there, but if it ends up containing large files you may need to put it onto a separate data disk. If you run in batch you may also need to put it inside a directory that is accessible from the worker nodes.
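As a sketch of what running in batch could look like, assuming a Slurm cluster (the driver class is a real EventLoop driver, but the configuration shown is illustrative and real setups typically need more driver options):
# Sketch: swap the direct driver for a batch driver; illustrative only,
# the job name and any further settings depend on your cluster
driver = ROOT.EL.SlurmDriver()
driver.SetJobName( 'MyAnalysisJob' )
driver.submit( job, options.submission_dir )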