To actually run this algorithm we need some way to configure
and steer the job. When running in EventLoop, this is done in
the form of a steering macro. When running in Athena, this is
done with a jobOptions file, similar to what was used in the
MC Generation section. A steering macro and
a jobOptions file are both included in the MyAnalysis
repository. This section will walk you through understanding
and running both, but it is recommended to use EventLoop for
your first time through this tutorial.
Your steering macro and/or jobOptions can be stored anywhere, but it is good practice to keep them in your package (in this tutorial, tutorial/AnalysisTutorial/source/MyAnalysis), typically under share.
The steering macro can be a ROOT macro in C++ or Python, or compiled C++ code. The latest recommendation, which is followed in this tutorial, is to use a Python macro. The macro for our algorithm is available in your local version of MyAnalysis as share/ATestRun_eljob.py.
This section highlights some important parts of the macro; further exploration is left as an exercise for the reader.
In order for the macro to be called directly, it needs to be executable.
The permissions are already set correctly for ATestRun_eljob.py, but if
you make another similar macro, you need to run
chmod +x share/<macro_name>.py
.
The following line in CMakeLists.txt adds the macro to $PATH so it can be called from the command line with just ATestRun_eljob.py:
atlas_install_scripts( share/*_eljob.py )
The first part of the macro we will look at is getting the input file(s). This is done using SampleHandler, a tool that provides numerous methods of defining and finding input files. The implementation used in our example creates a local sample object using the filename directly:
# Set up the SampleHandler object to handle the input files
sh = ROOT.SH.SampleHandler()
# Set the name of the tree in our files; in the xAOD, the TTree
# containing the EDM containers is called "CollectionTree"
sh.setMetaString( 'nc_tree', 'CollectionTree' )
# Select the sample associated with the data type used
if dataType not in ["data", "mc"]:
    raise Exception (f"invalid data type: {dataType}")
if dataType == 'mc':
    testFile = os.getenv ('ALRB_TutorialData')+'/mc20_13TeV.312276.aMcAtNloPy8EG_A14N30NLO_LQd_mu_ld_0p3_beta_0p5_2ndG_M1000.deriv.DAOD_PHYS.e7587_a907_r14861_p6117/DAOD_PHYS.37791038._000001.pool.root.1'
else:
    testFile = os.getenv('ASG_TEST_FILE_DATA')
# Use SampleHandler to get the sample from the defined location
sample = ROOT.SH.SampleLocal("dataset")
sample.add (testFile)
sh.add (sample)
You can add multiple input files to your job by repeating the
add
command.
SampleHandler
offers several methods to add files for local running, including an option (ScanDir
) to scan a directory and find files matching a pattern. More details are available in the SampleHandler documentation.
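As a sketch of the ScanDir alternative: instead of adding files one by one, you can let SampleHandler discover them. The directory path below is a placeholder, and this fragment assumes the same ROOT/AnalysisBase environment as the rest of the macro.

```python
import ROOT

# Set up SampleHandler as before
sh = ROOT.SH.SampleHandler()
sh.setMetaString( 'nc_tree', 'CollectionTree' )

# Scan a local directory (placeholder path) for files matching the
# pattern and group what is found into samples inside the handler
ROOT.SH.ScanDir().filePattern( '*.root*' ).scan( sh, '/path/to/my/datasets' )

# Print what was found, as a sanity check
sh.printContent()
```

This is a configuration fragment for the steering macro, so it only runs inside an AnalysisBase environment.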
The macro allows you to set parameters for your job, such as whether you are running over Monte Carlo or detector data. This is done in the macro with the lines:
# Set data type
dataType = "mc" # "mc" or "data"
Next, the job is created, the SampleHandler
object is added to
it, and some options are specified:
# Create an EventLoop job.
job = ROOT.EL.Job()
job.sampleHandler( sh )
# Add some options for the job
job.options().setDouble( ROOT.EL.Job.optMaxEvents, 500 )
job.options().setString( ROOT.EL.Job.optSubmitDirMode, 'unique-link')
The first option tells the job to run over only 500 events, which is useful for testing but not for actual analysis jobs. When running over full datasets, set this option to -1 to indicate no event limit, or remove it entirely (the default behavior is also no event limit).
The second option modifies the naming convention for the output
directory of your job. The unique-link
option appends a unique timestamp to the output directory name and
creates a link pointing at the latest directory. This prevents your
outputs from being overwritten the next time the job is re-run.
The unique-link
option is useful for local testing, but not for full production runs. Turn off this behavior by setting the option to no-clobber
; then, if the specified output directory already exists, the job will fail.
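To picture what unique-link does, here is a small stand-alone sketch in plain Python (not EventLoop code) of the same idea: each run writes to a timestamped directory, and a symlink always resolves to the most recent one.

```python
import os
import tempfile
from datetime import datetime

def make_unique_submit_dir(base):
    """Create '<base>-<timestamp>' and point a '<base>' symlink at it,
    mimicking EventLoop's unique-link behaviour."""
    stamp = datetime.now().strftime('%Y-%m-%d-%H%M%S')
    unique = f'{base}-{stamp}'
    os.makedirs(unique)
    # Re-point the link at the newest directory
    if os.path.islink(base):
        os.remove(base)
    os.symlink(os.path.basename(unique), base)
    return unique

workdir = tempfile.mkdtemp()
submit = make_unique_submit_dir(os.path.join(workdir, 'submitDir'))
```

A second run creates a fresh timestamped directory instead of clobbering the first, which is exactly why unique-link is convenient during local testing.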
Finally, the driver is specified and the job is submitted:
# Run the job using the direct driver.
driver = ROOT.EL.DirectDriver()
driver.submit( job, options.submission_dir )
Make sure that anything you want to configure for your job is added before these lines submitting the job; otherwise it won't be picked up.
In this case, we are using the direct driver to run locally. Other drivers (such as for running on batch systems or the grid) are also available. More details about the available drivers can be found in the Analysis Tools guide.
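For example, switching the same job to the grid is largely a matter of swapping the driver. This is a hedged sketch, assuming the EventLoopGrid package is available in your release; the output sample name is a placeholder you would replace with your own naming scheme.

```python
# Sketch: submit the same job to the grid instead of running locally.
# PrunDriver comes from EventLoopGrid; '<nickname>' is a placeholder
# for your grid nickname.
driver = ROOT.EL.PrunDriver()
driver.options().setString( 'nc_outputSampleName',
                            'user.<nickname>.MyAnalysis.%in:name[2]%' )
# submitOnly returns without waiting for the grid job to finish
driver.submitOnly( job, options.submission_dir )
```

This fragment only runs inside a grid-enabled AnalysisBase environment, so it is shown for orientation rather than as something to paste verbatim.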
Read through the macro carefully to understand what it is doing.
Finally, run a quick test to prove to yourself that this job works!
cd ../run
ATestRun_eljob.py
One of the last messages you see in the terminal should say “worker finished successfully”.
The jobOptions used to run your algorithm follow the same principles as the jobOptions you used for MC event generation, but are different because this is a fundamentally different use-case. You can find more details about using a jobOptions file for running analyses in Athena in the main Athena tutorial.
The jobOptions file for this tutorial is included in your local version of MyAnalysis as share/ATestRun_jobOptions.py.
It can be kept anywhere, but it is a good idea to keep it in your source
area, probably in your package.
The following line in CMakeLists.txt enables the use of the jobOptions by adding them to $JOBOPTSEARCHPATH:
atlas_install_joboptions( share/*_jobOptions.py )
The first part we will look at is getting the input file(s):
# Select the sample associated with the data type used
if dataType not in ["data", "mc"]:
    raise Exception (f"invalid data type: {dataType}")
if dataType == 'mc':
    testFile = os.getenv ('ALRB_TutorialData')+'/mc20_13TeV.312276.aMcAtNloPy8EG_A14N30NLO_LQd_mu_ld_0p3_beta_0p5_2ndG_M1000.deriv.DAOD_PHYS.e7587_e7400_a907_r14861_r14919_p6026/DAOD_PHYS.37773721._000001.pool.root.1'
else:
    testFile = os.getenv('ASG_TEST_FILE_DATA')
# Override next line on command line with: --filesInput=XXX
jps.AthenaCommonFlags.FilesInput = [testFile]
Next, the algorithm is added to the Athena sequence (this is analogous to submitting an EventLoop job to a driver):
# Add our algorithm to the main alg sequence
athAlgSeq += alg
Finally, some options are specified:
# Limit the number of events (for testing purposes)
theApp.EvtMax = 500
# Optional include for reducing printout from athena
include("AthAnalysisBaseComps/SuppressLogging.py")
Read through the jobOptions file carefully to understand what it is doing.
Finally, run a quick test to prove to yourself that this job works!
cd ../run
athena MyAnalysis/ATestRun_jobOptions.py - -c "../source/MyAnalysis/data/config.yaml"
This uses a default configuration provided in the MyAnalysis package. The last message you see in the terminal should say “successful run”.