To actually run this algorithm we need some way to configure
and steer the job. When running in EventLoop, this is done in
the form of a steering macro. When running in Athena, this is
done with a jobOptions file, similar to what was used in the
MC Generation section. A steering macro and
a jobOptions file are both included in the AnalysisTutorial
repository. This section will walk you through understanding
and running both, but it is recommended to use EventLoop for
your first time through this tutorial.
Your steering macro and/or jobOptions can be stored anywhere, but it is good practice to keep them in your package, typically under `share`.
The steering macro can be a ROOT macro in C++ or Python, or compiled C++ code. The latest recommendations, which are followed in this tutorial, are to use a Python macro. The macro for our algorithm can be seen here or in your local version of MyAnalysis as `share/ATestRun_eljob.py`.
This section highlights some important parts of the macro, and further exploration is left as an exercise for the reader. The macro is called `source/MyAnalysis/share/ATestRun_eljob.py`.
In order for it to be called directly, it needs to be executable. The permissions are set correctly for `ATestRun_eljob.py`, but if you make another similar macro, you need to call `chmod +x source/MyAnalysis/share/<macro_name>.py`.
The following line in `MyAnalysis/CMakeLists.txt` enables the use of the macro:

```cmake
atlas_install_scripts( share/*_eljob.py )
```
The first part of the macro we will look at is getting the input file(s). This is done using SampleHandler, a tool that provides numerous methods of defining and finding input files. The implementation used in our example creates a local sample object using the filename directly (currently stored as the environment variable `ASG_TEST_FILE_MC`):
```python
# Set up the SampleHandler object to handle the input files
sh = ROOT.SH.SampleHandler()

# Set the name of the tree in our files; in the xAOD the TTree
# containing the EDM containers is "CollectionTree"
sh.setMetaString( 'nc_tree', 'CollectionTree' )

# Use SampleHandler to get the sample from the defined location
sample = ROOT.SH.SampleLocal( "dataset" )
sample.add( os.getenv( 'ASG_TEST_FILE_MC' ) )
sh.add( sample )
```
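If you want to check which samples and files SampleHandler actually picked up, you can print its contents right after this block:

```python
# Print the samples and files found, to verify the setup
sh.printContent()
```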
The astute observer may note that `ASG_TEST_FILE_MC` points to a ttbar sample. This is due to a small issue with the existing LQ signal samples. We are working on fixing the issue and will integrate them into this step in the tutorial as soon as possible. In the meantime, please use the available ttbar sample. The methods taught in this part of the tutorial are independent of the input sample.
You can add multiple input files to your job by repeating the `add` command.
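For example, a short sketch with two extra files (the paths here are hypothetical):

```python
# Add further input files by repeating add() (hypothetical paths)
sample.add( '/path/to/another_file.root' )
sample.add( '/path/to/yet_another_file.root' )
```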
`SampleHandler` offers several methods to add files for local running, including an option (`ScanDir`) to scan a directory and find files matching a pattern. More details are available here.
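As a sketch, assuming your files live in a local directory (the path and pattern below are placeholders you would adapt), a `ScanDir`-based setup could look like:

```python
# Scan a directory and add all files matching the pattern to the SampleHandler
ROOT.SH.ScanDir().filePattern( '*.root*' ).scan( sh, '/path/to/my/samples' )
```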
The macro allows you to set parameters for your job, such as whether you are running over Monte Carlo or detector data. This is done in the macro with the lines:

```python
# Set data type
dataType = "mc" # "mc" or "data"
```
Next, the job is created, the `SampleHandler` object is added to it, and some options are specified:

```python
# Create an EventLoop job
job = ROOT.EL.Job()
job.sampleHandler( sh )

# Add some options for the job
job.options().setDouble( ROOT.EL.Job.optMaxEvents, 500 )
job.options().setString( ROOT.EL.Job.optSubmitDirMode, 'unique-link' )
```
The first option tells the job to run over only 500 events, which is useful for testing but not for actual analysis jobs. When running over full datasets, set this option to -1 (no event limit) or remove it entirely, since the default is to process all events.
The second option modifies the naming convention for the output directory from your job. The `unique-link` option causes a unique timestamp to be appended to the output directory name, and a link is created to point at the latest directory. This prevents your outputs from being overwritten the next time the job is re-run.
The `unique-link` option is useful for local testing, but not for full production runs. Turn off this behavior by setting the option to `no-clobber`. With `no-clobber`, if the specified output directory already exists, the job will fail.
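One more thing happens in the macro between these job options and the submission below: the algorithm itself is created and attached to the job. A minimal sketch of that step, using the dual-use configuration helper (the algorithm and instance names here follow this tutorial, so adapt them if yours differ):

```python
from AnaAlgorithm.DualUseConfig import createAlgorithm

# Create an instance of our algorithm and attach it to the job
alg = createAlgorithm( 'MyxAODAnalysis', 'AnalysisAlg' )
job.algsAdd( alg )
```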
Finally, the driver is specified and the job is submitted:

```python
# Run the job using the direct driver
driver = ROOT.EL.DirectDriver()
driver.submit( job, options.submission_dir )
```
Make sure that anything you want to configure for your job is added before these lines submitting the job, otherwise it won't be picked up.
In this case, we are using the direct driver to run locally. Other drivers (such as for running on batch systems or the grid) are also available. More details about the available drivers can be found in the Analysis Tools guide.
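For example, a rough sketch of submitting the same job to the grid instead of running locally, assuming a working grid setup; the output sample name below is a placeholder pattern you would need to adapt:

```python
# Sketch: replace the direct driver with the PRUN driver for grid running
driver = ROOT.EL.PrunDriver()
driver.options().setString( 'nc_outputSampleName', 'user.<username>.tutorial.%in:name[2]%' )
driver.submit( job, options.submission_dir )
```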
Read through the macro carefully to understand what it is doing.
To execute the job using this script, go to your `run` directory and execute your macro with the command:

```
ATestRun_eljob.py --submission-dir=submitDir
```
If your algorithm does not run, there could be an issue with the shebang line. You can override this and run directly with Python using:

```
python ../build/x86_64-centos7-gcc8-opt/bin/ATestRun_eljob.py --submission-dir=submitDir
```
While you are in principle free to choose where you put your `submitDir`, avoid putting it into the `source` directory, as that is usually version controlled and you risk your data files being added to the repository (which is bad). Also avoid putting it into the `build` directory, as you often want to keep the contents of `submitDir` around, while the `build` directory should only contain files you don't mind losing. Putting it inside the `run` directory is a reasonable choice if you have enough space there, but if it ends up containing large files you may need to put it onto a separate data disk. If you run in batch you may also need to put it inside a directory that is accessible from the worker nodes.
The jobOptions used to run your algorithm follow the same principles as the jobOptions you used for MC event generation, but differ because this is a fundamentally different use case. You can find more details about using jobOptions for running analyses in Athena in the main Athena tutorial.
The jobOptions for this tutorial are available here. The jobOptions file is called `MyAnalysis/share/ATestRun_jobOptions.py`.
It can be kept anywhere, but it is a good idea to keep it in your `source` area, probably in your package.
The following line in `MyAnalysis/CMakeLists.txt` enables the use of the jobOptions:

```cmake
atlas_install_joboptions( share/*_jobOptions.py )
```
The first part we will look at is getting the input file(s):

```python
# Specify local input file name
testFile = os.getenv( 'ASG_TEST_FILE_MC' )

# Override next line on command line with: --filesInput=XXX
jps.AthenaCommonFlags.FilesInput = [testFile]
```
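As in the EventLoop case, you can run over more than one file, here by listing them (the paths below are placeholders):

```python
# Multiple input files can be passed as a Python list (hypothetical paths)
jps.AthenaCommonFlags.FilesInput = [ '/path/to/file1.root', '/path/to/file2.root' ]
```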
Next, the algorithm is added to the Athena sequence (this is analogous to submitting an EventLoop job to a driver):

```python
# Add our algorithm to the main alg sequence
athAlgSeq += alg
```
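The `alg` object added to the sequence is created earlier in the jobOptions; as a sketch, the same dual-use helper shown for EventLoop also works here (names as in this tutorial):

```python
from AnaAlgorithm.DualUseConfig import createAlgorithm

# Create the algorithm instance that is added to the sequence above
alg = createAlgorithm( 'MyxAODAnalysis', 'AnalysisAlg' )
```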
Finally, some options are specified:

```python
# Limit the number of events (for testing purposes)
theApp.EvtMax = 500

# Optional include for reducing printout from athena
include( "AthAnalysisBaseComps/SuppressLogging.py" )
```
Read through the jobOptions carefully to understand what they are doing.
Go to your `run` directory and execute your jobOptions with Athena using the following command:

```
athena MyAnalysis/ATestRun_jobOptions.py
```
You can override many of the options specified in the jobOptions when calling the `athena` command. For example, you can set the number of events to process with the `--evtMax` option (-1 is the default value and causes all events to be processed):

```
athena MyAnalysis/ATestRun_jobOptions.py --evtMax=-1
```
Or you can override the input files used with the `--filesInput` option:

```
athena MyAnalysis/ATestRun_jobOptions.py --filesInput=another.file.root
```