In this section of the tutorial we will teach you how to run on the grid. There are two main advantages to running on the grid: First, you have access to its vast computing power, so even very lengthy jobs can finish within hours instead of days or weeks. Second, you have direct access to all the datasets available on the grid, so you don't need to download them to your site first, which can save you hours if not days. There are also two main disadvantages: First, for all but the simplest jobs your turnaround time will be measured in hours, if not days, and can vary depending on the load at the various grid sites. Second, more things beyond your control can go wrong, e.g. the only grid site hosting your samples may experience problems and go offline for a day or two, delaying the execution of your jobs.
Users of AthAnalysis or any other Athena project can learn how to submit jobs to the grid at this link
As a first step, set up the PanDA client, which is needed for submitting jobs to the grid. You should set this up before setting up ROOT, so it's probably best to start from a clean shell and issue the commands:
setupATLAS
lsetup panda
Now navigate to your working area and set up your Analysis Release, following the recommendations in What to do every time you log in above.
The nice thing about using EventLoop is that you don't have to change any of your algorithm's code; you simply change the driver in the steering script. It is recommended that you use a separate submission script when running on the grid.
Let's copy the content of ATestRun_eljob.py, all the way up to and including the driver.submit() statement, into a new file ATestSubmit.py.
If you did the section on direct access of files on the grid, the configuration for SampleHandler should already be set; otherwise open ATestSubmit.py, comment out the directory scan, and instead scan using Rucio (shown below). Note that since we are just testing this functionality, we will use a very small input dataset, so your job runs quickly and you get fast feedback regarding the success (let's hope it's a success) of your job.
# inputFilePath = os.getenv( 'ALRB_TutorialData' ) + '/mc16_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.deriv.DAOD_PHYS.e6337_s3126_r10201_p4172/'
# ROOT.SH.ScanDir().filePattern( 'DAOD_PHYS.21569875._001323.pool.root.1' ).scan( sh, inputFilePath )
ROOT.SH.scanRucio(sh, 'data16_13TeV.periodAllYear.physics_Main.PhysCont.DAOD_ZMUMU.repro21_v01/' )
Next, replace the driver with the PrunDriver:
# driver = ROOT.EL.DirectDriver()
driver = ROOT.EL.PrunDriver()
We need to specify a structure for the output dataset name: our input sample has a very long name, and by default the output dataset name will contain (among other strings) this input dataset name, which is too long for the grid to handle. So after you've defined the PrunDriver, add:
driver.options().setString("nc_outputSampleName", "user.<nickname>.%in:name[2]%.%in:name[6]%")
where you should replace <nickname> with your grid nickname (usually the same as your lxplus username). The [2] argument will insert the dataset ID (MC ID or run number) and [6] will insert the AMI tags (essentially we are dropping the long "physics short" text). Note that if you want to rerun with the same input but a slightly different analysis setting, you will need to come up with a different output dataset name (or you will get an error telling you this output dataset name already exists). Output dataset names must be unique.
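To see which pieces of the input name the [2] and [6] fields pick out, you can split a dataset name on dots yourself. This is a plain-Python illustration only (not part of EventLoop), assuming the 1-based field numbering described in the text:

```python
# Illustration only: show which dot-separated fields of a dataset name
# the [2] and [6] placeholders refer to (1-based numbering assumed).
name = ('mc16_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad'
        '.deriv.DAOD_PHYS.e6337_s3126_r10201_p4172')
fields = name.split('.')

dataset_id = fields[2 - 1]  # field [2]: the MC ID / run number
ami_tags = fields[6 - 1]    # field [6]: the AMI tags

print(dataset_id)  # 410470
print(ami_tags)    # e6337_s3126_r10201_p4172
```

With the long "physics short" field (field [3]) dropped, the resulting output name stays well within the grid's length limit.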
The PrunDriver supports a number of optional configuration options that you might recognize from the prun program.
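For example, options can be passed through the driver before submission. This is a sketch of a configuration fragment only: the option names are assumed to mirror the corresponding prun flags with an "nc_" prefix, the site name is hypothetical, and the exact set of supported options depends on your release (check the grid driver documentation):

```python
# Sketch (assumed option names, "nc_" + prun flag); adjust to your needs.
driver.options().setDouble("nc_nFilesPerJob", 1)  # like prun --nFilesPerJob
driver.options().setString("nc_excludedSite", "ANALY_SOMESITE")  # hypothetical site name
```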
Finally, submit the jobs as before:
ATestSubmit.py --submission-dir=submitDir
This job submission process may take a while to complete, so do not interrupt it! You will be prompted to enter your grid certificate password.
You can follow the evolution of your jobs by going to https://bigpanda.cern.ch/user/.
If you need more information on options available for running on the grid, check out the grid driver documentation.