Running on the grid

Last update: 14 Nov 2022

In this section of the tutorial we will teach you how to run on the grid. There are two main advantages to running on the grid. First, you have access to the vast computing power of the grid, which means even very lengthy jobs can finish within hours instead of days or weeks. Second, you have direct access to all the datasets available on the grid, so you don’t need to download them to your site first, which can save you hours if not days. There are also two main disadvantages. First, for all but the simplest jobs your turnaround time will be measured in hours, if not days, and it can vary depending on the load at the various grid sites. Second, more things beyond your control can go wrong: for example, the only grid site with your samples may experience problems and go offline for a day or two, delaying the execution of your jobs.

Users of AthAnalysis or any other Athena project can learn how to submit jobs to the grid at this link

As a first step, set up the PanDA client, which is needed for running on the grid. You should set it up before setting up ROOT, so it’s probably best to start from a clean shell and issue the commands:

setupATLAS
lsetup panda 

Now navigate to your working area and set up your Analysis Release, following the recommendations above in “What to do every time you log in”.

The nice thing about using EventLoop (and Athena) is that you don’t have to change any of your algorithm code; you simply change the driver in the steering script. It is recommended that you use a separate submit script when running on the grid.

Let’s copy the content of ATestRun_eljob.py all the way up to, and including, the driver.submit() statement into a new file ATestSubmit.py. In the new file, change submit() to submitOnly().
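
The copy-and-edit step can also be scripted. The following is a minimal sketch, not part of the tutorial's tooling: the helper name make_submit_script is hypothetical, and it assumes the steering script contains a line with the exact spelling driver.submit( as in this tutorial's ATestRun_eljob.py.

```python
def make_submit_script(source_text):
    """Return source_text truncated after the driver.submit() line,
    with that call rewritten to driver.submitOnly().

    Hypothetical helper for illustration; it assumes the exact
    'driver.submit(' spelling used in the tutorial's steering script.
    """
    kept = []
    for line in source_text.splitlines():
        if "driver.submit(" in line:
            # Keep this line, switch it to submitOnly(), and stop copying.
            kept.append(line.replace("driver.submit(", "driver.submitOnly("))
            break
        kept.append(line)
    else:
        # The loop ended without finding a submit call.
        raise ValueError("no driver.submit() call found")
    return "\n".join(kept) + "\n"


# A shortened stand-in for the end of a steering script.
example = (
    "driver = ROOT.EL.DirectDriver()\n"
    "driver.submit( job, options.submission_dir )\n"
    "print('done')\n"
)
print(make_submit_script(example))
```

Writing the returned text to ATestSubmit.py gives a starting point; the SampleHandler and driver lines still need the edits described in the following steps.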

If you did the section on direct access to files on the grid, the configuration for SampleHandler should already be set. Otherwise, open ATestSubmit.py, comment out the directory scan, and scan using Rucio instead (shown below). Since we are just testing this functionality, we will use a very small input dataset so that your job runs quickly and you get quick feedback on whether it succeeded (let’s hope it does).

#sample = ROOT.SH.SampleLocal("dataset")
#sample.add (os.getenv ('ASG_TEST_FILE_MC'))  
#sh.add (sample)

ROOT.SH.scanRucio(sh, 'mc20_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.deriv.DAOD_PHYS.e6337_s3681_r13167_p5169/')
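
Stray whitespace or a missing field in a dataset name is easy to overlook when copying and pasting. As a rough sanity check before submitting, the name can be split into its fields in plain Python. This is only a sketch: the function parse_dataset_name and its field labels are hypothetical, and it assumes the six dot-separated fields seen in the dataset name used here.

```python
def parse_dataset_name(name):
    """Split a dataset name into its dot-separated fields.

    Assumes the six-field layout
    project.dsid.physics_short.prod_step.data_type.tags;
    the keys used here are illustrative labels only.
    """
    # Drop surrounding whitespace and the optional trailing '/'.
    cleaned = name.strip().rstrip("/")
    fields = cleaned.split(".")
    if len(fields) != 6:
        raise ValueError(
            f"expected 6 dot-separated fields, got {len(fields)}: {cleaned!r}"
        )
    keys = ("project", "dsid", "physics_short", "prod_step", "data_type", "tags")
    return dict(zip(keys, fields))


info = parse_dataset_name(
    "mc20_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad"
    ".deriv.DAOD_PHYS.e6337_s3681_r13167_p5169/"
)
print(info["dsid"], info["data_type"], info["tags"])
```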

Next, replace the driver with the PrunDriver:

# driver = ROOT.EL.DirectDriver()
driver = ROOT.EL.PrunDriver()

We also need to specify a structure for the output dataset name. Our input sample has a very long name, and by default the output dataset name contains (among other strings) this input dataset name, which makes it too long for the Grid to handle. So after you’ve defined the PrunDriver, add:

driver.options().setString("nc_outputSampleName", "user.<nickname>.grid_test_run")

where you should replace <nickname> with your Grid nickname (usually the same as your lxplus username).
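
To avoid submitting with the placeholder left in, or with a name that is again too long, the output name can be built and checked in plain Python first. This is a minimal sketch under stated assumptions: make_output_sample_name is a hypothetical helper, and the default max_length is a placeholder, not the Grid's documented limit.

```python
def make_output_sample_name(nickname, label, max_length=120):
    """Build a 'user.<nickname>.<label>' output dataset name.

    max_length is an assumed placeholder value, not a documented
    limit; adjust it to whatever limit applies to your setup.
    """
    if not nickname or "<" in nickname or ">" in nickname:
        raise ValueError("replace <nickname> with your actual Grid nickname")
    name = f"user.{nickname}.{label}"
    if len(name) > max_length:
        raise ValueError(
            f"output name is {len(name)} characters, above {max_length}"
        )
    return name


# 'jdoe' is a made-up nickname used only for this example.
print(make_output_sample_name("jdoe", "grid_test_run"))
```

The returned string could then be passed as the value of nc_outputSampleName in the setString call above.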

The PrunDriver supports a number of optional configuration options that you might recognize from the prun program. If you want detailed control over how jobs are submitted, please consult this page for a list of options: GridDriver

Finally, submit the jobs as before:

ATestSubmit.py --submission-dir=submitDir

This job submission process may take a while to complete - do not interrupt it! You will be prompted to enter your Grid certificate password.

If you need to log out from your computer but still want output to be downloaded continuously, so that it is immediately available when you come back, a somewhat more advanced GridDriver exists. It uses Ganga and GangaService to keep running in the background; see the EventLoop twiki page for more information.

Remember that when running on the grid, the steering script should call submitOnly() instead of submit(). If you use submit(), the script will wait until the jobs are finished, which we don’t want.

You can follow the evolution of your jobs by going to https://bigpanda.cern.ch/user/.

If you need more information on options available for running on the grid, check out the grid driver documentation.