Running on the grid

Last update: 03 Apr 2019

In this section of the tutorial we will teach you how to run on the grid. There are two main advantages to running on the grid. First, you have access to the vast computing power of the grid, so even very lengthy jobs can finish within hours instead of days or weeks. Second, you have direct access to all the datasets available on the grid, so you don’t need to download them to your site first, which can save you hours if not days. There are also two main disadvantages. First, for all but the simplest jobs your turnaround time will be measured in hours, if not days, and it can vary depending on the load at the various grid sites. Second, more things beyond your control can go wrong; e.g. the only grid site with your samples may experience problems and go offline for a day or two, delaying the execution of your jobs.

Users of AthAnalysis or any other Athena project can learn how to submit jobs to the grid at this link

As a first step, set up the Panda client, which is needed for running on the grid. You should set it up before setting up ROOT, so it’s probably best to start from a clean shell and issue the commands:

setupATLAS
lsetup panda 

Now navigate to your working area and set up your Analysis Release, following the recommendations in the earlier section "What to do every time you log in".

The nice thing about using EventLoop is that you don’t have to change any of your algorithms’ code; you simply change the driver in the steering macro. It is recommended that you use a separate submit script when running on the grid.

Let’s copy the content of ATestRun_eljob.cxx all the way up to, and including, the driver.submit statement into a new file ATestSubmit.cxx. Don’t forget to change the name of the macro at the beginning of the file to ATestSubmit:

void ATestSubmit (const std::string& submitDir)

If you did the section on direct access to files on the grid, the configuration for SampleHandler should already be set. Otherwise, open ATestSubmit.cxx, comment out the directory scan, and scan using Rucio instead (shown below). Since we are just testing this functionality, we will use a very small input dataset (a SUSY signal point) so that your job runs quickly and you get prompt feedback on whether it succeeded.

  // const char* inputFilePath = gSystem->ExpandPathName ("$ALRB_TutorialData/r9315/");
  // SH::ScanDir().filePattern("AOD.11182705._000001.pool.root.1").scan(sh,inputFilePath);


  SH::scanRucio (sh, "data16_13TeV.periodAllYear.physics_Main.PhysCont.DAOD_ZMUMU.repro21_v01/");

You will also need to add

#include <SampleHandler/ToolsDiscovery.h>

at the beginning of the file.

Next, replace the driver with the PrunDriver:

//EL::DirectDriver driver;
EL::PrunDriver driver;

We need to specify a structure for the output dataset name: our input sample has a very long name, and by default the output dataset name will contain (among other strings) this input dataset name, which is too long for the Grid to handle. So after you’ve defined the PrunDriver, add:

driver.options()->setString("nc_outputSampleName", "user.lheelan.test.%in:name[2]%.%in:name[6]%");

where you should replace lheelan with your Grid nickname (usually the same as your lxplus username). The argument [2] inserts the dataset ID (MC ID or run number) and [6] inserts the AMI tags (basically we are removing the “physics short” text). Note that if you want to rerun with the same input but a slightly different analysis setting, you need to choose a different output dataset name, or you will get an error telling you that the output dataset name already exists. Dataset names must be unique.
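To make the field substitution concrete, here is a minimal standalone sketch of how such a pattern could expand. It is an illustration only, not EventLoop's implementation: the helper names splitFields and expandOutputName are hypothetical, the input dataset name used in the usage note is made up, and the fields are assumed to be counted from 1, as the description of [2] and [6] suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split a dataset name into its dot-separated fields.
std::vector<std::string> splitFields (const std::string& name)
{
  std::vector<std::string> fields;
  std::size_t start = 0;
  while (true) {
    const std::size_t dot = name.find ('.', start);
    if (dot == std::string::npos) {
      fields.push_back (name.substr (start));  // last (or only) field
      break;
    }
    fields.push_back (name.substr (start, dot - start));
    start = dot + 1;
  }
  assert (!fields.empty ());  // even an empty name yields one empty field
  return fields;
}

// Replace each "%in:name[N]%" token in pattern with the N-th field of
// inputName. ASSUMPTION: N is 1-based. An out-of-range N expands to an
// empty string; a non-numeric N makes std::stoul throw.
std::string expandOutputName (const std::string& pattern,
                              const std::string& inputName)
{
  const std::string open = "%in:name[";
  const std::string close = "]%";
  const std::vector<std::string> fields = splitFields (inputName);

  std::string result;
  std::size_t pos = 0;
  while (true) {
    const std::size_t tok = pattern.find (open, pos);
    const std::size_t end = (tok == std::string::npos)
      ? std::string::npos
      : pattern.find (close, tok + open.size ());
    if (end == std::string::npos) {
      result += pattern.substr (pos);  // no further complete token
      break;
    }
    result += pattern.substr (pos, tok - pos);
    const std::size_t numStart = tok + open.size ();
    const std::size_t index =
      std::stoul (pattern.substr (numStart, end - numStart));
    if (index >= 1 && index <= fields.size ())
      result += fields[index - 1];
    pos = end + close.size ();
  }
  return result;
}
```

Under these assumptions, calling expandOutputName with the pattern "user.lheelan.test.%in:name[2]%.%in:name[6]%" and the made-up input name "mc16_13TeV.123456.SomeGenerator_sample.deriv.DAOD_SUSY1.e1234_s5678_r9315_p3000" returns "user.lheelan.test.123456.e1234_s5678_r9315_p3000". The long third field is dropped, which is what keeps the output name short.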

The PrunDriver supports a number of optional configuration options that you might recognize from the prun program. If you want detailed control over how jobs are submitted, please consult this page for a list of options: GridDriver

Finally, submit the jobs as before:

root -l -b -q '$ROOTCOREDIR/scripts/load_packages.C' 'ATestSubmit.cxx ("myGridJob")'

This job submission process may take a while to complete - do not interrupt it! You will be prompted to enter your Grid certificate password. When all jobs have been submitted to Panda, a monitoring loop will start. Output histograms for each input dataset will be downloaded as soon as processing of that dataset is completed. You can stop the monitoring loop with Ctrl+C and restart it later by calling

  EL::Driver::wait("myGridJob");

or do a single check on the status of the jobs and retrieve any new output:

  EL::Driver::retrieve("myGridJob");

See [here](https://twiki.cern.ch/twiki/bin/viewauth/AtlasProtected/EventLoop#Submitting_a_Job_in_Batch_Mode) for more info on how to create a separate retrieve script which could contain any post-processing code that you want to run once the jobs are finished. If you do not want to enter the monitoring loop in the first place you can, at the end of ATestSubmit.cxx, replace driver.submit with:

  // submit the job, returning control immediately if the driver supports it
  driver.submitOnly (job, submitDir);
}

If you need to log out from your computer but still want output to be downloaded continuously, so that it is immediately available when you come back, a somewhat more advanced GridDriver exists which uses Ganga and GangaService to keep running in the background. See the EventLoop twiki page for more info.

You can follow the evolution of your jobs by going to https://bigpanda.cern.ch, clicking on users and then finding yourself in the list.

If you are using the compiled application to run your macro, you need to add two things:

  • to MyAnalysis/CMakeLists.txt add EventLoopGrid to the list of LINK_LIBRARIES

  • in your compiled steering macro MyAnalysis/util/testRun.cxx add this near the top with the other header include statements: #include "EventLoopGrid/PrunDriver.h"

If you need more information on options available for running on the grid, check out the grid driver documentation.