EventLoop Grid Driver

Last update: 26 Oct 2023

The basic solution for running on the grid is the PrunDriver. To use it you must first set up panda and rucio clients:

    lsetup rucio
    lsetup panda

Note: The rucio setup, at least, should normally be done before setting up ROOT or an ASG release; otherwise there is a risk of configuration clashes.

To submit jobs to the grid, create an instance of the PrunDriver.

    EL::PrunDriver driver;

Optionally, you can specify how to name the grid output datasets. The naming is based on a simple rule, which you specify like so:

    driver.options()->setString("nc_outputSampleName", "user.amadsen.test.%in:name[2]%");

This string should always begin with "user.yourgridnickname." to be consistent with the rucio naming rules. The rest of the string is arbitrary, and the following substitutions can be used to derive the name from each input sample. %nickname% is replaced with your grid nickname. %in:name% is replaced with the name of the input sample. %in:name[n]% is replaced with the n-th field of the input name, where fields are separated by "." and counted from 1. %in:metastring% is replaced with the value of the (string) meta data field metastring of the input sample.

For example, using the string above user.amadsen.test.%in:name[2]%, the output sample created from the input sample mc11_7TeV.105200.T1_McAtNlo_Jimmy.merge.NTUP_TOP.e835_s1272_s1274_r3043_r2993_p834 will be called user.amadsen.test.105200.
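The field substitution can be illustrated with a small standalone sketch. This is not EventLoop's own implementation: the helper names splitFields and expandNameFields are hypothetical, only the %in:name[n]% form is handled, and leaving an out-of-range placeholder untouched is an assumption made for the illustration.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a dataset name into its dot-separated fields.
std::vector<std::string> splitFields(const std::string& name) {
    std::vector<std::string> fields;
    std::stringstream stream(name);
    std::string field;
    while (std::getline(stream, field, '.')) {
        fields.push_back(field);
    }
    return fields;
}

// Hypothetical helper (not part of EventLoop): replace every
// %in:name[n]% in `pattern` with the n-th field of `inputName`.
// Fields are counted from 1, as in the example in the text.
std::string expandNameFields(const std::string& pattern,
                             const std::string& inputName) {
    const std::string open = "%in:name[";
    const std::string close = "]%";
    const std::vector<std::string> fields = splitFields(inputName);

    std::string result;
    std::size_t pos = 0;
    while (true) {
        const std::size_t start = pattern.find(open, pos);
        if (start == std::string::npos) break;
        const std::size_t end = pattern.find(close, start + open.size());
        if (end == std::string::npos) break;

        // Copy the literal text in front of the placeholder.
        result += pattern.substr(pos, start - pos);

        // Accept only a short, purely numeric index.
        const std::string indexText =
            pattern.substr(start + open.size(), end - start - open.size());
        bool valid = !indexText.empty() && indexText.size() <= 4;
        for (char c : indexText) {
            if (c < '0' || c > '9') valid = false;
        }
        const std::size_t index = valid ? std::stoul(indexText) : 0;

        if (index >= 1 && index <= fields.size()) {
            result += fields[index - 1];
        } else {
            // Assumption: an unusable placeholder is kept as written.
            result += pattern.substr(start, end + close.size() - start);
        }
        pos = end + close.size();
    }
    result += pattern.substr(pos);
    return result;
}
```

Calling expandNameFields with the pattern and input sample name from the example above yields user.amadsen.test.105200.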

Job configuration is done using the meta data system, so options can be set on a per sample basis:

    // By default, split into as few jobs as possible
    driver.options()->setString(EL::Job::optGridNFilesPerJob, "MAX");
    // For this particular sample, split into one job per input file
    sh.get("data12_8TeV.00202668.physics_Muons.merge.NTUP_COMMON.r4065_p1278_p1562/")->meta()->setDouble(EL::Job::optGridNFilesPerJob, 1);
    // Run merging jobs for all samples before downloading (recommended)
    driver.options()->setDouble(EL::Job::optGridMergeOutput, 1);

The full list of supported options can be found in the EL::Job documentation (look for variables starting with optGrid). For a full explanation of each option, see the prun documentation (prun --help).

The grid drivers work with SampleGrid samples. A scanDQ2() function is available to create these:

    SH::SampleHandler sh;
    SH::scanDQ2 (sh, "user.krumnack.pat_tutorial_*.v1");
    sh.setMetaString ("nc_tree", "CollectionTree");

Please see the SampleHandler documentation for more information. Note that you can select a subset of the files in a dataset or container by setting the meta data string nc_grid_filter, for example to "*.root*" to process only root files in a dataset that also contains log files. (The trailing wildcard is significant, as files are often named e.g. something.root.1.)
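To see why the trailing wildcard matters, here is a minimal standalone sketch of '*' matching. It assumes the filter behaves like a simple shell-style glob; the exact matching rules used on the grid side are not specified here, and matchesWildcard is a hypothetical helper, not an EventLoop or SampleHandler function.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: glob match supporting only '*',
// which stands for any run of characters (possibly empty).
bool matchesWildcard(const std::string& pattern, const std::string& text) {
    std::size_t p = 0;                        // position in pattern
    std::size_t t = 0;                        // position in text
    std::size_t starPos = std::string::npos;  // last '*' seen in pattern
    std::size_t resumeAt = 0;                 // text position to retry from

    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            // Remember the star; first try letting it match nothing.
            starPos = p++;
            resumeAt = t;
        } else if (p < pattern.size() && pattern[p] == text[t]) {
            ++p;
            ++t;
        } else if (starPos != std::string::npos) {
            // Mismatch: let the last star absorb one more character.
            p = starPos + 1;
            t = ++resumeAt;
        } else {
            return false;
        }
    }
    // Any remaining pattern characters must all be stars.
    while (p < pattern.size() && pattern[p] == '*') ++p;
    return p == pattern.size();
}
```

Under this assumption, "*.root*" accepts something.root.1 while "*.root" rejects it, and neither pattern accepts a log archive such as job.log.tgz.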

Create your Job object as usual and then submit it:

    driver.submit(job, "uniqueJobDirectory");

Passing Non-Standard Options to the Grid Driver

In case the option you need to use is not available as an explicit option, you can pass it as a generic option:

    job.options()->setString (EL::Job::optSubmitFlags, "-x -y -z");

For options that are supported by EventLoop it is preferred to pass them via the explicit option instead of the generic mechanism, as it makes EventLoop aware of what options you chose and gives it the opportunity to do extra actions (if required).

Processing multiple datasets in one JEDI task

Note that Panda accepts a comma-separated list of datasets as input. This speeds up job submission when multiple datasets should all be processed with the same meta data. To set up such a task, you can do:

    std::unique_ptr<SH::SampleGrid> sample(new SH::SampleGrid("AllMyData"));
    sample->meta()->setString(SH::MetaFields::gridName, "data15_13TeV.periodA-J.physics_Main.PhysCont.DAOD_EXOT14.grp15_v01_p9999,data16_13TeV.periodA-L.physics_Main.PhysCont.DAOD_EXOT14.grp16_v01_p9999");
    sample->meta()->setString(SH::MetaFields::gridFilter, SH::MetaFields::gridFilter_default);
    sh.add(sample.release());

where sh is your sample handler. This should be sufficient for data, but for MC samples we usually want a way to keep track of which output came from which input sample. With the PrunDriver we can use the option described above to pass in some extra flags that will help with that:

    job.options()->setString (EL::Job::optSubmitFlags, "--addNthFieldOfInDSToLFN=1,2,3 --useContElementBoundary");

Here, useContElementBoundary ensures that only files that come from the same input dataset are processed together, and the numbers after addNthFieldOfInDSToLFN add (in this case) the first, second and third fields of the name of that input dataset to the names of the produced output files. Note that this only really makes sense if you use submitOnly(), as the retrieve() command would just add all the histograms back together again.