Run Your Algorithm On The Grid

Last update: 28 Oct 2023

You can run your algorithm on the Grid through built-in functionality.

In a new shell, navigate to AnalysisTutorial. Set up your analysis release (asetup) and additionally set up panda.

The nice thing about using EventLoop (and Athena) is that you don’t have to change any of your algorithm’s code; you simply change the driver in the steering script. We recommend using a separate submit script when running on the Grid. Let’s copy the content of ATestRun_eljob.py into a new file, ATestSubmit_eljob.py.

First, we need to tell SampleHandler how to find the input file(s) on the Grid. In ATestSubmit_eljob.py, comment out the directory scan and instead scan using Rucio.

tip Since we are just testing this functionality, we will use a very small input dataset so that your job runs quickly and you get fast feedback on whether it succeeded (let’s hope it’s a success!).

# sample = ROOT.SH.SampleLocal("dataset")
# sample.add (os.getenv ('ASG_TEST_FILE_MC'))  
# sh.add (sample)

ROOT.SH.scanRucio(sh , 'mc20_13TeV.601981.PhH7EG_NLO_LQ_S43_ResProd_lam11_1000_0p5.deriv.DAOD_PHYS.e8531_s3797_r13167_p5631')

Next, replace the driver with the PrunDriver:

# driver = ROOT.EL.DirectDriver()
driver = ROOT.EL.PrunDriver()

We also need to specify the structure of the output dataset name. Our input sample has a very long name, and by default the output dataset name contains (among other strings) this input dataset name, which makes it too long for the Grid to handle. So, after you’ve defined the PrunDriver, add:

driver.options().setString("nc_outputSampleName", "user.%nickname%.grid_test_run")

In the new file, change driver.submit() to driver.submitOnly(). This should again be the last line of the file.

tip If you use the submit() command, the script will wait until the Grid jobs are finished, which we don’t want. The submitOnly() command will launch the jobs and then return control.

The PrunDriver supports a number of optional configuration options that you might recognize from the prun program. If you need more information on options available for running on the Grid, check out the Grid driver documentation.

tip For example, a useful (and actually recommended) option is to run merging jobs on the Grid so that the output is a single file. This can be important when you run over a large input dataset and the Grid splits the processing task into multiple jobs.

driver.options().setString( "nc_mergeOutput", "true" )
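As the number of options grows, it can help to keep them in one place and apply them together. The following is a minimal sketch, not part of EventLoop: the helper apply_grid_options and the DryRunDriver stand-in are hypothetical names introduced here for illustration. The only assumption about the real driver is that it exposes options().setString(name, value), as in the snippets above. The stand-in lets you check the options without ROOT or a Grid connection.

```python
# Sketch: collect PrunDriver string options in one dict and apply them together.
# Assumption: the driver exposes options().setString(name, value), as used above.

GRID_OPTIONS = {
    "nc_outputSampleName": "user.%nickname%.grid_test_run",
    "nc_mergeOutput": "true",
}


def apply_grid_options(driver, options):
    """Set each string option on the driver and return the option names that were set."""
    for name, value in options.items():
        driver.options().setString(name, value)
    return list(options)


class _RecordingOptions:
    """Hypothetical stand-in that records setString calls (not part of EventLoop)."""

    def __init__(self):
        self.values = {}

    def setString(self, name, value):
        self.values[name] = value


class DryRunDriver:
    """Hypothetical stand-in for ROOT.EL.PrunDriver so the sketch runs without ROOT."""

    def __init__(self):
        self._options = _RecordingOptions()

    def options(self):
        return self._options


# Dry run: verify which options would be set before submitting anything.
dry_run = DryRunDriver()
applied = apply_grid_options(dry_run, GRID_OPTIONS)
print(applied)
print(dry_run.options().values)
```

In ATestSubmit_eljob.py you would pass the real driver instead, for example apply_grid_options(driver, GRID_OPTIONS) right after creating the PrunDriver.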

Finally, submit the jobs as before:

ATestSubmit_eljob.py --submission-dir=submitDir

This job submission process may take a while to complete; do not interrupt it! You will be prompted to enter your Grid certificate password.

Running with Athena

If you are using Athena, you can learn how to submit jobs to the Grid using pathena at this link