`CPGridRun.py` is a script that submits an analysis job to the PanDA grid (a remote computing service; a job running on it is called a grid job), which you can then monitor on bigPanDA. You want to submit a grid job when your ROOT files are too big for a local machine, or when you are working with MC samples officially produced by the ATLAS production team.
Submitting a grid job yourself has a steep learning curve because it exposes you to a whole set of grid errors, and while debugging you will most of the time be swamped by computing-server technicalities. `CPGridRun.py` is a centralized script that helps you submit the job in a working, recommended way. The script has a lot of default settings; in particular, it is designed to work smoothly with `CPRun.py`. In this section we focus on running `CPGridRun.py` with `CPRun.py`. The core of `CPGridRun.py` is generating a working `prun` (PanDA run) command.
Let's run a demonstration first!
setupATLAS
asetup AnalysisBase,main,latest
touch gridinput.txt
echo "mc20_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.deriv.DAOD_PHYS.e6337_s3681_r13167_r13146_p6490" >> gridinput.txt
echo "mc20_13TeV.700341.Sh_2211_Wmunu_maxHTpTV2_BFilter.deriv.DAOD_PHYS.e8351_s3681_r13145_p6490" >> gridinput.txt
After setting up the environment and creating the input text file, run
CPGridRun.py -i gridinput.txt --testRun --exec "-t test_configuration_Run2.yaml -e 50" --prefix myTutorial
You should see
Py:CPGridRun INFO
Input: mc20_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.deriv.DAOD_PHYS.e6337_s3681_r13167_r13146_p6490
Datasetname: mc20_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.deriv.DAOD_PHYS.e6337_s3681_r13167_r13146_p6490
Projectname: mc20_13TeV
Campaign: mc20
Energy: 13TeV
Dsid: 410470
Main: PhPy8EG_A14_ttbar_hdamp258p75_nonallhad
Step: deriv
Format: DAOD_PHYS
Tags: ['e6337', 's3681', 'r13167', 'r13146', 'p6490']
Etag: e6337
Stag: s3681
Rtag: r13146
Ptag: p6490
Py:CPGridRun INFO Command:
...
The first part is the metadata of your input sample; for details, check the [ATLAS Production naming format](#atlas-production-naming-format-optional) section below.
The second part starts with the `prun` command, the grid submission command you learned in the previous tutorial. `CPGridRun.py` generates a working `prun` command that runs your CP algorithms on the grid with `CPRun.py`.
.....
Py:CPGridRun INFO Command:
prun \
--inDS mc20_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.deriv.DAOD_PHYS.e6337_s3681_r13167_r13146_p6490 \
--outDS user.$USER.myTutorial.410470.DAOD_PHYS.e6337_s3681_r13167_r13146_p6490.test_214093 \
--useAthenaPackages \
--cmtConfig x86_64-el9-gcc13-opt \
--writeInputToTxt IN:in.txt \
--outputs output:output.root \
--exec "CPRun.py --input-list in.txt --output-name output --max-events 50 --text-config test_configuration_Run2.yaml --merge-output-files" \
--memory 2000 \
--addNthFieldOfInDSToLFN 2,3,6 \
--mergeOutput \
--outTarBall cpgrid.tar.gz \
--nEventsPerFile 300 \
--nFiles 10
This is a working `prun` command line that you can copy and paste on lxplus; you can also let `CPGridRun.py` run it for you. A few flags deserve discussion.
- `--outDS user.$USER.myTutorial.410470.DAOD_PHYS.e#####.test_#####`: the user identity (user or group) and the username are set automatically, followed by the prefix `myTutorial`. The suffix `test_#####` at the end is set automatically because we passed `--testRun`.
- `--exec "CPRun.py --input-list in.txt --output-name output --max-events 50 --text-config test_configuration_Run2.yaml --merge-output-files"`: `--exec` differs from what we entered. `CPGridRun.py` sets the input and output correctly and makes sure the necessary flags are set.
- `--input-list` is set to `in.txt`, which, as you may have noticed, comes from `--writeInputToTxt IN:in.txt`. After the grid receives the MC samples you requested, it looks up all the related `.root` files in its database and writes them into `in.txt`, in a format that `CPRun.py` can read.
- `--outTarBall` asks `prun` to (re)compress the repository into `cpgrid.tar.gz`. If you see `--inTarBall` instead, the existing `cpgrid.tar.gz` is used without re-compressing.
- `--nEventsPerFile 300` and `--nFiles 10` appear because `--testRun` is enabled. Sometimes you want to test your code on the grid without waiting a long time for the results. `--testRun` limits the number of files per job to 10 and the number of events per file to 300, which is useful for a small test run on the grid.
At the end you will see a confirmation prompt; press `y` to submit the job to the grid.
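To make the effect of `--testRun` concrete, here is a minimal Python sketch of how a submission helper could assemble the `prun` argument list. The function `build_prun_args`, its parameters, and the username `jdoe` are hypothetical, and this is not the actual `CPGridRun.py` implementation; only the flag names and values are taken from the output above.

```python
import shlex


def build_prun_args(in_ds, out_ds, exec_line, test_run=False):
    """Assemble a prun argument list (hypothetical helper, for illustration only)."""
    args = [
        "prun",
        "--inDS", in_ds,
        "--outDS", out_ds,
        "--writeInputToTxt", "IN:in.txt",
        "--outputs", "output:output.root",
        "--exec", exec_line,
        "--mergeOutput",
    ]
    if test_run:
        # Limits shown in the demonstration output when --testRun is passed.
        args += ["--nEventsPerFile", "300", "--nFiles", "10"]
    return args


args = build_prun_args(
    in_ds="mc20_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.deriv.DAOD_PHYS.e6337_s3681_r13167_r13146_p6490",
    out_ds="user.jdoe.myTutorial.410470.DAOD_PHYS.e6337_s3681_r13167_r13146_p6490.test_214093",
    exec_line="CPRun.py --input-list in.txt --output-name output",
    test_run=True,
)
# shlex.join quotes arguments that contain spaces, such as the --exec payload.
print(shlex.join(args))
```

Keeping the arguments in a list until the last moment avoids quoting mistakes around the `--exec` payload, which itself contains spaces.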
One challenge in setting up a grid job properly is getting the input name format right.
The input name follows the format that the ATLAS Production team uses to name the samples it produces. Getting the name right is crucial because it is the name used on the grid, and it is a format that `CPGridRun.py` can recognize and use to streamline the submission.
The ATLAS Production naming format is as follows:
- ProjectName: `mc##_%%TeV` or `data_##`
- Step: `deriv` stands for derivation; others are `simul`, `evgen`, `recon`, etc.
- FORMAT: `AOD`, `EVNT`, etc.
The full format usually follows:
ProjectName.DSID.Main.Step.FORMAT.tags
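The metadata printed in the demonstration can be reproduced with a few string splits. The following is a minimal sketch, assuming the name has exactly six dot-separated fields; `parse_dataset_name` is a hypothetical helper, not the parser used inside `CPGridRun.py`.

```python
def parse_dataset_name(name):
    """Split an ATLAS production dataset name into its dot-separated fields."""
    # Assumption: exactly six fields, ProjectName.DSID.Main.Step.FORMAT.tags
    project, dsid, main, step, data_format, tags = name.split(".")
    campaign, energy = project.split("_", 1)
    tag_list = tags.split("_")
    info = {
        "project": project,
        "campaign": campaign,
        "energy": energy,
        "dsid": dsid,
        "main": main,
        "step": step,
        "format": data_format,
        "tags": tag_list,
    }
    # Assumption: when a tag letter repeats (e.g. two r-tags), keep the last one,
    # which matches the Rtag printed in the demonstration output.
    for tag in tag_list:
        info[tag[0] + "tag"] = tag
    return info


info = parse_dataset_name(
    "mc20_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad"
    ".deriv.DAOD_PHYS.e6337_s3681_r13167_r13146_p6490"
)
for key, value in info.items():
    print(f"{key}: {value}")
```

A name with a different number of fields raises a `ValueError` at the first split, which is a reasonable early signal that the input line is malformed.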
Let's look at the help message:
setupATLAS
asetup AnalysisBase,main,latest
CPGridRun.py -h
There are two main sections: one lists the `CPGridRun.py` arguments, the other is extracted from `CPRun.py`.
The `CPGridRun.py` section is divided into 4 subsections. Some argument help messages contain "(PanDA)", which means the flag is identical to the one in `prun`.
- `-i` or `--input-list`: this is NOT identical to the `CPRun.py` input list. It takes two formats,
- `--output-files`: on the grid, NOT all generated files can be downloaded, because it takes extra effort for the grid to collect your files from multiple computing servers into one location. Users need to tell the grid in advance what to download. `--output-files "A.root,B.txt,B.root"` results in `outDS/A/A.root`, `outDS/B/B.txt`, `outDS/B/B.root` in the output directory. If you are using `CPRun.py` you don’t need to set it.
Each time a user submits a grid job, it must have a unique `outDS`. The `outDS` is a unique identifier on the grid, and every specified file is put under the directory `outDS`. If a duplicated `outDS` is submitted, the grid returns an error and asks you to change the `outDS`, even if your previous submission with the same `outDS` FAILED. We offer a commonly used preset to simplify the process.
outDS preset: {group/user}.{username}.{prefix}.{DSID}.{format}.{tags}.{suffix}
`username` is obtained automatically; `DSID`, `format`, and `tags` are derived from your input samples. Users only need to set the `prefix` and `suffix`.
- `--prefix`: normally a fixed name that the user wants to keep using for that sample, for example `ttbar2WWnunu`.
- `--suffix`: mainly for version control; a name that the user is happy to change to keep the `outDS` unique, like `test_v1`, `v_05`, etc. If a submission failed for `v_03`, the user can change the suffix to `v_04` and submit again.
- `--outDS`: overrides the whole preset so that the user can set the name manually.
- `--gridUsername`: obtained automatically for a single user. If the user is submitting an official group production, it can be set to e.g. `--gridUsername PHYS-HMBS`.
- `--groupProduction`: enables some presets for group production, including naming and computing-resource arrangement. The user is expected to have the proper authentication.
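The naming preset and the `--output-files` layout described above can be sketched in a few lines of Python. Both helpers are hypothetical, the username `jdoe` is a placeholder, and the directory-per-file-stem rule is inferred only from the `A.root,B.txt,B.root` example rather than from the `CPGridRun.py` source.

```python
def build_out_ds(username, prefix, dsid, data_format, tags, suffix, identity="user"):
    """Join the preset fields: {group/user}.{username}.{prefix}.{DSID}.{format}.{tags}.{suffix}"""
    return ".".join([identity, username, prefix, dsid, data_format, tags, suffix])


def output_paths(out_ds, output_files):
    """Map a comma-separated --output-files value to paths under the outDS directory."""
    paths = []
    for filename in output_files.split(","):
        # Assumption inferred from the example: files are grouped by name without extension.
        stem = filename.rsplit(".", 1)[0]
        paths.append(f"{out_ds}/{stem}/{filename}")
    return paths


out_ds = build_out_ds(
    username="jdoe",
    prefix="ttbar2WWnunu",
    dsid="410470",
    data_format="DAOD_PHYS",
    tags="e6337_s3681_r13167_r13146_p6490",
    suffix="v_04",
)
print(out_ds)
print(output_paths("outDS", "A.root,B.txt,B.root"))
```

Because only `suffix` changes between resubmissions, bumping `v_03` to `v_04` is enough to obtain a new, unique `outDS`.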
- `--exec`: the command line that the user wants to run on the grid. It must be enclosed in double quotes (“”). There are a few things users should know before using the `CPRun.py` preset:
--exec "-t analysis_config.yaml --no-systematics -e 50"
--exec "customRun.py -i inputs -o output --text-config config.yaml --flagA --flagB"
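The demonstration showed the short form `-t test_configuration_Run2.yaml -e 50` being turned into a full `CPRun.py` command. Here is a minimal sketch of such an expansion using `argparse`. The function `expand_exec` is hypothetical and the real `CPGridRun.py` may handle this differently; only the `-t`/`--text-config` and `-e`/`--max-events` correspondence and the fixed input and output flags are taken from the output shown earlier.

```python
import argparse
import shlex


def expand_exec(short_exec):
    """Expand a flags-only --exec value into a full CPRun.py command (illustrative only)."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-t", "--text-config", required=True)
    parser.add_argument("-e", "--max-events", type=int)
    # Unknown flags (e.g. --no-systematics) are passed through unchanged.
    opts, extra = parser.parse_known_args(shlex.split(short_exec))

    # Input and output names are fixed so that they match --writeInputToTxt and --outputs.
    command = ["CPRun.py", "--input-list", "in.txt", "--output-name", "output"]
    if opts.max_events is not None:
        command += ["--max-events", str(opts.max_events)]
    command += ["--text-config", opts.text_config]
    command += extra
    command.append("--merge-output-files")
    return " ".join(command)


print(expand_exec("-t test_configuration_Run2.yaml -e 50"))
```

Fixing the input and output names inside the helper is what keeps the `--exec` payload consistent with the rest of the `prun` command.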
- `--noSubmit`: will NOT submit anything to the grid.
- `--testRun`: submits jobs to the grid with a random suffix `.test_uuid`. It also greatly limits the number of files per job (10) and the number of events per file (300). It is useful when you want to test a small run on the grid.
- `--recreateTar`: when submitting with `prun`, the user has to ask `prun` manually to compress the repository with its source code and submit it alongside the job. We found that users often forget to re-compress after updating the source code (and it usually takes a few hours before they notice the mistake), so `CPGridRun.py` detects file changes in the source code or build directory. If anything changed, `CPGridRun.py` asks `prun` to compress again. Users can also force re-compression with this flag.
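One common way to detect such changes is to hash the directory contents and compare the result with the digest saved at the previous submission. The sketch below illustrates that idea only; it is an assumption about the general technique, not a description of how `CPGridRun.py` actually detects changes, and both function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def tree_digest(root):
    """Return a SHA-256 digest over the relative paths and contents of all files under root."""
    digest = hashlib.sha256()
    root = Path(root)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def needs_new_tarball(root, previous_digest, force=False):
    """Decide whether to re-compress; force plays the role of --recreateTar."""
    return force or tree_digest(root) != previous_digest


# Small demonstration on a temporary directory standing in for the source tree.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "algo.py"
    source.write_text("x = 1\n")
    saved = tree_digest(tmp)
    unchanged = needs_new_tarball(tmp, saved)
    forced = needs_new_tarball(tmp, saved, force=True)
    source.write_text("x = 2\n")
    changed = needs_new_tarball(tmp, saved)

print(unchanged, forced, changed)
```

Hashing file contents rather than modification times avoids a needless re-compression when a file is merely touched, at the cost of reading every file once.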