A Simple Grid Job

Last update: 23 Aug 2024

The PanDA client package contains a number of tools you can use to submit and manage analysis jobs on PanDA.

While pathena is used to submit Athena user jobs to PanDA, more general jobs (e.g. ROOT and Python scripts) can be submitted to the grid by using prun.

Finally, pbook is a Python-based bookkeeping tool for all PanDA analysis jobs.

Detailed information about each of these tools can be found in the PanDA client documentation.

Setup

If you are working on lxplus, the client is already installed, so you only need to run:

setupATLAS 
lsetup panda

These commands set up the CVMFS software environment and then the PanDA client tools.

To keep your grid work in one place, create a new directory called GridTutorial inside your tutorial directory.

Run a ‘Hello World’ job with prun

From your GridTutorial directory, create a new directory for a simple prun test and navigate into the new directory:

mkdir prunTest
cd prunTest

Tip: Working in a dedicated directory is important because prun sends (almost) all files in, and below, your current directory to the grid. Any unnecessary files there will also be sent and will slow down the job launch.
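Before submitting, it can help to see what is sitting in the directory. The following is a minimal sketch that lists every regular file in and below a directory, largest first. It does not reproduce prun's own file selection (which skips some files); the function name `list_upload_candidates` is made up for this illustration.

```python
import os

def list_upload_candidates(top="."):
    """Return (relative_path, size_in_bytes) for every regular file in and below `top`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                found.append((os.path.relpath(path, top), os.path.getsize(path)))
    # Largest files first, since they dominate the upload time.
    return sorted(found, key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    # Rough overview only: prun applies its own rules when packing files.
    for rel_path, size in list_upload_candidates():
        print(f"{size:>12d}  {rel_path}")
```

If the listing shows large or unrelated files, move them elsewhere before running prun.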

Now create a Python script called HelloWorld.py (using your favorite editor) that contains the following lines:

#!/usr/bin/env python3
print("Hello world!")

Make the Python script executable with:

chmod u+x HelloWorld.py

Before launching it to the grid, it is important to run it locally. You should always do this to avoid wasting grid resources on jobs that crash locally.

./HelloWorld.py
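The local check can also be automated. The sketch below runs a command and reports whether it exited with status 0, which is a reasonable minimum bar before submitting anything. The helper name `runs_cleanly` is hypothetical and not part of the PanDA client.

```python
import subprocess
import sys

def runs_cleanly(command):
    """Run `command` (a list of arguments) locally; return True if it exits with status 0."""
    try:
        result = subprocess.run(command, capture_output=True, text=True)
    except OSError as error:
        # Raised, for example, when the script is missing or not executable.
        print(f"could not start {command[0]}: {error}", file=sys.stderr)
        return False
    if result.returncode != 0:
        # Show the error output so the crash can be debugged locally.
        print(result.stderr, file=sys.stderr)
    return result.returncode == 0
```

For the script in this tutorial, `runs_cleanly(["./HelloWorld.py"])` should return True once the file exists and is executable.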

Now we can use prun to submit this script to the grid:

prun --outDS user.<nickname>.pruntest --exec HelloWorld.py

Here <nickname> is your grid nickname/grid name (which is the same as your lxplus username).

This will queue two jobs: a build job that recreates your job environment, and the actual Hello World job. The build job executes first, and the Hello World job runs once it has finished. When both have finished, we will look for the “Hello world!” message in the output!

Monitor the job

If the job is successfully launched, you will see a confirmation printed to the screen along with a jediTaskID number that you will need in the next step.
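If you save the submission output to a file or variable, the task ID can be pulled out with a regular expression. This is only a sketch: it assumes the confirmation contains text of the form `jediTaskID=<number>`, which is not shown above and should be checked against what prun prints for you. The sample string in the code is invented for illustration.

```python
import re

def extract_task_id(output_text):
    """Return the first jediTaskID found in `output_text` as an int, or None."""
    # Assumption: the ID appears as "jediTaskID=<digits>" somewhere in the text.
    match = re.search(r"jediTaskID\s*=\s*(\d+)", output_text)
    return int(match.group(1)) if match else None

# Hypothetical sample line, not copied from real prun output.
sample = "submission succeeded, new jediTaskID=4203786"
print(extract_task_id(sample))
```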

To monitor the progress of a job and check its log file output, we can use the BigPanDA monitor at https://bigpanda.cern.ch. Scroll down to the Task ID field, enter the number noted above, and click on search.

This takes you to a page for the task, which shows that there are two jobs (in some stage of running). Have a look around this page to see the various pieces of information provided.

Now we will try to look at the output, which requires both jobs to have finished. If the jobs have not started running after a few minutes, carry on with the tutorial and check back periodically on their status.

From the task page, click on the Show jobs drop-down menu:

[Image: PanDA Task Menu]

This will give you several options of associated jobs to view. Click on All (including retries) to see a list of all jobs associated with the task:

[Image: PanDA Task Menu Jobs]

If you want to get to the list of jobs directly, you can use the URL https://bigpanda.cern.ch/jobs/?jeditaskid=4203786 and change the jeditaskid value to your own task ID.
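As a small convenience, the job-list URL can be built from a task ID in code. This sketch uses only the URL pattern quoted above; the function name `jobs_url` is made up here.

```python
from urllib.parse import urlencode

BIGPANDA_JOBS_URL = "https://bigpanda.cern.ch/jobs/"

def jobs_url(jedi_task_id):
    """Return the BigPanDA job-list URL for the given jediTaskID."""
    # int() rejects anything that is not a plain number.
    query = urlencode({"jeditaskid": int(jedi_task_id)})
    return f"{BIGPANDA_JOBS_URL}?{query}"

print(jobs_url(4203786))
# prints: https://bigpanda.cern.ch/jobs/?jeditaskid=4203786
```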

From here, you can click on the PanDA ID number corresponding to a job. This will take you to a new page with details about that job. From this page, click on the Logs drop-down menu:

[Image: PanDA Job Menu]

Now, click on Log Files to get a list of the log files associated with the job:

[Image: PanDA Job Menu Logs]

The log containing the Hello World output is called payload.stdout. Click on it to open it and try to find the “Hello world!” message.
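For longer logs, searching by eye gets tedious. The sketch below scans the lines of a log for a message and reports where it occurs. It assumes you have saved a copy of payload.stdout locally; the function name `find_message` is hypothetical.

```python
def find_message(lines, message="Hello world!"):
    """Return (line_number, text) for every line that contains `message`."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if message in line:
            hits.append((number, line.rstrip("\n")))
    return hits

# Usage, assuming the log was saved locally as payload.stdout:
#     with open("payload.stdout") as log:
#         for number, text in find_message(log):
#             print(f"{number}: {text}")
```

The same function works for error strings when debugging a failed job: pass the text you are looking for as `message`.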

This forms the basis of simple debugging of jobs that fail on the grid. Because you tested the job locally first, a failure may be a transient grid error, but it is useful to know how to search for problems in the output files. Note that it is also possible to download the log files as a dataset using dq2/Rucio.

Skip build stage

Because we did not need to compile any code to run this job (it is just a simple Python script), we do not actually need the build stage.

To launch a prun job without the build stage, type the following command:

prun --noBuild --outDS user.<nickname>.pruntest --exec HelloWorld.py

Now only the script will be run. Usually, though, you will want to compile some code first. You can read more about this in the PanDA tools documentation.
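The two submission variants used in this tutorial differ only in the --noBuild flag. The sketch below assembles either command line from its parts, using only the options shown above (--outDS, --exec, --noBuild). The helper `build_prun_command` and the nickname `jdoe` are invented for illustration; the code prints the command rather than running it.

```python
import shlex

def build_prun_command(nickname, script, suffix="pruntest", no_build=False):
    """Return the prun argument list for a simple script job."""
    if not nickname or "<" in nickname or ">" in nickname:
        raise ValueError("replace <nickname> with your actual grid nickname")
    command = ["prun"]
    if no_build:
        # Skip the build job when there is nothing to compile.
        command.append("--noBuild")
    command += ["--outDS", f"user.{nickname}.{suffix}", "--exec", script]
    return command

# "jdoe" is a placeholder nickname.
print(shlex.join(build_prun_command("jdoe", "HelloWorld.py", no_build=True)))
# prints: prun --noBuild --outDS user.jdoe.pruntest --exec HelloWorld.py
```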


⭐️ Bonus Exercise

So far, only the basics of the BigPanDA monitor have been described. As an optional exercise, see if you can find the link that shows all of your jobs. This is a good page to bookmark and come back to in the future.