Download Log File From the Grid

In the previous step, we saw how to find a log file and view it in a web browser. But what if we want to download it? We can do this with the Rucio tools.

Once one of your jobs has completed, we can find and download its log file.

When using Rucio, it is almost always better to run it in a separate terminal from the one where you are running your code or submitting grid jobs. This minimizes the potential for conflicts between different Python versions.

Set up the Rucio tools if you haven’t done so already:

lsetup rucio

If you are working on a new lxplus node, or on any computer where you didn’t just submit your PanDA job, you may need to create a VOMS proxy:
voms-proxy-init -voms atlas:/atlas
Unlike pathena and prun, rucio won’t do that for you.
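
The proxy step above can be made conditional, so that a new proxy is only requested when no valid one exists. The following is a minimal sketch, not an official recipe: ensure_atlas_proxy is a hypothetical helper name, and it assumes that your VOMS client supports the -exists option of voms-proxy-info (returning 0 when a valid proxy is present).

```shell
# Sketch: create an ATLAS VOMS proxy only when no valid one exists.
# ensure_atlas_proxy is a hypothetical helper name, not part of any tool.
ensure_atlas_proxy() {
    # Stop early if the VOMS client tools are not on PATH.
    if ! command -v voms-proxy-info >/dev/null 2>&1; then
        echo "voms-proxy-info not found; set up the grid tools first" >&2
        return 127
    fi
    # Assumption: -exists makes voms-proxy-info return 0 for a valid proxy.
    if voms-proxy-info -exists >/dev/null 2>&1; then
        echo "valid proxy found, nothing to do" >&2
        return 0
    fi
    # Same command as in the text above.
    voms-proxy-init -voms atlas:/atlas
}
```

After defining the function in your terminal session, run ensure_atlas_proxy before your first rucio command.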

Go back to the BigPanDA web page and find the page with the jediTaskID that we used previously. Look for the Output entry in the Containers table, and note the name of the log file container, e.g., user.aparker.pruntest.log. Back in your terminal session, try to find this log file on the grid:

$ rucio list-dids user.aparker:*pruntest*log*
+--------------------------------------------------+--------------+
| SCOPE:NAME                                       | [DID TYPE]   |
|--------------------------------------------------+--------------|
| user.aparker:user.aparker.pruntest.log           | CONTAINER    |
| user.aparker:user.aparker.pruntest.log.340520924 | DATASET      |
+--------------------------------------------------+--------------+
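
If you want to pick entries out of a listing like this in a script, the table can be filtered by its DID TYPE column. This is only a sketch: dids_of_type is a hypothetical helper name, and it assumes the table layout shown above (columns separated by | characters).

```shell
# Print the SCOPE:NAME of every row whose DID type matches $1
# (default: DATASET). Reads a `rucio list-dids` table on standard input.
# dids_of_type is a hypothetical helper name.
dids_of_type() {
    awk -F'|' -v want="${1:-DATASET}" '
        NF >= 3 {
            name = $2; type = $3
            # Trim the padding spaces around each cell.
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", type)
            if (type == want) print name
        }'
}
```

For example, rucio list-dids 'user.aparker:*pruntest*log*' | dids_of_type DATASET would print only the dataset line of the table above.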

We now have two options:

1. Download the container and all log files within it (e.g. if the task contained many subjobs).
2. Download just the dataset specific to the single set of jobs.

Let’s do the second:

rucio download user.aparker:user.aparker.pruntest.log.340520924

After it finishes downloading, navigate into the downloaded directory and extract the files from the tarball:

cd user.aparker.pruntest.log.340520924/
tar -xvf user.aparker.pruntest.log.23186476.000001.log.tgz

A tarball (here, a file with the tgz extension) is a set of files packaged together with tar and compressed using gzip. This is an efficient way to transfer large numbers of small files.

Extracting the tarball gives you access to the log file (as well as much more related information) from your job, which can be useful for debugging.

There will be a lot of information in the extracted logs, but the file you are probably looking for is payload.stdout.
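
The extract-and-locate steps above can be combined into one small helper. This is a sketch under stated assumptions: find_payload_log is a hypothetical name, and the helper assumes the tarball ends in .tgz and contains a file called payload.stdout somewhere inside it.

```shell
# find_payload_log is a hypothetical helper: it unpacks a log tarball into
# its own directory and prints the path of every payload.stdout inside.
find_payload_log() {
    tarball=$1
    # Default destination: the tarball name without its .tgz suffix.
    dest=${2:-${tarball%.tgz}}
    mkdir -p "$dest" || return 1
    # -x: extract, -z: gunzip, -f: archive file, -C: extract into $dest
    tar -xzf "$tarball" -C "$dest" || return 1
    find "$dest" -type f -name payload.stdout
}
```

For example, find_payload_log user.aparker.pruntest.log.23186476.000001.log.tgz extracts the tarball into a directory of the same name (minus .tgz) and prints where payload.stdout ended up, which you can then open with less.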

As you learned earlier, you can also download individual files directly with Rucio. If you go to the PanDA job page that you saw earlier, you will find the log tarball listed in the table labelled "3 job files". You can download that file directly with rucio:
rucio download user.aparker.pruntest.log.23186476.000001.log.tgz
Here rucio inferred the correct scope (user.aparker) from the name of the tarball.
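
To illustrate how a scope can be read off a name, here is a sketch of the naming convention. It is an assumption about the convention, not a copy of what rucio does internally, and guess_scope is a hypothetical helper name: names starting with user. or group. are taken to use their first two dot-separated fields as the scope, and anything else the first field only.

```shell
# guess_scope is a hypothetical helper illustrating the assumed convention:
#   user.<name>.* and group.<name>.*  -> scope is the first two fields
#   anything else                     -> scope is the first field
guess_scope() {
    case $1 in
        user.*|group.*) printf '%s\n' "$1" | cut -d. -f1,2 ;;
        *)              printf '%s\n' "$1" | cut -d. -f1 ;;
    esac
}
```

With this sketch, guess_scope user.aparker.pruntest.log.23186476.000001.log.tgz prints user.aparker, which matches the scope shown in the rucio list-dids output earlier. If in doubt, spelling out the full scope:name form in the download command avoids relying on any guessing.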