Using Batch Systems

Last update: 19 May 2023

There is a modest CERN-based batch system that anyone on ATLAS can use. However, its resources are limited, and sometimes it can take quite a while to get batch jobs to turn around. In some cases, because of demand spikes, it may even be faster to submit to the grid! CERN maintains some documentation of the CERN batch system.

Users of AthAnalysis or any other Athena project can learn how to submit jobs to the batch systems at this link

Generally the procedure is to replace the driver you have been using so far with a driver specific to your batch system. There are a fair number of different drivers available, and adding another one is often just a matter of a few hours, even if you are not an experienced developer. However, don’t be afraid to ask for help if you feel a driver for your system is missing. The list of supported batch drivers can be found here: Event Loop: Batch System Drivers

A sample setup of a batch driver may look like this:

  EL::LSFDriver driver;
  driver.options()->setString (EL::Job::optSubmitFlags, "-L /bin/bash"); // or whatever shell you are using
  driver.shellInit = "export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase && source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh";

Note the shellInit parameter, which is used to set up AtlasLocalSetup on each of the worker nodes. You may have to adjust it to whatever is needed to set up the ATLAS software on your machines.
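If you need to adapt the shellInit string to different sites, it can help to assemble it from its parts rather than editing one long literal. The following is only a minimal sketch: makeShellInit is a hypothetical helper (not part of EventLoop), and the idea of appending extra site-specific commands is an assumption about what your site may need.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of EventLoop): builds the shell command
// string that is assigned to driver.shellInit.
//   alrbBase      - location of ATLASLocalRootBase on the worker nodes
//   extraCommands - optional site-specific commands (an assumption; leave
//                   empty if your site needs nothing beyond atlasLocalSetup)
std::string makeShellInit (const std::string& alrbBase,
                           const std::vector<std::string>& extraCommands = {})
{
  // Same two steps as in the sample setup: export the base path, then
  // source atlasLocalSetup.sh from it.
  std::string cmd = "export ATLAS_LOCAL_ROOT_BASE=" + alrbBase
    + " && source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh";

  // Chain extra commands with && so that a failing step stops the setup.
  for (const std::string& extra : extraCommands)
    cmd += " && " + extra;

  return cmd;
}
```

With this helper, driver.shellInit = makeShellInit ("/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase"); yields the same string as in the sample setup above.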

Note that the software generally assumes a shared filesystem, which is used both to share your software between the machines and to return the output files. For HTCondor batch systems there are options to submit jobs without a shared filesystem. There are plans to switch batch submission to use Docker, which would hopefully make it easier to operate without a shared filesystem on other batch systems as well.

There is also a special driver called LocalDriver, which simulates submitting to a batch system on a single machine. If anything goes wrong, running with LocalDriver is the first thing you should try. It is also the only batch driver that is fully supported centrally. All other batch drivers are supported only on a best-effort basis, as full support for a diverse set of batch systems would be very difficult.

When running on a batch system, change the submit() command in your steering macro to submitOnly(). If you use submit(), the script will wait until all jobs have finished, which is not what you want on a batch system.