The purpose of the EventLoop package is to relieve users of the
burden of writing their own event loop. For the user this has a number of
advantages:
- The steering code for running on a local batch cluster or the grid
can be technically complicated, particularly when creating and
managing output datasets. Furthermore this code has to be updated
somewhat regularly, to keep up with changes to the grid
infrastructure, etc. The EventLoop package shifts that burden from
the individual analyzer to the EventLoop package maintainer.
- Since the same package provides the interfaces for different
architectures, it is fairly straightforward to switch between running
on your local machine, your local batch system, or the grid. For the
analyzer that means that if the analysis evolves and needs more
computing power, it takes very little effort to move the jobs to a
new site.
- When running on the grid, individual job failures are relatively
common. The grid driver in the EventLoop package is developed and
maintained by the grid community with the express purpose of
allowing automatic recovery from common job failure modes. For the
analyzer that means less time spent retrying failed jobs
manually.
The main drawback of using the EventLoop package is that you have to
restructure your code a little bit. How much you have to do depends on
what your code looks like now. The main points are:
- Since you no longer run your own event loop, you have to encapsulate
your code inside a class that can then be called from the EventLoop
package. If you have been using TTree::MakeClass, this may be a new
concept; if you have been using TTree::MakeSelector or another event
loop package, it will already be familiar to you.
- The build is handled by CMake, meaning your analysis code
has to be inside one (or several) CMake packages. This is
necessary because CMake is the only build system we
have that builds out of the box at all the different sites that your
code may run on.
- The samples are provided through the SampleHandler package. How much
use you make of the SampleHandler package is up to you. You can also
use it to manage your samples, or simply create the SampleHandler
objects when you initialize the event loop. SampleHandler was chosen
to describe the samples because we had to pick one way, and
SampleHandler is both generic and an official PAT package.
- The submission has to happen through a ROOT/PyROOT script. This is
required because you are supposed to configure your objects before
handing them over to EventLoop. While this may be a change for some
people, it was deemed the easiest solution. The alternative would
have been to pass in a configuration file that your objects can
read, but that would have created extra work for many users who are
not currently set up to read configuration files.
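
The first and last points above (wrapping your code in a class, and
configuring that object in a script before handing it over) can be
sketched in a few lines of plain Python. This is only an illustration of
the pattern, not the EventLoop API: the names `Algorithm`,
`PtCutCounter`, `run_event_loop`, the hook names and the toy event
dictionaries are all assumptions made up for this example.

```python
# Hypothetical sketch of the "framework owns the loop" pattern.
# None of these names belong to the real EventLoop package.
from typing import Iterable, Mapping


class Algorithm:
    """Base class: the framework calls these hooks, the user fills them in."""

    def initialize(self) -> None:
        pass

    def execute(self, event: Mapping[str, float]) -> None:
        raise NotImplementedError

    def finalize(self) -> None:
        pass


class PtCutCounter(Algorithm):
    """User analysis code: counts events above a configurable pt threshold."""

    def __init__(self, min_pt: float) -> None:
        self.min_pt = min_pt  # configuration, set in the submission script
        self.n_passed = 0

    def initialize(self) -> None:
        self.n_passed = 0

    def execute(self, event: Mapping[str, float]) -> None:
        if event["pt"] > self.min_pt:
            self.n_passed += 1


def run_event_loop(algorithm: Algorithm,
                   events: Iterable[Mapping[str, float]]) -> None:
    """Stand-in for the framework: the user never writes this loop."""
    algorithm.initialize()
    for event in events:
        algorithm.execute(event)
    algorithm.finalize()


# "Submission script": configure the object first, then hand it over.
events = [{"pt": 12.0}, {"pt": 31.5}, {"pt": 48.0}]  # toy sample
counter = PtCutCounter(min_pt=25.0)
run_event_loop(counter, events)
print(counter.n_passed)
```

Because the object is configured in the script itself, no separate
configuration file format is needed, which is the trade-off described in
the last point.
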
There are a number of solutions that provide similar services to
EventLoop: on the one hand there is Athena, on the other hand there are
a number of user packages doing these things. While
they have a lot in common, they also differ in certain respects. EventLoop
has the following features, which at least in this combination are not
available in other products:
- EventLoop is an official ATLAS project. This means that future support
and maintenance rests on a much better base than for projects
that people maintain in private. It also means that the EventLoop
package is likely to be integrated better with other ATLAS packages.
- EventLoop supports running on the local machine, batch systems and the
grid (Kubernetes support to come). Most other solutions only work
on a subset of these. It should be noted though that most users only
use a subset, so they may not care.
- EventLoop is a very focused package. All it does is run the
event loop; it doesn’t try to provide a solution for configuring
jobs or reading event data into memory. This gives the user the
freedom to choose how to do those tasks independently of
choosing how to run the event loop.
- EventLoop is designed to be extendable to new environments.
Everything that is specific to a certain architecture is factored
out into two or three classes. If you want to add another one, you
only have to add these classes for your specific case.
- EventLoop is fairly lightweight and flexible. It should be
comparatively easy to take existing code and switch it over to using
EventLoop. This should be true both for end user code and other
frameworks that want to offload the burden of running the event
loop.
- EventLoop doesn’t require the user to copy any source files into
their own area and modify them. While this may not seem like much,
there is an inherent workload once the user takes over control of
files. Essentially, changes are no longer propagated to the user
automatically and instead have to be integrated manually.
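
The idea that everything environment-specific is factored out into a few
classes, so that the same job can run in different places, can be
illustrated with a small plain-Python sketch. It is not the EventLoop
implementation: `Driver`, `LocalDriver`, `ChunkedDriver` and the toy job
`count_even` are hypothetical names chosen for this example, and the
"batch-like" driver only simulates splitting work within one process.

```python
# Hypothetical sketch of a driver abstraction; not the EventLoop classes.
from abc import ABC, abstractmethod
from typing import Callable, List

# A toy "job": maps a list of events to a count.
Job = Callable[[List[int]], int]


class Driver(ABC):
    """Everything specific to one running environment lives in a driver."""

    @abstractmethod
    def submit(self, job: Job, events: List[int]) -> int:
        ...


class LocalDriver(Driver):
    """Runs the whole job in the current process."""

    def submit(self, job: Job, events: List[int]) -> int:
        return job(events)


class ChunkedDriver(Driver):
    """Simulates a batch-like backend: split the events, run, merge.

    Merging by summation is only valid for counting jobs like the one below.
    """

    def __init__(self, n_chunks: int) -> None:
        self.n_chunks = n_chunks

    def submit(self, job: Job, events: List[int]) -> int:
        chunks = [events[i::self.n_chunks] for i in range(self.n_chunks)]
        return sum(job(chunk) for chunk in chunks)


def count_even(events: List[int]) -> int:
    """User code: identical no matter which driver runs it."""
    return sum(1 for e in events if e % 2 == 0)


events = list(range(10))
local_result = LocalDriver().submit(count_even, events)
chunked_result = ChunkedDriver(n_chunks=3).submit(count_even, events)
print(local_result, chunked_result)
```

In this sketch, supporting a new environment means adding one more
`Driver` subclass, while the user code in `count_even` stays untouched.
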