WARNING: This section may no longer be up-to-date
This section should not be relevant to the typical user. It documents the details of how to implement a new driver to make the EventLoop package work in a new environment. If this is what you are trying to do, it may be a good idea to contact me up-front with the details of what you are trying to do, so I can give you some additional guidance.
The first decision you have to make is whether your driver should be a part of the EventLoop package or live in a separate package. That’s really up to you, but so far I am keeping everything in one package for simplicity. However, even if you keep your driver in a separate package, there are probably some changes that need to be made to the EventLoop package anyway.
The basic driver design consists of three or four components: the Driver class, the Worker class, the steering code connecting the two, and (ideally) a unit test.
This part of the EventLoop design is still very fluid. As we add more drivers some of the interfaces may change to accommodate their needs. That is one of the reasons why it is better if I know which drivers are out there, so that I can go and fix them if I break things. Anyway, this also means that you can request changes to the way EventLoop works behind the scenes to make your driver implementation easier.
When designing your driver, you have the choice of storing additional information inside the unique submission directory, as long as it doesn’t collide with any “official” files put there. If your files are fairly large you may consider removing them after the job has finished in order to save space.
The Driver class provides an interface for code that runs on the submission node. As such your class needs to derive from that class and override its virtual functions. So far the only virtual function is doSubmit, which is called when submitting a new job.
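For orientation, a driver skeleton might look something like the sketch below. MyDriver is a made-up name, and the exact doSubmit signature should be checked against Driver.h in your version of EventLoop; here it is assumed to receive the job and the unique submission directory.

    // A minimal driver skeleton -- a sketch; check Driver.h for the
    // exact doSubmit signature in your version of EventLoop.
    #include <EventLoop/Driver.h>
    #include <EventLoop/Job.h>
    #include <string>

    class MyDriver : public EL::Driver
    {
    public:
      MyDriver ();

    protected:
      // called on the submission node when the user submits a job;
      // assumed here to receive the job and the submission directory
      virtual void doSubmit (const EL::Job& job,
                             const std::string& location) const;

      ClassDef(MyDriver, 1);
    };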
Depending on the nature of your driver, you may also want to add further configuration options. These can go either into your Driver class itself, or into the driver-independent Job class. Which of the two is preferable mostly depends on whether this is something you expect the user to set on a job-by-job basis, or something they would want to keep the same for all their jobs. A combination is also possible, with a field in the Driver class that can be overridden by a field in the Job class. Configuration options that affect output datasets have the additional option of going into OutputStream.
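As an illustration (building on the made-up MyDriver sketch above), a driver-level option can be as simple as a public data member; the queue field here is invented for the example.

    // Hypothetical: a data member on the driver applies to every job
    // submitted with it; a per-job option would instead be a field
    // added to Job (or to OutputStream for per-dataset options).
    MyDriver driver;
    driver.queue = "short";   // made-up driver-level option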
Notes:
- The doSubmit method will be split into a doSubmit and a doGather method, which will allow drivers to disconnect from a running job and then reconnect at a later stage.

doSubmit Function

The basic functionality of the doSubmit method can be summarized like this:
- Store the content of the Job object that will be needed on the worker node. This is mostly the list of algorithms, but also the actual samples being run over (for meta-data access), and potentially the list of output datasets.
- Collect the histogram output for each sample into a single file named location/hist-sample.root. The root tool hadd can be used for merging.
- Store a description of the output datasets in location/out-label.
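A doSubmit implementation would then roughly follow the outline below. This is only a sketch: writeJobConfig and submitAndWait are hypothetical placeholders for your own serialization and submission code, and the signature matches the earlier skeleton.

    // A rough outline of doSubmit following the steps above.
    // writeJobConfig() and submitAndWait() are hypothetical helpers.
    void MyDriver::doSubmit (const EL::Job& job,
                             const std::string& location) const
    {
      // store the parts of the Job object the worker will need:
      // algorithms, samples (for meta-data), and output streams
      writeJobConfig (job, location);

      // launch the sub-jobs on your system and wait for them
      submitAndWait (location);

      // merge the histogram output into one file per sample, e.g.
      // by running: hadd location/hist-<sample>.root <sub-job files>

      // finally, store a description of each output dataset in
      // location/out-<label>
    }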
When writing your driver you will have to interact fairly heavily with the SampleHandler package. This package is still fairly new and not yet widely used, which means that we can still fix things that seem broken or impractical. As a first step you have to decide how your samples will be represented. If what you need is a list of files, call makeTChain or makeTDSet to get the list. If on the other hand your system is aware of datasets, you may want to use SampleGrid objects and store the information in the meta-data. You may have to define the appropriate meta-data fields if they don’t exist already.
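For illustration, both approaches might look roughly like the sketch below; the meta-data field name nc_grid is only an example, and the exact signatures should be checked against the SampleHandler headers of your version.

    // Deciding how to get at the inputs for one sample -- a sketch.
    #include <SampleHandler/Sample.h>
    #include <SampleHandler/MetaObject.h>
    #include <TChain.h>
    #include <string>

    void gatherInputs (SH::Sample *sample)
    {
      // file-list based systems can simply ask the sample for a TChain
      TChain *chain = sample->makeTChain ();

      // dataset-aware systems would instead read (or define) a
      // meta-data field; "nc_grid" is only an illustrative name
      std::string dataset = sample->meta()->castString ("nc_grid");
    }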
For your output datasets the preferred method is not to copy them back to the submission node, but to write them directly to a storage element (see the reasons in the section on output datasets above). For you that means the first step is to figure out how to write to such a storage element from your system. Once you have done that, you need to figure out how to access those files and create a new Sample object for them. In most cases this will be a SampleLocal or a SampleGrid object, but if your storage element is sufficiently special you may need a whole new Sample class. If that is the case, I can help with that.
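As a sketch, assuming your storage element serves files by plain paths or URLs, the output description could be built with a SampleLocal like this; how you enumerate the files is up to you.

    // Describing one output dataset via SampleHandler -- a sketch.
    #include <SampleHandler/SampleHandler.h>
    #include <SampleHandler/SampleLocal.h>
    #include <string>
    #include <vector>

    SH::SampleHandler makeOutputSamples (const std::string& name,
                                         const std::vector<std::string>& files)
    {
      SH::SampleLocal *sample = new SH::SampleLocal (name);
      for (std::size_t i = 0; i != files.size(); ++ i)
        sample->add (files[i]);   // register each file by path or URL

      SH::SampleHandler sh;
      sh.add (sample);            // the handler holds on to the sample
      return sh;
    }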
Now your output histograms have to go a separate way from your output datasets. How this works depends on your batch system. Most batch systems send some information back to the submission node, so you can just include the histograms there. Once all of them have arrived, you can use hadd to combine the output histogram files into a single one. Alternatively, you can add together histogram files as they arrive; the latter saves some time when running with a large number of sub-jobs.
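If you prefer to merge in-process instead of spawning hadd, ROOT’s TFileMerger can do the same job; a sketch:

    // Merging sub-job histogram files with ROOT's TFileMerger.
    #include <TFileMerger.h>
    #include <string>
    #include <vector>

    bool mergeHistFiles (const std::vector<std::string>& inputs,
                         const std::string& output)
    {
      TFileMerger merger;
      if (!merger.OutputFile (output.c_str()))
        return false;
      for (std::size_t i = 0; i != inputs.size(); ++ i)
        merger.AddFile (inputs[i].c_str());   // queue one sub-job file
      return merger.Merge ();                 // write the combined file
    }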
Notes:
- There is a Sample type called SampleComposite that needs to be supported, but is not supported right now. From a practical perspective a SampleComposite holds an entire SampleHandler that you then need to run over and combine. Not too much changes for you, except that you may have to combine histogram and output files over multiple datasets. I hope to address this issue soon.

The Worker class contains the code that actually runs on the worker node. As such, it both controls the running of the job and provides all the hooks the user algorithms need to access their inputs and outputs. To facilitate that, the Worker base class contains a fair amount of functionality itself and does some translating between the algorithms and the implementation of the derived classes.
When initializing the Worker
object you have to do a couple of things:
Then when actually running you have to do a couple of things per event, and in this order (a sketch of the full loop follows after the finalization step below):
- Call Worker::tree(tree) with the new input tree.
- Call Worker::treeEntry(entry) with the index of the tree entry currently processed.
- Call Worker::algsChangeInput(), which will notify the algorithms that a new input file is available. It is important that this happens after you register both the tree and the next entry to process.
- Call Worker::algsExecute to tell the algorithms to do the actual processing of the event.

After a Worker object has finished processing events, it needs to do a couple more things:
- Call Worker::algsFinalize to tell all algorithms that they are finished processing and need to perform any final work left.
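Putting these calls together, the worker-side loop might look roughly like this. MyWorker, run, openNextFile, and the tree name CollectionTree are all made up for illustration; only the Worker member calls themselves come from the description above.

    // A sketch of the event loop inside a hypothetical Worker subclass.
    #include <TFile.h>
    #include <TTree.h>

    void MyWorker::run ()
    {
      TFile *file = 0;
      while ((file = openNextFile ()) != 0)   // hypothetical file handling
      {
        TTree *mytree = dynamic_cast<TTree*> (file->Get ("CollectionTree"));
        const Long64_t entries = mytree->GetEntries ();
        for (Long64_t entry = 0; entry != entries; ++ entry)
        {
          tree (mytree);          // register the input tree
          treeEntry (entry);      // register the entry to process
          if (entry == 0)
            algsChangeInput ();   // notify algorithms of the new file
          algsExecute ();         // let the algorithms process the event
        }
        file->Close ();
      }
      algsFinalize ();            // final work for all the algorithms
    }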
Notes:
- Driver::saveOutput is a public static function, so it can be called by either the Driver or the Worker.
- At some point algsChangeInput may get merged into algsExecute, simplifying the process by one step.

What you need for your steering code will be highly system dependent.
You probably need a shell script that runs on the worker and a binary that creates your Worker object. For creating that binary you can just add another if clause to the util/event_loop_worker.cxx source file. Please don’t add a lot of code to that file; just call a function that does everything for your particular driver. Alternatively you can add a completely new binary, but I prefer not to do that, since I don’t want to have a large number of binaries sitting in my path.
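The if clause itself would be something like the following sketch; the variable and function names here are made up, and the actual dispatch code in that file may look different.

    // The kind of clause you might add to util/event_loop_worker.cxx;
    // runMyDriverWorker() would be defined in your own package.
    if (driver == "mydriver")
      return runMyDriverWorker (argc, argv);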
I’m still working out how best to do the unit tests. For now take a look at test/ut_driver_direct.cxx, which shows how I do it now. However, I am not really happy with it, so it is probably going to change.