Component Accumulator

Last update: 06 Aug 2024

Purpose of the Component Accumulator

The ComponentAccumulator is a container for storing the configuration of components. It may contain the configuration of multiple components that form a consistent set (e.g. an algorithm and the tools/services it needs for execution). The job configuration is built by merging several ComponentAccumulators.

...
topCA.merge(alg1CA)
topCA.merge(alg2CA)
...

Deduplication

During merging, two ComponentAccumulators (here topCA and alg1CA) may both contain the configuration of the same component. The ComponentAccumulator can unify their settings, so you do not need to worry about adding a service or tool twice. On the contrary, ComponentAccumulator instances are supposed to be as self-contained as possible: add every service and tool an algorithm needs to work to its ComponentAccumulator rather than relying on the user to add what is necessary.

The settings unification process is called deduplication and is applied to every component in the merged ComponentAccumulators. It works as follows:

  1. Components that have the same type but different names are left intact (i.e. both instances are kept), as this is the regular case.
  2. Components that have the same type, name and property values are silently merged into one (the duplicate is dropped).
  3. Components that have the same type and name but differently set properties are subject to the unification process, which is attempted for every property set differently. It relies on semantics that can be defined for each configurable parameter separately (see more). By default, though, any difference is considered a configuration mistake and results in a configuration failure.
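The three rules above can be illustrated with a toy model in plain Python. This is a sketch under stated assumptions, not the ComponentAccumulator implementation: components are hypothetical dicts with "type", "name" and "props" keys, the names add_component and DeduplicationError are invented for illustration, and only the default policy of rule 3 (any difference is a failure) is modelled, not the per-property semantics.

```python
class DeduplicationError(Exception):
    """Toy stand-in for the error raised when duplicates conflict."""


def add_component(components, new):
    """Add component `new` to the list `components`, applying rules 1-3.

    Components are plain dicts with "type", "name" and "props" keys.
    """
    for comp in components:
        if comp["type"] == new["type"] and comp["name"] == new["name"]:
            if comp["props"] == new["props"]:
                return components  # rule 2: identical duplicate is dropped
            # rule 3, default policy only: any difference is a failure
            raise DeduplicationError(
                f"{new['type']}/{new['name']}: {comp['props']} vs {new['props']}"
            )
    components.append(new)  # rule 1: no same-type, same-name match, so keep it
    return components


comps = []
add_component(comps, {"type": "ToolA", "name": "A1", "props": {"x": 1}})
add_component(comps, {"type": "ToolA", "name": "A2", "props": {"x": 2}})  # rule 1
add_component(comps, {"type": "ToolA", "name": "A1", "props": {"x": 1}})  # rule 2
print(len(comps))  # 2

try:
    add_component(comps, {"type": "ToolA", "name": "A1", "props": {"x": 5}})  # rule 3
except DeduplicationError as err:
    print("configuration failure:", err)
```

In the real ComponentAccumulator, rule 3 can succeed for properties whose semantics define how to combine values; this toy deliberately omits that.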

Writing Configuration Methods

Methods that configure pieces of the job instantiate a ComponentAccumulator and add to it services, algorithms, etc. as needed. They can call other configuration methods (to obtain the configuration of components they depend on) and merge the result with their own ComponentAccumulator.

They always take a configuration flags container as the first argument, optionally followed by other positional and keyword arguments (args and kwargs), as discussed in the naming conventions mentioned earlier.

A typical implementation looks like this:

from AthenaConfiguration.ComponentAccumulator import ComponentAccumulator
from AthenaConfiguration.ComponentFactory import CompFactory

def MyAlgoCfg(flags, name="MyAlgo", **kwargs):
    acc = ComponentAccumulator()

    # Call the config method of a service that we need:
    from ARequiredSvcPack.ARequiredSvcPackConfig import SuperServiceCfg

    # We get an accumulator containing possibly other components that our SuperService
    # depends upon and the service itself
    svcAcc = SuperServiceCfg(flags)

    # SuperService is the primary component that SuperServiceCfg configures
    # Get it, so we can attach it to the ServiceHandle of an algorithm
    kwargs.setdefault("svcHandle", svcAcc.getPrimary())

    # Merge its accumulator with our accumulator to absorb all required dependencies
    acc.merge(svcAcc)

    # NOTE: A shorthand for the above three lines is (more details below)
    # kwargs.setdefault("svcHandle", acc.getPrimaryAndMerge(SuperServiceCfg(flags)))

    # Set additional properties
    kwargs.setdefault("isData", not flags.Input.isMC)

    # instantiate algorithm configuration object setting its properties
    acc.addEventAlgo(CompFactory.MyAlgo(name, **kwargs))

    # Return our accumulator (containing SuperService and its dependencies)
    return acc

Please note that, as per the naming conventions, the function is named MyAlgoCfg after the main component it configures (the algorithm MyAlgo in this case).

The first argument is flags (as always); the second is a name, which has a default that the caller can override; and, since this function configures one main component, it also accepts kwargs that set the properties of MyAlgo.

ComponentAccumulators with private tools

Private AlgTools are special because they cannot exist without a parent, so there is no meaningful way to accumulate them on their own. However, a configuration method that configures a private AlgTool with all its dependencies, but without a parent, is still a valid use case. To allow this, the ComponentAccumulator class has the methods setPrivateTools and popPrivateTools. A function configuring an AlgTool returns an instance of ComponentAccumulator that has the tool attached via setPrivateTools and contains all the auxiliary components (services, conditions algorithms, etc.) that the tool needs to work. The caller then obtains the private tool via the popPrivateTools method, assigns it to the PrivateToolHandle of the parent, and merges the returned ComponentAccumulator with its own ComponentAccumulator. This works for a single AlgTool as well as for lists of AlgTools, which are typically assigned to a PrivateToolHandleArray. Merging a ComponentAccumulator that still has private tools attached (e.g. because popPrivateTools was never called) raises an exception complaining about a dangling private tool.

The ComponentAccumulator provides a handy shortcut method, popToolsAndMerge (and getPrimaryAndMerge, see the next section), that performs the two operations above at once. The example below illustrates how this works:

from AthenaConfiguration.ComponentAccumulator import ComponentAccumulator
from AthenaConfiguration.ComponentFactory import CompFactory

def PrivateToolCfg(flags, **kwargs):
    acc = ComponentAccumulator()
    ...
    # merge/add dependencies
    acc.addService(CompFactory.SomeService(...))
    kwargs.setdefault("PropertyA", 1.0)
    # it is recommended not to give a custom name to the private tool
    acc.setPrivateTools(CompFactory.ToolA(**kwargs))
    return acc

def AlgorithmCfg(flags, **kwargs):
    acc = ComponentAccumulator()
    # pop the private tool and merge its dependencies in one step
    tool = acc.popToolsAndMerge(PrivateToolCfg(flags))
    # or the longer alternative, doing the same in three steps:
    #  toolAcc = PrivateToolCfg(flags)
    #  tool = toolAcc.popPrivateTools()
    #  acc.merge(toolAcc)
    kwargs.setdefault("Tool", tool)

    acc.addEventAlgo(CompFactory.Alg("MyAlg", **kwargs))
    return acc
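The pop-then-merge contract, and the dangling-tool check that enforces it, can be sketched without Athena. ToyAccumulator, DanglingPrivateToolError and private_tool_cfg are hypothetical names; the toy only borrows the method names setPrivateTools, popPrivateTools, merge and popToolsAndMerge from the text and models nothing else (services are plain strings, the tool is a dict).

```python
class DanglingPrivateToolError(Exception):
    """Toy stand-in for the exception about a dangling private tool."""


class ToyAccumulator:
    """Hypothetical, heavily simplified accumulator (not the Athena class)."""

    def __init__(self):
        self.services = []
        self._private_tools = None

    def setPrivateTools(self, tools):
        self._private_tools = tools  # a single tool or a list of tools

    def popPrivateTools(self):
        tools, self._private_tools = self._private_tools, None
        return tools

    def merge(self, other):
        # merging an accumulator that still holds private tools is refused
        if other._private_tools is not None:
            raise DanglingPrivateToolError("private tool(s) not popped before merge")
        for svc in other.services:  # naive deduplication of dependencies
            if svc not in self.services:
                self.services.append(svc)

    def popToolsAndMerge(self, other):
        tools = other.popPrivateTools()  # step 1: take the tool(s) out
        self.merge(other)                # step 2: absorb the dependencies
        return tools


def private_tool_cfg():
    acc = ToyAccumulator()
    acc.services.append("SomeService")  # dependency of the tool
    acc.setPrivateTools({"type": "ToolA", "PropertyA": 1.0})
    return acc


top = ToyAccumulator()
tool = top.popToolsAndMerge(private_tool_cfg())
print(tool["type"], top.services)  # ToolA ['SomeService']

try:
    top.merge(private_tool_cfg())  # forgot to pop the private tool
except DanglingPrivateToolError as err:
    print("error:", err)
```

The design point the toy shows is that the parent, not the accumulator, ends up owning the tool: once popped, the accumulator carries only the dependencies, which are safe to merge anywhere.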

Designating the primary components

When a ComponentAccumulator results from merging several smaller ones, it may be useful to designate the component that is the primary concern of a configuration function, so that client code does not need to look the component up by name.

The primary component may also change depending on the flags. In this situation, too, designating the primary component shields the client code from that choice. An example is shown below:

from AthenaConfiguration.ComponentAccumulator import ComponentAccumulator
from AthenaConfiguration.ComponentFactory import CompFactory

def ToolCfg(flags):
    # Here we do not accept kwargs, since it is not obvious which component they would apply to
    acc = ComponentAccumulator()
    # this adds some public tools
    acc.merge(OtherToolsCfg(flags))
    # not a primary component
    acc.addPublicTool(CompFactory.ToolX("X", setting=...))

    # configure different tools depending on the flag value
    # and designate the chosen one as the primary component
    if flags.addA:
        acc.addPublicTool(CompFactory.ToolA("ToolA", settingX=...), primary=True)
    else:
        acc.addPublicTool(CompFactory.ToolB("ToolB", settingY=...), primary=True)
    return acc

def ConsumerCfg(flags, **kwargs):
    acc = ComponentAccumulator()

    # instead of code like this:
    #  toolAcc = ToolCfg(flags)
    #  tool = toolAcc.getPublicTool("ToolA" if flags.addA else "ToolB")
    # we can do (no need to agree on the name of the tool configured by ToolCfg):
    tool = acc.getPrimaryAndMerge(ToolCfg(flags))
    # it is possible to do the above in steps:
    #  tempAcc = ToolCfg(flags)
    #  tool = tempAcc.getPrimary()
    #  acc.merge(tempAcc)
    # however, the shortcut method getPrimaryAndMerge is provided for your convenience
    kwargs.setdefault("ToolX", tool)

    acc.addEventAlgo(CompFactory.MyAlg("MyAlg", **kwargs))
    return acc
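The benefit of designating a primary component is that the consumer stays the same whichever tool the flags select. The toy below sketches that idea in plain Python; ToyAccumulator and tool_cfg are hypothetical, tools are SimpleNamespace objects with a name and a type, and only the method names addPublicTool, getPrimary, merge and getPrimaryAndMerge are borrowed from the text.

```python
from types import SimpleNamespace


class ToyAccumulator:
    """Hypothetical stand-in tracking only public tools and a primary component."""

    def __init__(self):
        self.public_tools = {}
        self._primary = None

    def addPublicTool(self, tool, primary=False):
        self.public_tools[tool.name] = tool
        if primary:
            self._primary = tool

    def getPrimary(self):
        return self._primary

    def merge(self, other):
        self.public_tools.update(other.public_tools)

    def getPrimaryAndMerge(self, other):
        primary = other.getPrimary()  # remember the designated component
        self.merge(other)             # then absorb everything else
        return primary


def tool_cfg(flags):
    acc = ToyAccumulator()
    acc.addPublicTool(SimpleNamespace(type="ToolX", name="X"))  # not primary
    if flags.addA:
        acc.addPublicTool(SimpleNamespace(type="ToolA", name="ToolA"), primary=True)
    else:
        acc.addPublicTool(SimpleNamespace(type="ToolB", name="ToolB"), primary=True)
    return acc


for add_a in (True, False):
    consumer = ToyAccumulator()
    # the consumer never mentions "ToolA" or "ToolB" by name
    tool = consumer.getPrimaryAndMerge(tool_cfg(SimpleNamespace(addA=add_a)))
    print(add_a, tool.type)  # prints "True ToolA", then "False ToolB"
```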

Caching of configuration results

Configuration methods that are called many times may profit from caching their result:

from AthenaConfiguration.AccumulatorCache import AccumulatorCache

@AccumulatorCache
def MyAlgoCfg(flags, name="MyAlgo", **kwargs):
     ...

This caches the result of MyAlgoCfg (similar to Python’s functools.lru_cache), so that repeated calls with the same flags and other arguments return the cached result. The decorator has a few options (mostly for experts) that are documented in AccumulatorCache. The cache hit/miss statistics can be printed via:

from AthenaConfiguration.AccumulatorCache import AccumulatorDecorator
AccumulatorDecorator.printStats()

Warning: Do not apply this decorator blindly. Only use it for methods that are known to be hot spots.

How to find hot spots that could benefit from caching is discussed further in “Profiling and optimising configuration” later.
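The general idea behind such a cache can be sketched with the standard library alone. This is not how AccumulatorCache is implemented; the sketch assumes a hypothetical immutable ToyFlags dataclass (so it is hashable and usable as a cache key), a hypothetical cached_cfg decorator built on functools.lru_cache, and a dict standing in for the returned accumulator. Handing out deep copies is a design choice of this sketch, made so that a caller modifying its result cannot corrupt the cached entry.

```python
import copy
import functools
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyFlags:
    """Hypothetical immutable (hence hashable) flags container."""
    isMC: bool
    maxEvents: int = -1


CALLS = {"builds": 0}


def cached_cfg(func):
    """Cache `func` on its arguments and hand out deep copies of the result."""
    cached = functools.lru_cache(maxsize=None)(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # each caller gets its own copy, so modifying it cannot corrupt the cache
        return copy.deepcopy(cached(*args, **kwargs))

    wrapper.cache_info = cached.cache_info
    return wrapper


@cached_cfg
def my_algo_cfg(flags, name="MyAlgo"):
    CALLS["builds"] += 1  # count how often the expensive body really runs
    return {"alg": name, "isData": not flags.isMC}


flags = ToyFlags(isMC=True)
first = my_algo_cfg(flags)
second = my_algo_cfg(flags)  # same arguments: served from the cache
third = my_algo_cfg(ToyFlags(isMC=False))
print(CALLS["builds"])  # 2
print(my_algo_cfg.cache_info())  # CacheInfo(hits=1, misses=2, maxsize=None, currsize=2)
print(first == second, first is second)  # True False
```

With a mutable, unhashable argument (for example a plain dict), lru_cache raises TypeError, which is why the sketch uses a frozen dataclass for the flags.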

ComponentAccumulator API

The ComponentAccumulator has the following methods to add components to it:

  • merge(other, sequenceName=None)
    Merge in another instance of ComponentAccumulator. Deduplication is applied. If sequenceName is provided, all algorithms from the top sequence of the other accumulator are added to that sequence; otherwise the sequence structures are merged.
  • addSequence(sequence, parentName=None)
    Add a sequence, by default to the top sequence of the accumulator. If parentName is provided, the sequence is added as a subsequence of the sequence with that name. Handy methods to create various types of sequences (parallel/serial with AND/OR logic) are defined in CFElements.
  • addEventAlgo(algo, sequenceName=None, primary=False)
    Add one event-processing algorithm, by default to the top sequence of the accumulator. If sequenceName is provided, the algorithm is added to that sequence instead.
  • addCondAlgo(algo, primary=False)
    Add one conditions-processing algorithm. Subject to deduplication.
  • addService(newSvc, primary=False, create=False)
    Add one service. Subject to deduplication. If create is True, the service is added to the set of services that Athena creates early in the job, even if no client requires it.
  • addPublicTool(tool, primary=False)
    Add one public tool. Subject to deduplication. Note: public tools are deprecated for Run 3; this feature will be removed.
  • setPrivateTools(tool or list of tools)
    Temporarily attach a private AlgTool (or a list of private AlgTools) to the accumulator. They must be removed with popPrivateTools before merging.

For an explanation of the primary option, see “Designating the primary components” above.

Misuse of these methods raises a ConfigurationError, DeduplicationFailure or plain TypeError exception.

The ComponentAccumulator can be queried with these methods:

  • getEventAlgo(name)
    Get an event-processing algorithm by name.
  • getEventAlgos(seqName=None)
    Get all event algorithms (if seqName is provided, only those in that sequence and its nested sequences).
  • getCondAlgo(name)
    Get a conditions processing algorithm by name.
  • getService(name)
    Get a service by name.
  • getPublicTool(name)
    Get a public tool by name. Note: public tools are deprecated for Run 3; this feature will be removed.
  • getSequence(SequenceName=None)
    Returns a sequence (by searching the tree of sequences). By default returns the top sequence of the accumulator.
  • popPrivateTools()
    Returns the AlgTool or list of AlgTools previously attached to the accumulator.
  • getPrimary()
    Returns the component that is designated to be the primary one (see above for explanation).

Additional methods are available for use in top-level scripts to run the configuration contained in the ComponentAccumulator:

  • run(maxEvents=None, OutputLevel=INFO)
    Starts the Athena execution.
  • store(outfile)
    Saves the configuration in Python pickle format to a file. The file must be opened with open() before calling store (in binary write mode, as pickle requires) and closed afterwards.
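The open/store/close sequence is easiest to get right with a with-statement, which closes the file even if an error occurs. The sketch below uses a hypothetical ToyConfig class whose store(outfile) only mimics the signature described above by pickling a plain dict; with a real ComponentAccumulator named acc the calls should look the same, i.e. with open("myconfig.pkl", "wb") as f: acc.store(f). The file name is illustrative.

```python
import os
import pickle
import tempfile


class ToyConfig:
    """Hypothetical stand-in for an accumulator; only mimics store(outfile)."""

    def __init__(self, settings):
        self.settings = settings

    def store(self, outfile):
        # write the configuration in pickle format to an already-open file
        pickle.dump(self.settings, outfile)


acc = ToyConfig({"MyAlg": {"isData": False}})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "myconfig.pkl")
    # the caller opens the file in binary write mode; the with-block closes it
    with open(path, "wb") as outfile:
        acc.store(outfile)
    # read it back to check what was written
    with open(path, "rb") as infile:
        restored = pickle.load(infile)

print(restored)  # {'MyAlg': {'isData': False}}
```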

The content of the configuration can be printed with printConfig(withDetails=False, summariseProps=False, onlyComponents=[], printDefaults=False, printComponentsOnly=False). The arguments control the level of detail of the printout; their names are self-explanatory.

Running configuration stored in ComponentAccumulator / self testing

A top-level configuration script needs to:

  • set up the flags:
    • import the flags (see the code around the initConfigFlags function below),
    • change their values as desired (typically set the input file),
    • optionally add new flags to make the job configurable from the command line (see the code around RunThis/RunThat below),
    • update the flag values from the command line (see fillFromArgs()),
    • lock the flags,
  • set up the main services (see MainServicesCfg),
  • add the components you need,
  • run the configuration by calling acc.run(),
  • handle possible execution errors (i.e. check the StatusCode returned by Athena).

A “Hello World” example can be found in the Athena HelloAlg example config.

A self-test for a configuration fragment can be set up in exactly the same way, and it is advisable to include one in each file that defines configuration functions. The only difference is that the setup has to be wrapped in an if __name__ == "__main__": clause, which prevents it from being executed when the file is imported as a module. If the configuration contains algorithms, a short test job can be run; configurations without algorithms can still be tested this way to some extent.

# assume this is the content of MyAlgConfig.py
from AthenaConfiguration.ComponentAccumulator import ComponentAccumulator

def MyAlgCfg(flags):
    acc = ComponentAccumulator()
    ...
    return acc

if __name__ == "__main__": # typically not needed in top level script
    # import the flags and set them
    from AthenaConfiguration.AllConfigFlags import initConfigFlags
    flags = initConfigFlags()
    # optionally add flags that can be modified via the command line
    # and that make this script more universal
    flags.addFlag("RunThis", False)
    flags.addFlag("RunThat", False)
    ...
    flags.Exec.MaxEvents = 3
    ...
    # use one of the predefined files
    from AthenaConfiguration.TestDefaults import defaultTestFiles
    flags.Input.Files = defaultTestFiles.RAW
    flags.fillFromArgs() # make the job understand command line options
    # lock the flags
    flags.lock()

    # create basic infrastructure
    from AthenaConfiguration.MainServicesConfig import MainServicesCfg
    acc = MainServicesCfg(flags)

    # add the algorithm to the configuration
    acc.merge(MyAlgCfg(flags))

    # or make it conditional on your flags:
    if flags.RunThis:
        pass  # ... add/do something
    if flags.RunThat:
        pass  # ... add/do something else

    # debug printout
    acc.printConfig(withDetails=True, summariseProps=True)

    # run the job
    status = acc.run()

    # report the execution status (0 ok, else error)
    import sys
    sys.exit(not status.isSuccess())

See the TileRawChannelMakerConfig example.

This script is then runnable with the command:

python -m PackageName.MyAlgConfig

If the test does not take long (and it should not), it can be added as a package unit test in CMakeLists.txt:

atlas_add_test( MyAlgConfig
                 SCRIPT python -m PackageName.MyAlgConfig
                 POST_EXEC_SCRIPT noerror.sh)

# another variant of the test differing via flag
atlas_add_test( MyAlgConfigRunThat
                 SCRIPT python -m PackageName.MyAlgConfig RunThat=1
                 POST_EXEC_SCRIPT noerror.sh)

These tests are then always run as part of the GitLab CI.

The functionality that interprets command-line options as flags, flags.fillFromArgs(), is documented separately (see more).

More examples of top-level applications: Run3DQTestingDriver.py, RecoSteering.py.