Sooner or later, the input/output sandboxes of regular grid jobs will no longer be big enough for your needs. At that point you will need to start using the distributed mass storage systems on the grid.
Rucio coordinates the ATLAS distributed data management systems and provides a convenient way to manage your files.
More details about Rucio can be found in the Rucio users guide.
The basic unit in Rucio is a data identifier (DID). A DID is nothing but a registered file, dataset (set of files), or container (set of datasets). These DIDs are stored at certain grid sites (like CERN or BNL) and are registered in a central location (the DDM central catalogues). The logical mapping of files in a dataset to the physical location of the files on these grid sites is done by distributed file catalogues local to a certain site. (Multiple datasets can be aggregated into containers, but we will not cover this right now.)
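
To make the terminology concrete, here is a minimal Python sketch of the ideas above: files grouped into a dataset, datasets grouped into a container, and a per-site catalogue that maps logical file names to physical locations. This is only a toy model and does not use the Rucio client. The class names, the `locate` helper, the dataset and file names, and the storage paths are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Dataset:
    """A named set of files (each file is itself a DID)."""
    name: str
    files: List[str] = field(default_factory=list)


@dataclass
class Container:
    """A named set of datasets."""
    name: str
    datasets: List[Dataset] = field(default_factory=list)


# Toy stand-in for the site-local file catalogues:
# site -> {logical file name -> physical location at that site}.
# All names and paths here are hypothetical.
SITE_CATALOGUES: Dict[str, Dict[str, str]] = {
    "CERN": {"file_1.root": "/cern/disk/a1/file_1.root",
             "file_2.root": "/cern/disk/b7/file_2.root"},
    "BNL": {"file_1.root": "/bnl/pool/03/file_1.root"},
}


def locate(dataset: Dataset) -> Dict[str, List[Tuple[str, str]]]:
    """Map each logical file in the dataset to its (site, physical path) copies."""
    replicas: Dict[str, List[Tuple[str, str]]] = {}
    for lfn in dataset.files:
        replicas[lfn] = [
            (site, catalogue[lfn])
            for site, catalogue in SITE_CATALOGUES.items()
            if lfn in catalogue
        ]
    return replicas


dataset = Dataset("user.example.my_dataset", ["file_1.root", "file_2.root"])
container = Container("user.example.my_container", [dataset])

for lfn, copies in locate(dataset).items():
    print(lfn, "->", copies)
```

The point of the sketch is the separation of concerns: the dataset only knows logical file names, while the physical location of each copy is looked up per site.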