Introduction to Rucio

Last update: 23 Aug 2024

Sooner or later, the input/output sandboxes of regular grid jobs will be too small for your data. At that point you will need to start using the distributed mass storage systems on the grid.

Rucio coordinates the ATLAS distributed data management systems and provides a convenient way to manage your files.

More details about Rucio can be found in the Rucio users guide.

The basic unit in Rucio is the data identifier (DID). A DID refers to a registered file, a dataset (a set of files), or a container (a set of datasets). The data behind these DIDs are stored at grid sites (such as CERN or BNL) and are registered centrally in the DDM central catalogues. The mapping from the logical files in a dataset to their physical locations at a grid site is handled by distributed file catalogues local to that site. Containers, which aggregate multiple datasets, are not covered further here.
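To make the file / dataset / container nesting concrete, here is a minimal Python sketch. It only illustrates the hierarchy described above and is not the Rucio client API: the classes `File`, `Dataset`, and `Container`, the helper `list_files`, and all file and dataset names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class File:
    name: str  # hypothetical file name


@dataclass
class Dataset:
    name: str
    files: List[File] = field(default_factory=list)  # a dataset is a set of files


@dataclass
class Container:
    name: str
    datasets: List[Dataset] = field(default_factory=list)  # a container is a set of datasets


def list_files(did):
    """Return the names of all files reachable from a file, dataset, or container."""
    if isinstance(did, File):
        return [did.name]
    if isinstance(did, Dataset):
        return [f.name for f in did.files]
    if isinstance(did, Container):
        # Walk each dataset in the container and collect its file names in order.
        return [name for ds in did.datasets for name in list_files(ds)]
    raise TypeError(f"not a file, dataset, or container: {did!r}")


# Hypothetical example: one container holding two datasets.
dataset_1 = Dataset("dataset_1", [File("file_a.root"), File("file_b.root")])
dataset_2 = Dataset("dataset_2", [File("file_c.root")])
container = Container("my_container", [dataset_1, dataset_2])

print(list_files(container))  # ['file_a.root', 'file_b.root', 'file_c.root']
```

The sketch says nothing about where the files physically live. In Rucio that information comes from the catalogues, not from the DID itself.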