Listing Datasets

Last update: 23 Aug 2024

Now let’s try to list some interesting information. Remember that all datasets (and other items) exist within a scope, so we can start by listing all the known items within your own scope:

rucio list-dids "user.${USER}:*"

tip The quotes stop your shell from trying to expand the * wildcard itself before rucio sees it. Depending on the type of shell you are using, they may or may not be required.

If this is the first time you are using the grid, then this may well not show anything.
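To see why the quotes in the command above can matter, here is a minimal sketch that uses echo in place of rucio. The scratch directory and the file name user.alice:leftover.txt are hypothetical, chosen only so that the unquoted pattern has something to match.

```shell
# Scratch directory so the demonstration touches nothing else.
demo_dir="$(mktemp -d)"

# Hypothetical file whose name happens to match the pattern used below.
touch "$demo_dir/user.alice:leftover.txt"

# Unquoted: the shell replaces the pattern with the matching file name.
unquoted="$(cd "$demo_dir" && echo user.alice:*)"

# Quoted: the pattern reaches the command unchanged, which is what rucio needs.
quoted="$(cd "$demo_dir" && echo "user.alice:*")"

echo "unquoted -> $unquoted"
echo "quoted   -> $quoted"

# Clean up the scratch directory.
rm -r "$demo_dir"
```

When nothing matches, bash passes the unquoted pattern through unchanged by default, while zsh reports an error, which is why the quotes only sometimes make a difference.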

Let’s now try to find some data. We took data in 2022, (mostly) at a centre-of-mass energy of 13.6 TeV, so the data should be in the scope data22_13p6TeV.

tip For MC datasets, the year refers to the start of production in a given configuration, which may not correspond to the year of the data being modeled. For example, MC datasets modeling Run 2 are in the mc20_13TeV scope. For data, the year should match the data-taking year except in rare cases. The way datasets are named is described in the nomenclature rules and here.

Exercise 1

Try to find the list of all DIDs for the data22_13p6TeV scope.

Solution

rucio list-dids "data22_13p6TeV:*"

Exercise 2

Now try to limit the returned set of items to those of type dataset, from run 450445, and with data type AOD. Note that this run is in the data23_13p6TeV scope rather than data22_13p6TeV.

Solution

Previously you used the wildcard * to search for all names in the scope. You can also search for patterns, e.g., *Main*. Additionally, you can supply an extra --filter argument to the command to filter the results.

rucio list-dids "data23_13p6TeV:*450445*" --filter type=dataset,datatype=AOD

There may be a lot more, but here is a taste of what you might see…

+---------------------------------------------------------------------------------+-----------------+
| SCOPE:NAME                                                                      | [DID TYPE]      |
|---------------------------------------------------------------------------------+-----------------|
| data23_13p6TeV:data23_13p6TeV.00450445.physics_Main.merge.AOD.f1342_m2167       | DIDType.DATASET |
| data23_13p6TeV:data23_13p6TeV.00450445.physics_Main.merge.AOD.x731_m2165        | DIDType.DATASET |
| data23_13p6TeV:data23_13p6TeV.00450445.physics_MinBias.merge.AOD.f1342_m2167    | DIDType.DATASET |
| data23_13p6TeV:data23_13p6TeV.00450445.physics_ZeroBias.merge.AOD.f1340_m2165   | DIDType.DATASET |
| data23_13p6TeV:data23_13p6TeV.00450445.physics_CosmicCalo.merge.AOD.f1340_m2165 | DIDType.DATASET |
| data23_13p6TeV:data23_13p6TeV.00450445.express_express.merge.AOD.f1340_m2165    | DIDType.DATASET |
| data23_13p6TeV:data23_13p6TeV.00450445.express_express.merge.AOD.x731_m2165     | DIDType.DATASET |
| data23_13p6TeV:data23_13p6TeV.00450445.physics_CosmicCalo.merge.AOD.x731_m2165  | DIDType.DATASET |
+---------------------------------------------------------------------------------+-----------------+

tip Thanks to the strict nomenclature rules, you can also identify the type using the search string:

rucio list-dids "data23_13p6TeV:*450445*.AOD.*" --filter type=dataset
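Because the names are dot-separated, you can also take them apart in the shell. The following is a minimal bash sketch using one of the names from the listing above; the variable names (project, run, stream, step, datatype, tags) are illustrative labels chosen here, not official rucio terms.

```shell
# Dataset name copied from the listing above (without the scope prefix).
name="data23_13p6TeV.00450445.physics_Main.merge.AOD.f1342_m2167"

# Split the name on dots into six fields.
# The variable names are illustrative labels, not rucio terminology.
IFS=. read -r project run stream step datatype tags <<< "$name"

echo "project:   $project"
echo "run:       $run"
echo "stream:    $stream"
echo "step:      $step"
echo "data type: $datatype"
echo "tags:      $tags"
```

This only works for names with exactly this six-field layout; other dataset types may have a different number of fields.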

Now let’s enter the name of a nonexistent dataset, such as the first dataset above with an extra 0 appended:

rucio list-dids "data23_13p6TeV:data23_13p6TeV.00450445.physics_Main.merge.AOD.f1342_m21670"

The output indicates that no such dataset exists:

+--------------+--------------+
| SCOPE:NAME   | [DID TYPE]   |
|--------------+--------------|
+--------------+--------------+
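If you want a script to notice an empty result like this one, one option is to count the data rows in the table. The helper below is a hypothetical sketch, not part of rucio: count_dids is a name made up here, and it assumes the table layout shown in this tutorial, where the header and data rows start with "| " and only the header contains SCOPE:NAME.

```shell
# Hypothetical helper: count the DID rows in a rucio table read from stdin.
# Assumes the layout shown above: header and data rows start with "| ",
# and only the header row contains the text SCOPE:NAME.
count_dids() {
    grep -E '^\| ' | grep -v -c -F 'SCOPE:NAME' || true
}

# The empty table from above, stored so the sketch runs without rucio.
empty_table='+--------------+--------------+
| SCOPE:NAME   | [DID TYPE]   |
|--------------+--------------|
+--------------+--------------+'

n="$(printf '%s\n' "$empty_table" | count_dids)"
if [ "$n" -eq 0 ]; then
    echo "no matching DIDs"
fi
```

In a real session you would pipe the output of rucio list-dids into count_dids instead of using the stored table.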

tip Rucio, like many ATLAS tools, has quite a bit of built-in help. You can try, for example:

rucio list-dids --help

in case you forget the filter format or what options are available.