Validation - FTag Physval

Last update: 22 Mar 2022

Table of contents
Introduction
Validation Rounds
Comments on reading the plots
How to produce plots
How to get NTUP_PHYSVAL samples for testing
How to change the Physval code

Introduction

The Physval tool is used to investigate the impact of software changes between two releases. It does this by producing comparison plots (histograms and ROC curves) from a REFERENCE and a TEST sample. The primary interest is in the relative difference between the releases, not in the absolute performance. The tool is used in the official Validation Rounds, and the results are presented at the regular Physics Validation meetings. The plots are produced from the dedicated NTUP_PHYSVAL format, which is derived from the DAOD_PHYSVAL format (in older versions, from AODs). Validation rounds can run on both data and MC; for MC, FTag uses ttbar and Z’ (Zprime) samples.

Validation Rounds

The working mailing list for Physics and Software Validation is hn-atlas-physics-software-validation@cern.ch. E-mails to this list announce the next meeting (usually a few days in advance) and list the tasks to perform for it.

Comments on reading the plots

The Physval code runs on DAOD_PHYSVAL derivations (and on AODs in older versions). The two formats contain different variables, so different histograms are filled:

  • In AODs the available jet collection is AntiKt4EMTopoJets, in the DAOD_PHYSVAL derivation the AntiKt4EMPFlowJets collection is available.
  • In the DAOD_PHYSVAL derivations the MSV vertices and the MV2c10 tagger variables are not available.
  • On ttbar samples the pT-related histograms named pt_ttbar are filled, and on Zprime samples those named pt_Zprime. The two sets use different pT ranges, and the set belonging to the other sample type is left empty. The sample type is determined automatically from the sample DSID: ttbar has DSID 410000 and Zprime DSID 427080. (If neither matches, a warning is issued and the ttbar-like cuts are applied.)

(The contents of the derivation may change; this list was last updated in Aug 2021.)
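The DSID rule can be illustrated with a short shell sketch. The function name `physval_sample_type` is hypothetical and only mirrors the behaviour described above; the actual check is done inside the Physval code itself, not by a script.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the documented rule: map a sample DSID to the
# pT histogram set that gets filled. NOT the actual JetTagDQA implementation.
physval_sample_type() {
  local dsid="$1"
  case "$dsid" in
    410000) echo "ttbar" ;;
    427080) echo "Zprime" ;;
    *)
      # Unknown DSID: warn on stderr and fall back to the ttbar-like cuts
      echo "WARNING: unknown DSID $dsid, using ttbar-like cuts" >&2
      echo "ttbar"
      ;;
  esac
}

# Example usage: which pt_* histograms would be filled for each DSID
for dsid in 410000 427080 123456; do
  echo "$dsid -> pt_$(physval_sample_type "$dsid")"
done
```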

How to produce plots

This section describes how to produce the comparison plots of a REFERENCE and a TEST sample from the NTUP_PHYSVAL format. How to get a test sample is described in the ‘How to get NTUP_PHYSVAL samples for testing’ section below.

0) Required scripts
The processing requires some scripts which are available in athena at athena/PhysicsAnalysis/JetTagging/JetTagValidation/JetTagDQA/scripts/.
The folder contains:

  • mergePhysValFiles.py: a script for merging the NTUP_PHYSVAL samples; it also sorts the histograms into a folder structure
  • add_line_breaks_to_html.sh: a tiny bash script adding line breaks to the captions in the generated .html webpage
  • Draw_PhysVal_btagROC.c: a ROOT macro generating the ROC curves and efficiency vs. pT plots in a post-processing step
  • CreatePhysValWebPage.py: a small script generating a .html webpage for the ROC curves

1) Folder setup (this exact folder structure is not necessary but recommended)

starting with a blank shell
make a folder named after the date of the upcoming Physval meeting
cd validation
mkdir Rel22_<dd-mm-yyyy> ; cd Rel22_<dd-mm-yyyy>
make subfolders for the different tasks
mkdir task1_ttbar task2_ttbar ; cd task1_ttbar
and make some subfolders required for the next steps
mkdir ref test ROC files_merged
get the helper scripts that are located in athena:
PATHSCRIPTS="<path_to_athena>/athena/PhysicsAnalysis/JetTagging/JetTagValidation/JetTagDQA/scripts/"
cp $PATHSCRIPTS/mergePhysValFiles.py $PATHSCRIPTS/add_line_breaks_to_html.sh $PATHSCRIPTS/Draw_PhysVal_btagROC.c $PATHSCRIPTS/CreatePhysValWebPage.py .
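The folder setup above can be condensed into a small shell function, for example to prepare all tasks of a round at once. This is only a sketch: `setup_physval_round` is a hypothetical name, and it assumes `PATHSCRIPTS` is set as above (if it is not, the function only creates the folders).

```shell
#!/usr/bin/env bash
# Hypothetical helper: create <base>/Rel22_<date>/<task>/{ref,test,ROC,files_merged}
# for each task and copy the helper scripts into each task folder,
# if PATHSCRIPTS points to an existing directory.
setup_physval_round() {
  local base="$1" date="$2"
  shift 2
  local task dir
  for task in "$@"; do
    dir="$base/Rel22_$date/$task"
    mkdir -p "$dir"/{ref,test,ROC,files_merged}
    if [ -d "${PATHSCRIPTS:-}" ]; then
      cp "$PATHSCRIPTS"/{mergePhysValFiles.py,add_line_breaks_to_html.sh,Draw_PhysVal_btagROC.c,CreatePhysValWebPage.py} "$dir"/
    else
      echo "PATHSCRIPTS not set or missing: copy the helper scripts by hand" >&2
    fi
  done
}

# Example (placeholder date, as in the text above):
# setup_physval_round validation <dd-mm-yyyy> task1_ttbar task2_ttbar
```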

2) Downloading the samples
Get the sample names from the prodtask page that was sent in the email or linked in the task's Jira ticket. The correct sample is the one with the right DSID (410000 for ttbar, 427080 for Zprime) and the tags given in the email / Jira ticket.

set up ATLAS and initialise Rucio (this requires a valid grid certificate; see the grid certificate tutorial)
setupATLAS
lsetup rucio
voms-proxy-init -voms atlas
go to the ref folder and download the REFERENCE sample, then go to the test folder and download the TEST sample
rucio list-files <NAME REFERENCE> (for checking the sample)
cd ref ; rucio download <NAME REFERENCE>
cd ../test ; rucio download <NAME TEST> ; cd ..
(one can also do rucio download --nrandom 1 <sample> to download only one random file for testing)
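Before downloading, it can help to check that a sample name carries the expected DSID. The sketch below assumes the usual ATLAS dataset naming, `[scope:]project.DSID.physicsShort.step.format.tags`, in which the DSID is the second dot-separated field; `sample_dsid` and `check_dsid` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the DSID of a dataset name, assuming the
# layout [scope:]project.DSID.physicsShort.step.format.tags
sample_dsid() {
  local name="${1#*:}"              # drop an optional "scope:" prefix
  printf '%s\n' "$name" | cut -d. -f2
}

# Hypothetical helper: return non-zero (with a warning) if the DSID does not
# match the expected one (410000 for ttbar, 427080 for Zprime)
check_dsid() {
  local name="$1" expected="$2" got
  got=$(sample_dsid "$name")
  if [ "$got" != "$expected" ]; then
    echo "WARNING: $name has DSID $got, expected $expected" >&2
    return 1
  fi
}
```

For a ttbar task, for example, run `check_dsid <NAME REFERENCE> 410000` before `rucio download <NAME REFERENCE>`.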

3) Merge the samples

run the merging script (if any hiccups occur in this step, try starting from a new shell)
REFFILE="./ref/*" (path to ref files dir)
TESTFILE="./test/*" (path to test files dir)
python mergePhysValFiles.py -i $REFFILE -o files_merged/merged_NTUP_PHYSVAL_ref.root -d BTag
python mergePhysValFiles.py -i $TESTFILE -o files_merged/merged_NTUP_PHYSVAL_test.root -d BTag

4) Run the makewebdisplay script

set up Athena (if any hiccups occur in this step, try starting from a new shell)
setupATLAS
asetup 22.0.32
MERGED_REFFILE="files_merged/merged_NTUP_PHYSVAL_ref.root"
MERGED_TESTFILE="files_merged/merged_NTUP_PHYSVAL_test.root"
REF_NAME="ref" (this sets the plot legend; choose a fitting name)
TEST_NAME="test" (legend for test)
physval_make_web_display.py --reffile $REF_NAME:$MERGED_REFFILE --outdir=plots $MERGED_TESTFILE --title=$TEST_NAME --logy --normalize --ratio --startpath="BTag"
run the script to add line breaks to the html output (make sure the first line of the script points to the plots directory)
bash add_line_breaks_to_html.sh

5) Create the ROC curves
edit the Draw_PhysVal_btagROC.c script:

  • ensure that the corresponding lines towards the end of the script read the merged files:
    TString reffile = "files_merged/merged_NTUP_PHYSVAL_ref.root";
    TString testfile = "files_merged/merged_NTUP_PHYSVAL_test.root";
    vector<TString> InputFilesNames = {reffile, testfile};
  • ensure the TString MC is set to "ttbar" or "Zprime", as appropriate for the sample
  • ensure the vector<TString> leg_entry has meaningful names for the ref and test legends
  • if the NTUP_PHYSVAL sample was produced via the DAOD_PHYSVAL step (it probably was if the sample name has three p-tags rather than two), the script has to be changed to run on PFlow jets instead of EMTopo jets: replace every occurrence of AntiKt4EMTopoJets in the script by AntiKt4EMPFlowJets.
  • also ensure that the ROC/ folder exists in the current dir (root will complain otherwise)
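The mechanical parts of these edits (switching the jet collection and creating ROC/) can be scripted, for example with the hypothetical `prepare_roc_macro` below. It assumes GNU sed (as on lxplus) and should only be used for samples produced via DAOD_PHYSVAL; the input file names, MC and leg_entry still have to be checked by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace AntiKt4EMTopoJets by AntiKt4EMPFlowJets in the
# ROC macro (keeping a .bak copy of the original) and create the ROC/ folder
# in the current directory.
prepare_roc_macro() {
  local macro="${1:-Draw_PhysVal_btagROC.c}"
  [ -f "$macro" ] || { echo "no such file: $macro" >&2; return 1; }
  sed -i.bak 's/AntiKt4EMTopoJets/AntiKt4EMPFlowJets/g' "$macro"
  mkdir -p ROC
  # Print how many lines still mention EMTopo jets (should be 0);
  # grep exits non-zero on 0 matches, which is not an error here
  grep -c AntiKt4EMTopoJets "$macro" || true
}
```

Run it from the task folder, e.g. `prepare_roc_macro Draw_PhysVal_btagROC.c`; the unmodified macro is kept as Draw_PhysVal_btagROC.c.bak.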

then run the script
root -l -b -q Draw_PhysVal_btagROC.c
and run the script to create the html page
python CreatePhysValWebPage.py -i ROC/

6) Move the output to the validation plots webpage

mkdir -p /afs/cern.ch/atlas/groups/validation/Btagging/Rel22_<dd-mm-yyyy>/task1_ttbar
cp -r plots/ /afs/cern.ch/atlas/groups/validation/Btagging/Rel22_<dd-mm-yyyy>/task1_ttbar/.
cp -r ROC/ /afs/cern.ch/atlas/groups/validation/Btagging/Rel22_<dd-mm-yyyy>/task1_ttbar/.
the webpage link is: http://atlas-computing.web.cern.ch/atlas-computing/links/PhysValDir/Btagging/

and then do the same for the other tasks :)
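Step 6 can also be wrapped in a small function so that it is repeated identically for every task. `publish_task` is a hypothetical name; using `mkdir -p` also creates the Rel22_<dd-mm-yyyy> folder on the first call. It is meant to be run from inside the task folder that contains plots/ and ROC/.

```shell
#!/usr/bin/env bash
# Hypothetical helper for step 6: copy plots/ and ROC/ of the current task
# folder to <webbase>/Rel22_<date>/<task>/, creating the folders if needed.
publish_task() {
  local webbase="$1" date="$2" task="$3"
  local dest="$webbase/Rel22_$date/$task"
  mkdir -p "$dest"
  # No trailing slash on the sources, so the folders themselves are copied
  cp -r plots ROC "$dest"/
}

# Example, from inside task1_ttbar:
# publish_task /afs/cern.ch/atlas/groups/validation/Btagging <dd-mm-yyyy> task1_ttbar
```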

When encountering errors in the downloading or processing it can help to re-try with a clean shell / fresh setup.

How to get NTUP_PHYSVAL samples for testing

The samples to use for a Physval round are announced in the email / Jira ticket. First check on the https://prodtask-dev.cern.ch/… page that the task status is done.
To list the samples, set up Rucio as described above and run a command like:

rucio list-dids valid<1, 2, or 3>:*ttbar*NTUP*<tag>*

or look the sample up directly on PanDA. Which “valid” scope (valid1, valid2 or valid3) to use is specified in the email.

If you just want to test the plotting, you can download a sample from a recent Physval round.
The Jira tickets for the validation rounds contain links to the DSIDs of the NTUP_PHYSVAL samples. Follow the prodtask-dev.cern.ch link, choose a ttbar sample and download it as described above.

If you want to generate your own NTUP_PHYSVAL file from a DAOD or AOD instead, follow these steps. They assume you have the DSID of an AOD sample.

set up the folder structure
mkdir AODtoNTUP && cd AODtoNTUP
setup
setupATLAS
lsetup "root 6.18.04-x86_64-centos7-gcc8-opt" (or a more recent root version)
download the sample
lsetup rucio
voms-proxy-init -voms atlas
PATHTOSTORAGE=<a path to some storage space>
cd $PATHTOSTORAGE
rucio download <sample dsid> (optional: use --nrandom 1)
cd -
set up Athena
asetup Athena,22.0.11 (or a more recent version)
run Reco_tf.py to create the NTUP_PHYSVAL
INPUTAOD=$PATHTOSTORAGE/<sample dsid>/<file name> (use one of the downloaded files as <file name>; to run on more than one file, give several file paths separated by spaces and enclose the whole list in quotes)
Reco_tf.py --maxEvents 100 --inputAODFile $INPUTAOD --outputNTUP_PHYSVALFile NTUP_PHYSVAL.root --validationFlags noExample doBtag (adapt the max number of events)
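When running on several downloaded files, the space-separated list for INPUTAOD can be built with a small helper. `pick_input_files` is hypothetical; it takes the first N ROOT files (default 1) of the download folder in sorted order.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first N ROOT files of a download folder,
# sorted and separated by single spaces.
pick_input_files() {
  local dir="$1" n="${2:-1}"
  find "$dir" -maxdepth 1 -type f -name '*.root*' \
    | sort | head -n "$n" | tr '\n' ' ' | sed 's/ $//'
}

# Example:
# INPUTAOD=$(pick_input_files "$PATHTOSTORAGE/<sample dsid>" 3)
# Reco_tf.py --maxEvents 100 --inputAODFile $INPUTAOD --outputNTUP_PHYSVALFile NTUP_PHYSVAL.root --validationFlags noExample doBtag
```

Here $INPUTAOD is deliberately left unquoted in the Reco_tf.py call so that each path becomes a separate argument; this assumes the file names contain no spaces.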

How to change the Physval code

This section describes how to change the code that produces the NTUP_PHYSVAL file.
The first part is the general setup.

starting with a blank shell
set up the work folder
mkdir validation && cd validation
setupATLAS
lsetup git
use git atlas init-workdir to make a sparse checkout of athena
git atlas init-workdir https://:@gitlab.cern.ch:8443/atlas/athena.git
cd athena
add the JetTagDQA package
git atlas addpkg JetTagDQA
update upstream and make a new branch (this assumes you have a private fork of athena)
git fetch upstream
git checkout -b master-<name for the branch> upstream/master --no-track
<editing code comes here>

To change the code, these are the two files you want to modify (plus their headers):
athena/PhysicsAnalysis/JetTagging/JetTagValidation/JetTagDQA/src/BTaggingValidationPlots.cxx
athena/PhysicsAnalysis/JetTagging/JetTagValidation/JetTagDQA/src/PhysValBTag.cxx

To build and run the changed code, follow this section.

starting with a blank shell
cd validation
mkdir build run && cd build
setup
setupATLAS
asetup Athena,master,latest
build
cmake ../athena/Projects/WorkDir/
make
source x*/setup.sh
run
cd ../run
INPUTAOD=<path>/<sample dsid>/<file name>
Reco_tf.py --inputAODFile $INPUTAOD --outputNTUP_PHYSVALFile NTUP_PHYSVAL.root --maxEvents 100 --validationFlags noExample doBtag
(adapt the max number of events; optionally append > log.txt to the command)

To make plots from the new NTUP_PHYSVAL file follow the procedure described above (How to produce plots).