PerfMon

Last update: 06 Aug 2024 [History] [Edit]

Introduction

PerfMonSvc is a service providing access to the custom build tools for Athena, it uses various Gaudi/Athena hooks to collect resource usage information and provides information on various levels such as;

  • Summary statistics: Maximum memory usage, throughput, etc.
  • Component statistics: CPU/wall-time and memory usage per component
  • Event statistics: Resource usage evolution within the event loop

The PerfMonMTSvc is an MT-save version of the PerfMonSvc allowing to collection of information for AthenaMT. It was started as part of the Google Summer of Code 2019 and was successfully integrated into Athena. After migration to the CA, the PerfMonSvc has been retired. To make daily monitoring easy to read the so-called PerfMonMT domains are introduced. During job configuration, the user can assign (a group of) algorithms to specific domains that are then used in the SPOT daily test when making resource usage plots. Since the SPOT components plots use only tenish maximum expensive algorithms per plot, the increase of resource usage in the specific domain helps shifters to identify further where the problem appears.

The usage of the PerfMonMT

For the transform jobs, the perfmon options are already integrated and can be easily switched on with the --perfmonmt= (options: 'none', 'fastmonmt' [default], 'fullmonmt') option, for example: Reco_tf.py --AMI q445 --perfmonmt fastmont. For the user-defined job option you can turn on the PerfMonMT by adding the following code to your CA configuration:

# Set the necessary flags
flags.PerfMon.doFastMonMT = True
flags.PerfMon.OutputJSON = 'perfmonmt_test.json'
# Merge the relevant service configuration
from PerfMonComps.PerfMonCompsConfig import PerfMonMTSvcCfg
acc.merge(PerfMonMTSvcCfg(flags))

The example above turned on the default fast monitoring options turning on the printout of the following non-detailed information:

  • Resource usage when event #N is in flight
  • Change in resource usage between certain points
  • Some summary statistics (throughput etc.)
  • Machine and environment information

The example of the log is below:

PerfMonMTSvc   INFO =======================================================================================
PerfMonMTSvc   INFO                                 Event Level Monitoring
PerfMonMTSvc   INFO         (Only the first 10 and the last measurements are explicitly printed)
PerfMonMTSvc   INFO =======================================================================================
PerfMonMTSvc   INFO Event           CPU [s]     Wall [s]    Vmem [kB]   Rss [kB]    Pss [kB]    Swap [kB]
PerfMonMTSvc   INFO ---------------------------------------------------------------------------------------
PerfMonMTSvc   INFO 1               274.73      283.32      5852376     4736336     4733328     0
PerfMonMTSvc   INFO 2               423.40      350.92      8640184     7461984     7458990     0
PerfMonMTSvc   INFO 25              655.36      380.95      10915780    9767124     9764130     0
PerfMonMTSvc   INFO 45              903.57      411.86      11189092    10069572    10066578    0
PerfMonMTSvc   INFO 68              1160.43     443.85      11237524    10109504    10106510    0
PerfMonMTSvc   INFO 92              1401.89     473.95      11243396    10121236    10118229    0
PerfMonMTSvc   INFO 111             1651.85     506.07      11257936    10158256    10155259    0
PerfMonMTSvc   INFO 133             1915.59     539.51      11275056    10175780    10172782    0
PerfMonMTSvc   INFO 153             2157.11     569.74      11431616    10327076    10324078    0
PerfMonMTSvc   INFO 177             2423.99     602.96      11447456    10333556    10330558    0                                         
...
PerfMonMTSvc   INFO 982             12893.39    1927.34     12208048    11083644    11080707    0
INFO =======================================================================================
PerfMonMTSvc   INFO                                 Snapshots Summary
PerfMonMTSvc   INFO =======================================================================================
PerfMonMTSvc   INFO Step         dCPU [s]    dWall [s]   <CPU>  dVmem [kB] dRss [kB]  dPss [kB]  dSwap [kB]
PerfMonMTSvc   INFO ---------------------------------------------------------------------------------------
PerfMonMTSvc   INFO Configure    119.33      123.739     0.96   1910384    1418476    1415640    0
PerfMonMTSvc   INFO Initialize   134.65      136.647     0.99   2485500    1916532    1916368    0
PerfMonMTSvc   INFO FirstEvent   163.66      83.894      1.95   3907400    3751616    3751632    0
PerfMonMTSvc   INFO Execute      12775.7     1620.08     7.89   3567864    3648092    3648147    0
PerfMonMTSvc   INFO Finalize     92.69       91.05       1.02   259976     -4910788   -4910442   0
PerfMonMTSvc   INFO ***************************************************************************************
PerfMonMTSvc   INFO Number of events processed:        1000
PerfMonMTSvc   INFO CPU usage per event [ms]:          12939
PerfMonMTSvc   INFO Events per second:                 0.587
PerfMonMTSvc   INFO CPU utilization efficiency [%]:    99
INFO***************************************************************************************
PerfMonMTSvc   INFO Max Vmem:                          11.64 GB
PerfMonMTSvc   INFO Max Rss:                           10.59 GB
PerfMonMTSvc   INFO Max Pss:                           10.59 GB
PerfMonMTSvc   INFO Max Swap:                          0.00 KB   
INFO ***************************************************************************************
PerfMonMTSvc   INFO Leak estimate per event Vmem:      1.08 MB
PerfMonMTSvc   INFO Leak estimate per event Pss:       1.05 MB
PerfMonMTSvc   INFO   >> Estimated using the last 37 measurements from the Event Level Monitoring
PerfMonMTSvc   INFO   >> Events prior to the first 300 are omitted
...
INFO =======================================================================================
PerfMonMTSvc   INFO                                   System Information
INFO =======================================================================================
PerfMonMTSvc   INFO CPU Model:                         AMD EPYC 7302 16-Core Processor 512 KB
PerfMonMTSvc   INFO Number of Available Cores:         32
PerfMonMTSvc   INFO Total Memory:                      251.67 GB
INFO =======================================================================================
PerfMonMTSvc   INFO                                Environment Information
INFO =======================================================================================
PerfMonMTSvc  INFO Malloc Library:                    libtcmalloc_minimal.so
PerfMonMTSvc  INFO Math Library:                      libimf.so

The more detailed information is available in the Fullmont. In addition to the fastmont information it contains the component-level metrics:

  • Initialize, FirstEvent, Execute, Finalize, Callbacks, preLoadProxy
PerfMonMTSvc     INFO                                  PerfMonMTSvc Report
PerfMonMTSvc     INFO 
PerfMonMTSvc     INFO                              Component Level Monitoring
PerfMonMTSvc     INFO =======================================================================================
PerfMonMTSvc     INFO Step           Count     CPU Time [ms]  Vmem [kB]      Malloc [kB]         Component
PerfMonMTSvc     INFO ---------------------------------------------------------------------------------------
PerfMonMTSvc     INFO Initialize     1         44376.84       0              153893              AthMonSeq_TrigJetMonitorAlgorithm
PerfMonMTSvc     INFO Initialize     1         9740.13        39200          86544               AthMonSeq_TrigEgammaAthMonitorCfg
PerfMonMTSvc     INFO Initialize     1         8956.97        17408          17920               AthMonSeq_JetMonitoring                                           
...
INFO=======================================================================================
PerfMonMTSvc     INFO FirstEvent     1         37515.43       0              0                   CondInputLoader
PfMonMTSvc       INFO FirstEvent     1         16442.71       0              0                   NswCalibDbAlg
PerfMonMTSvc     INFO FirstEvent     1         8017.96        0              0                   TrigDeserialiser
PerfMonMTSvc     INFO FirstEvent     1         4463.59        0              0                   MuonDetectorCondAlg
INFO =======================================================================================
PerfMonMTSvc     INFO Execute        999       1136392.37     0              0                   InDetSiSpTrackFinder
PerfMonMTSvc     INFO Execute        999       1034468.73     0              0                   InDetSiSpTrackFinderR3LargeD0
PerfMonMTSvc     INFO Execute        999       958990.57      0              0                   InDetAmbiguitySolver                                              
...
INFO=======================================================================================
PerfMonMTSvc    INFO Finalize       1         53148.94       0              -125681             ToolSvc
PerfMonMTSvc    INFO Finalize       1         23012.33       0              -3176               HLTDecodingSeq
PerfMonMTSvc    INFO Finalize       1         23012.14       0              -3176               TrigDeserialiser                                             
...
INFO=======================================================================================
PerfMonMTSvc   INFO preLoadProxy   1         15081.71       0              0                   loadCachesOverhead:COOLONL_TRIGGER/CONDBR2
PerfMonMTSvc   INFO preLoadProxy   1         12527.71       0              0                   DetCondKeyTrans[/LAR/Align]
PerfMonMTSvc   INFO preLoadProxy   1         12527.69       0              0                   UpdateAddr::/LAR/Align                                              
...
INFO=======================================================================================
PerfMonMTSvc  INFO Callback       1         57.50          0              0                   TRT_DetectorTool[0x3a859da8]+e9
PerfMonMTSvc  INFO Callback       1         42.41          0              0                   SCT_DetectorTool[0x40abd5b0]+e9
PerfMonMTSvc  INFO Callback       1         36.51          0              0                   PixelDetectorTool[0x40abe670]+e9
...

Note that component measurements are collected without locks (no synchronization) and that metrics are aggregated across all calls. That means that vmem/malloc metrics in the event loop are only available in serial/--threads=1 mode.

Analyse the perfmon output

As default the information is collected to the perfmonmt_jobname.json.tar.gz file which can be analyzed with scripts provided by PerfMonAna:

  • performmt-plotter.py : Creates event/component level plots
  • perfmonmt-printer.py : Prints untrimmed/ordered usage statistics
  • perfmonmt-refit.py : Re-fits memory usage in different event slices

Here is an example of snapshot metrics from the performmt-plotter.py:

Configuring Domains in Component Accumulator

The new algorithm can be added to the PerfMon domain in following example:

# Muon
acc.flagPerfmonDomain('Muon')
if flags.Detector.EnableMuon:
 from MuonConfig.MuonReconstructionConfig import MuonReconstructionCfg
 acc.merge(MuonReconstructionCfg(flags))
 log.info("---------- Configured muon tracking")
# EGamma
acc.flagPerfmonDomain('EGamma')
if flags.Reco.EnableEgamma:
 from egammaConfig.egammaSteeringConfig import EGammaSteeringCfg
 acc.merge(EGammaSteeringCfg(flags))
 log.info("---------- Configured e/gamma")
# Caching of CaloExtension for downstream
# Combined Performance algorithms.
acc.flagPerfmonDomain('CaloExtension')
if flags.Reco.EnableCaloExtension:
 from TrackToCalo.CaloExtensionBuilderAlgCfg import (
 CaloExtensionBuilderCfg)
 acc.merge(CaloExtensionBuilderCfg(flags))
 log.info("---------- Configured track calorimeter extension builder")
# Muon Combined
acc.flagPerfmonDomain('CombinedMuon')
if flags.Reco.EnableCombinedMuon:
 from MuonCombinedConfig.MuonCombinedReconstructionConfig import (
 MuonCombinedReconstructionCfg)
 acc.merge(MuonCombinedReconstructionCfg(flags))
 log.info("---------- Configured combined muon reconstruction")

It is listed in the log:

Py:ComponentAccumulator INFO :: This CA contains the following PerfMon domains ::
Py:ComponentAccumulator INFO :: There are a total of 626 registered algorithms in 22 domains ::