Callgrind is a tool that uses Valgrind's runtime code instrumentation framework to generate call graphs. Valgrind is a kind of emulator or virtual machine: it uses JIT (just-in-time) compilation techniques to translate x86 instructions into a simpler intermediate form on which the various tools operate. The code processed by the tools is then translated back to x86 instructions and executed on the host CPU. This way even shared libraries and dynamically loaded plugins can be analyzed, but the approach results in a large slowdown of the analyzed application (about a factor of 50 for the callgrind tool) and high memory consumption.
To profile an entire Athena job with callgrind, run:
valgrind --tool=callgrind --trace-children=yes $(which athena.py) your_Job.py
The --tool option determines which tool should be executed, callgrind in our case.
The --trace-children option tells Valgrind to analyze child processes of the main application; otherwise you will only get profiles of bash sessions, which is probably not what you want.
The --enable-debuginfod=no option disables debuginfod, i.e. it avoids querying a service that provides debug information over an HTTP API.
Profiling an entire Athena job has the disadvantage not only of being very slow, but also of producing huge profiles (easily more than 100 MB for a few events). This can make it difficult to analyze the results using KCacheGrind. Moreover, the developer might only be interested in their own algorithm. This is where the ValgrindAuditor (part of Control/Valkyrie) can help. Once configured with the name of an algorithm to profile, it turns the callgrind instrumentation on before the algorithm's execute method and off again afterwards. While the instrumentation is off, the remaining Valgrind overhead should only be about a factor of 4, which makes it much easier to run an entire Athena job in a reasonable amount of time.
To enable the ValgrindAuditor, add the following lines to your ComponentAccumulator job:
flags.PerfMon.Valgrind.ProfiledAlgs=["EMBremCollectionBuilder"] #EMBremCollectionBuilder is an example. Replace as appropriate
from Valkyrie.ValkyrieConfig import ValgrindServiceCfg
acc.merge(ValgrindServiceCfg(flags))
Set flags.PerfMon.Valgrind.ProfiledAlgs to the name of the algorithm you want to profile (you can of course add multiple algorithms). Sometimes you might want to skip a few events before collecting profiling data, e.g. to exclude first-event initializations, ld symbol lookups, etc. This can be done by setting IgnoreFirstNEvents. For complete documentation of all the ValgrindSvc properties, see the Valkyrie doxygen page.
Before you run your job in Valgrind, you might simply want to run Athena with your modified job options. If everything works (and the algorithm name is spelled correctly), you should see the following output:
ValgrindAuditor VERBOSE Starting callgrind: EMBremCollectionBuilder [event 1]
ValgrindAuditor VERBOSE Stopping callgrind: EMBremCollectionBuilder [event 1]
This tells you that ValgrindAuditor found your algorithm and is enabling/disabling the callgrind instrumentation before/after its execution (which has no effect yet, since the job is not running in Valgrind). Once you have established that the auditor is configured correctly, you can run your job in Valgrind:
valgrind --tool=callgrind --trace-children=yes --collect-jumps=yes --instr-atstart=no --enable-debuginfod=no $(which athena.py) --imf your_Job.py
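The dry-run check described above can also be scripted, which may be convenient when several algorithms are listed in ProfiledAlgs. The following is a minimal sketch, not part of Valkyrie: it assumes the auditor messages look like the two example lines shown earlier, and the function names audited_algorithms and missing_algorithms are hypothetical helpers introduced only for illustration.

```python
import re

# Pattern modeled on the two example log lines shown in this document.
# Assumption: other Athena releases may format these messages differently.
AUDITOR_LINE = re.compile(
    r"ValgrindAuditor\s+VERBOSE\s+(Starting|Stopping) callgrind: (\S+) \[event (\d+)\]"
)


def audited_algorithms(log_lines):
    """Count Starting/Stopping messages per algorithm name in the log lines."""
    counts = {}
    for line in log_lines:
        match = AUDITOR_LINE.search(line)
        if match is None:
            continue  # not an auditor message
        action, algorithm, _event = match.groups()
        per_alg = counts.setdefault(algorithm, {"Starting": 0, "Stopping": 0})
        per_alg[action] += 1
    return counts


def missing_algorithms(log_lines, expected):
    """Return the expected algorithms that lack a Starting or a Stopping message."""
    counts = audited_algorithms(log_lines)
    return [
        name
        for name in expected
        if counts.get(name, {}).get("Starting", 0) == 0
        or counts.get(name, {}).get("Stopping", 0) == 0
    ]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python check_auditor.py athena.log AlgName ...
    with open(sys.argv[1]) as log_file:
        missing = missing_algorithms(log_file, sys.argv[2:])
    if missing:
        print("auditor never fired for:", ", ".join(missing))
    else:
        print("all requested algorithms were found")
```

An algorithm reported as missing usually points to a misspelled name in ProfiledAlgs, which is much cheaper to discover in a plain Athena run than after a full Valgrind run.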
An example using Reco_tf and some more callgrind options:
InputRDOFile="/cvmfs/atlas-nightlies.cern.ch/repo/data/data-art/CampaignInputs/mc20/RDO/mc20_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.recon.AOD.e6337_s3681_r13145/100events.RDO.pool.root"
valgrind --tool=callgrind --trace-children=yes --collect-jumps=yes --instr-atstart=no --cacheuse=yes --cache-sim=yes \
--simulate-wb=yes --simulate-hwpref=yes --branch-sim=yes --dump-instr=yes --enable-debuginfod=no $(which Reco_tf.py) \
--inputRDOFile $InputRDOFile \
--outputAODFile myAOD.pool.root \
--preInclude 'egammaConfig.egammaOnlyFromRawFlags.egammaOnlyFromRaw' \
--preExec 'ConfigFlags.PerfMon.Valgrind.ProfiledAlgs=["EMBremCollectionBuilder"]' \
--postInclude 'Valkyrie.ValkyrieConfig.ValgrindServiceCfg' \
--autoConfiguration 'everything' \
--maxEvents '20' \
--fileValidation FALSE \
--perfmon 'none'
The most important option is --instr-atstart=no. This turns the instrumentation off at the beginning so that it can be turned on by ValgrindAuditor on the first event of the algorithm(s) being profiled. See the Valgrind manual for other callgrind options. After the job is done you will find several callgrind.out files, of which the largest is the one you are interested in.
The above only works on algorithms. To profile something else, for example a single method, you need to do a bit more:
In your CMakeLists.txt file add: find_package( valgrind )
In your source file add: #include "valgrind/callgrind.h"
At the start of the code you want to profile add CALLGRIND_START_INSTRUMENTATION;
and at the end add CALLGRIND_STOP_INSTRUMENTATION;
After compiling and setting up your code, you can run callgrind as before:
valgrind --tool=callgrind --trace-children=yes --collect-jumps=yes --instr-atstart=no --enable-debuginfod=no $(which athena.py) --imf your_Job.py
There’s more in the relevant section of the valgrind manual.
KCacheGrind
Data produced by callgrind can be loaded into the KCacheGrind tool for browsing the performance results. The actual profile of the Athena run is stored in the biggest file produced by callgrind. On lxplus, KCacheGrind is already installed, so the user does not need to do anything special:
$(which kcachegrind) --version
Qt: version
KDE Development Platform: version
`KCachegrind`: version `kde`
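Because the profile of interest is the biggest file produced by callgrind, picking it out can be automated. The sketch below is one possible way to do this and is not part of Athena or Valgrind: it assumes the output files sit in a single directory and keep a name starting with callgrind.out, and the function name largest_callgrind_output is a hypothetical helper.

```python
from pathlib import Path


def largest_callgrind_output(directory="."):
    """Return the largest callgrind.out* file in `directory`, or None if there is none."""
    candidates = [p for p in Path(directory).glob("callgrind.out*") if p.is_file()]
    if not candidates:
        return None
    # The biggest file is assumed to hold the actual Athena profile.
    return max(candidates, key=lambda p: p.stat().st_size)


if __name__ == "__main__":
    profile = largest_callgrind_output()
    if profile is None:
        print("no callgrind.out files found")
    else:
        # Print the command rather than launching the GUI from the script.
        print(f"kcachegrind {profile}")
```

Printing the command instead of launching KCacheGrind keeps the helper usable on machines without a display; the printed line can be copied into a shell on lxplus.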