In addition to what was shown in the previous section you can use also as a reference SPOT bi-weekly jobs.
The output from valgrind is (to say the least!) a bit cryptic. There are two types of output, “runtime messages” which appear as and when valgrind detects a problem, and “Final reports” … summary information at the end of the job (for example, on memory leaks, etc). In this section, they are both described.
As valgrind finds memory problems in running code, it outputs cryptic-looking messages. In this section, some examples are explained.
The message will be of the form:
==31929== Conditional jump or move depends on the uninitialized value(s)
==31929== at 0x807B0E4: PyObject_Free (in
/afs/cern.ch/sw/lcg/external/Python/2.3.4/rh73_gcc32/bin/python)
==31929== by 0x8074D4F: dictresize (in
/afs/cern.ch/sw/lcg/external/Python/2.3.4/rh73_gcc32/bin/python)
etc.
where the value inside the ==
is the pid
of your program. The rest of the message is the interesting part, and common examples are explained below.
Message | Explanation | Example of broken code |
---|---|---|
Conditional jump or move depends on uninitialised value(s) | You have something like if ( isTrue ) where _isTrue_ is undefined. This is very bad - it causes completely unpredictable behavior, and therefore nasty bugs. Please note that if valgrind says this, it really is true. It is not the case that valgrind hasn’t understood your code … valgrind does not parse code, it just watches memory |
int i; //i undefined if (i>2) doSomething(); |
Invalid read of size 8 | You have tried to read an invalid memory location i.e. which is not on the stack. See here for more details |
If you’re using memcheck
, then you will get lots of messages at the end, relating to memory leaks. The following information (more info at FAQ) might be useful:
The errors reported can be:
| Message | Explanation |
| ————— | ————————————————————————————————————————————————————————————————— |
| definitely lost | your program is leaking memory – fix it! |
| Possibly lost | your program is probably leaking memory unless you’re doing funny things with pointers |
| still reachable | means your program is probably ok
– it didn’t free some memory it could have. This is quite common and often reasonable. Don’t use --show-reachable=yes
if you don’t want to see these reports. |
| suppressed | a leak error has been suppressed. There are some suppressions in the default suppression files. You can ignore suppressed errors. |
(The details are in section 3.6 of the manual)
Running on release 10 I see the following, just before a crash (note that I have removed some of the backtrace output for clarity):
==29848== Invalid read of size 4
==29848== at 0x3C2C23B4: Trk::Surface::associatedDetectorElement() const (Surface.h:159)
==29848== by 0x3C288BD4: Trk::TrackStateOnSurface::~TrackStateOnSurface() (TrackStateOnSurface.cxx:84)
==29848== by 0x3C0BBD59: DataVector<Trk::TrackStateOnSurface const>::~DataVector() (in
/afs/cern.ch/user/e/emoyse/public/TestNightly/Tracking/TrkEvent/TrkTrack/
TrkTrack-04-18-03/i686-slc3-gcc323-opt/libTrkTrackDict.so)
==29848== by 0x3C2848C8: Trk::Track::~Track() (in /afs/cern.ch/user/e/emoyse/public/TestNightly/Tracking/TrkEvent/TrkTrack/
TrkTrack-04-18-03/i686-slc3-gcc323-opt/libTrkTrack.so)
==29848== by 0x3C0BB9A9: DataVector<Trk::Track>::~DataVector() (in /afs/cern.ch/user/e/emoyse/public/TestNightly/Tracking/TrkEvent/TrkTrack/
TrkTrack-04-18-03/i686-slc3-gcc323-opt/libTrkTrackDict.so)
==29848== by 0x3C5428B8: SG::DataBucket<DataVector<Trk::Track> >::~DataBucket() (in /afs/cern.ch/user/e/emoyse/public/TestNightly/Tracking/TrkEventCnv/
TrkEventAthenaPool/TrkEventAthenaPool-01-06-01/i686-slc3-gcc323-opt/
libTrkEventAthenaPoolPoolCnv.so)
Followed by:
==29848== Address 0x4BDF9960 is 16 bytes inside a block of size 28 freed
==29848== at 0x3414BC27: operator delete(void\*) (vg_replace_malloc.c:156)
==29848== by 0x379C6E06: Trk::PlaneSurface::~PlaneSurface() (in /afs/cern.ch/atlas/software/dist/10.0.0/Tracking/TrkDetDescr/TrkSurfaces/
TrkSurfaces-00-04-06/i686-slc3-gcc323-opt/libTrkSurfaces.so)
==29848== by 0x3C2C73C7: Trk::AtaPlane::~AtaPlane() (AtaPlane.cxx:117)
==29848== by 0x3C288BD4: Trk::TrackStateOnSurface::~TrackStateOnSurface() (TrackStateOnSurface.cxx:84)
==29848== by 0x3C0BBD59: DataVector<Trk::TrackStateOnSurface const>::~DataVector() (in /afs/cern.ch/user/e/emoyse/public/TestNightly/Tracking/TrkEvent/TrkTrack/
TrkTrack-04-18-03/i686-slc3-gcc323-opt/libTrkTrackDict.so)
What this means is the following:
TrackStateOnSurface
destructor is trying to use a pointer to an object that no longer exists because the object was already deleted in the PlaneSurface
destructor.addrcheck
)In this example, we will step through finding and solving a memory leak. I have run on opt, and got the following message in the job output:
==29199== 300 bytes in 15 blocks are definitely lost in loss record 1006 of 1301
==29199== at 0x3414B6D6: operator new(unsigned) (vg_replace_malloc.c:133)
==29199== by 0x3F79D4F0: Trk::TrackSummaryTool::createSummary(Trk::Track const&) (in /afs/cern.ch/atlas/software/
dist/nightlies/rel/atlrel_1/Tracking/TrkTools/TrkTrackSummaryTool/TrkTrackSummaryTool-00-11-01/i686-slc3-gcc323-
opt/libTrkTrackSummaryToolLib.so)
==29199== by 0x3F7BBDA0: InDet::PriVxTopAlg::m_preselect(Trk::Track const\*) (in /afs/cern.ch/atlas/software/dist
/nightlies/rel/atlrel_1/InnerDetector/InDetRecAlgs/InDetPriVxFinder/InDetPriVxFinder-01-00-01/i686-slc3-gcc323-
opt/libInDetPriVxFinder.so)
==29199== by 0x3F7C0780: InDet::VxPrimary::execute() (in /afs/cern.ch/atlas/software/dist/nightlies/rel/atlrel_1
/InnerDetector/InDetRecAlgs/InDetPriVxFinder/InDetPriVxFinder-01-00-01/i686-slc3-gcc323-opt/libInDetPriVxFinder.so)
==29199== by 0x342850DF: Algorithm::sysExecute() (in /afs/cern.ch/atlas/offline/external/Gaudi/0.14.6.14-root4/
GaudiKernel/v15r7p4/i686-slc3-gcc323-opt/libGaudiKernel.so)
==29199== by 0x341760B1: AthenaEventLoopMgr::executeAlgorithms() (in /afs/cern.ch/atlas/software/dist/nightlies
/rel/atlrel_1/Control/AthenaServices/AthenaServices-01-05-16/i686-slc3-gcc323-opt/libAthenaServices.so)
==29199== by 0x341764D5: AthenaEventLoopMgr::executeEvent(void\*) (in /afs/cern.ch/atlas/software/dist/nightlies
/rel/atlrel_1/Control/AthenaServices/AthenaServices-01-05-16/i686-slc3-gcc323-opt/libAthenaServices.so)
==29199== by 0x34176CC8: AthenaEventLoopMgr::nextEvent(int) (in /afs/cern.ch/atlas/software/dist/nightlies/rel
/atlrel_1/Control/AthenaServices/AthenaServices-01-05-16/i686-slc3-gcc323-opt/libAthenaServices.so)
This tells us that there is a 300 byte leak coming from Trk::TrackSummaryTool::createSummary(Trk::Track const&)
, which was called by InDet::PriVxTopAlg::m_preselect(Trk::Track const\*)
etc. (i.e. the last/latest call is at the top, and first at the bottom).
It might be useful to have some more detailed output - such as line numbers, for that last method. So we can check out and build just that package with debug info (see here for details on how to do this).
Now re-run … the new output is:
==8807== 20 bytes in 1 blocks are definitely lost in loss record 242 of 2341
==8807== at 0x3414C6D6: operator new(unsigned) (vg_replace_malloc.c:133)
==8807== by 0x3EB1894B: Trk::TrackSummaryTool::createSummary(Trk::Track const&) (TrackSummaryTool.cxx:171)
==8807== by 0x3F91A769: Trk::TrackScoringTool::score(Trk::Track const&) (in /afs/cern.ch/user/e/emoyse/
public/Development/Tracking/TrkTools/TrkAmbiguityProcessor/TrkAmbiguityProcessor-00-00-01/i686-slc3-gcc323-opt/
libTrkAmbiguityProcessorLib.so)
==8807== by 0x3F918E3F: Trk::TrackAmbiguityProcessorTool::process(DataVector<Trk::Track> const\*) (in
/afs/cern.ch/user/e/emoyse/public/Development/Tracking/TrkTools/TrkAmbiguityProcessor/
TrkAmbiguityProcessor-00-00-01/i686-slc3-gcc323-opt/libTrkAmbiguityProcessorLib.so)
==8807== by 0x3F92CF0E: InDetAmbiguitySolver::resolveTracks() (in /afs/cern.ch/user/e/emoyse/
public/Development/InnerDetector/InDetRecAlgs/InDetAmbiguitySolver/InDetAmbiguitySolver-00-08-00/
i686-slc3-gcc323-opt/libInDetAmbiguitySolver.so)
==8807== by 0x3F92CA48: InDetAmbiguitySolver::execute() (in /afs/cern.ch/user/e/emoyse/public/
Development/InnerDetector/InDetRecAlgs/InDetAmbiguitySolver/InDetAmbiguitySolver-00-08-00/
i686-slc3-gcc323-opt/libInDetAmbiguitySolver.so)
==8807== by 0x342860DF: Algorithm::sysExecute() (in /afs/cern.ch/atlas/offline/external/Gaudi/
0.14.6.14-root4/GaudiKernel/v15r7p4/i686-slc3-gcc323-opt/libGaudiKernel.so)
==8807== by 0x341770B1: AthenaEventLoopMgr::executeAlgorithms() (in /afs/cern.ch/atlas/
software/dist/nightlies/rel/atlrel_1/Control/AthenaServices/AthenaServices-01-05-16/i686-slc3-gcc323-opt/
libAthenaServices.so)
So now we see that there is a problem in TrackSummaryTool.cxx
, line 171… which narrows it down to here:
const std::vector< const Trk::TrackParameters* >* extrapParameters = 0;
if (m_doHoles)
{
extrapParameters = extrapolate(track);
[...]
}
The answer is now clear -extrapParameters
was never deleted. Fixing this and checking again with valgrind confirms that the bug is gone.
Especially in the tracking realm, many tools employ a cascaded object factory design pattern. That means they create an object and instead of deleting it before the method reaches the end, they give it back to the calling client, hereby passing the ownership (and responsibility to delete it or add it to StoreGate) onto the calling client, and even further on (cascade). The TrackSummaryTool in the example above is such a factory, and if it shows up in a valgrind leak report it needs some investigation to find out if the tool is ‘leaking’ objects internally or if the tool is working correctly but the calling client failed to take over the ownership. This investigation is complicated by the fact that often the factory tools have an internal structure with such ownership transfer, and the clients can themselves again employ a factory design to add the (potentially leaking) object onto the next bigger object. For instance, a track summary is first made (by factory, the TrackSummaryTool) and then added to a track, but the track is leaked. In such a case the developer of the TrackSummaryTool will not enjoy getting a bug report filed, for which the tool is not responsible.
To investigate such leaks, a few recommendations should be considered:
--num-callers=N
so that a common or suspicious caller does not get away unnoticed.Running Athena in 64bit platforms but with 32bit-built libraries you may hit errors like this:
memcheck
for platform ‘amd64-linux’: No such file or directory
This is solved by selecting the 64bit valgrind build: i.e.export PATH=/afs/cern.ch/sw/lcg/external/valgrind/3.5.0/x86_64-slc5-gcc43-opt/bin:$PATH
export LD_LIBRARY_PATH=/afs/cern.ch/sw/lcg/external/valgrind/3.5.0/x86_64-slc5-gcc43-opt/lib:$LD_LIBRARY_PATH
N.B. This should be fixed for all releases newer than 16.5.0.
valgrind: mmap(0x9065000, -1493585920) failed during startup.
valgrind: is there a hard virtual memory limit set?
This should not happen. Lxplus machines still have a hard memory limit, but it is so large (768 Mb at 06.12.2006, check with ulimit -a
) that it should be possible to run valgrind without a problem.