tracking down reference counting memory leaks

Mon Oct 20 09:27:42 PDT 2014

as everyone knows, C++ programs do not leak memory because they use RAII
and smart pointers to automatically release all allocated memory at just
the right time.

... but enough with the jocularities: there was this ChartModel that was
leaked due to a uno::Reference cycle, which means that the usual tools
like valgrind and address sanitizer aren't very useful as such.

after an hour or so of adding just some simple SAL_DEBUGs it was obvious
that a temp file was leaked by CppunitTest_chart2_export because on the
2nd instance of ChartModel there were 1968 acquire() calls but only 1966
release() calls.
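
as a rough illustration of that kind of counting, here is a minimal sketch; it is not the instrumentation that was actually added to ChartModel. RefCountStats and report() are hypothetical names, and a plain std::ostream stands in for SAL_DEBUG:

```cpp
#include <ostream>

// hypothetical per-instance counters, bumped from acquire()/release()
struct RefCountStats {
    long acquires = 0;
    long releases = 0;

    void onAcquire() { ++acquires; }
    void onRelease() { ++releases; }

    // > 0 once the object should be dead means somebody still holds a reference
    long outstanding() const { return acquires - releases; }
};

// prints one summary line per instance; the stream stands in for SAL_DEBUG
inline void report(std::ostream& os, const void* instance,
                   const RefCountStats& stats) {
    os << "instance " << instance
       << " acquire=" << stats.acquires
       << " release=" << stats.releases
       << " outstanding=" << stats.outstanding() << '\n';
}
```

calling report() at the end of the test for every instance is enough to see which one has more acquires than releases, but it says nothing about which callers are responsible.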

there are 2 ways i've tried to track down the 2 leaking acquire()s:

1. instrument the acquire()/release() method and run the test in gdb,
with breakpoint commands that print a backtrace on every
acquire()/release(), then run the resulting logfile through a little
python script that tries to find matching pairs of acquire()/release()

(this was suggested by mjayfrancis and noel_grandin on IRC)

the script is now on master in bin/refcount_leak.py; at the top of the
file there are some comments on how to best use it.

the script itself takes just 2 minutes to run (90 seconds of which are
spent in a single regex), but unfortunately printing 4000 stack traces
with gdb takes > 3 hours on my laptop; that can probably be sped up by
disabling various pretty printing options or perhaps by using some
tracing tool other than gdb.

the printed stack traces are sorted by "badness" but unfortunately it's
still necessary to inspect them manually; it's quite possible that the
leaking acquire() is erroneously matched with an unrelated release();
basically only matches with a "score" of 2 are reliable, which is just
~75% of them.  the actual leak i was looking for is printed as
backtraces #3 and #4, which isn't too bad, but a few tweaks were needed
to get there, and honestly i used method #2 first to identify the
leaking trace :)
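
to give an idea of what "matching pairs" means, here is a deliberately crude sketch. it is not the algorithm in bin/refcount_leak.py, which scores matches on whole backtraces; this version assumes each event has already been reduced to a single calling function. Event and unmatchedAcquires() are hypothetical names:

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// hypothetical, heavily simplified log entry: one acquire() or release()
// call, reduced to the name of the function that made the call
struct Event {
    bool isAcquire;
    std::string caller;
};

// pairs every release() with the most recent still-open acquire() from the
// same caller and returns the indices of the acquires that stay unpaired
std::vector<std::size_t> unmatchedAcquires(const std::vector<Event>& events) {
    std::map<std::string, std::vector<std::size_t>> open;
    for (std::size_t i = 0; i < events.size(); ++i) {
        const Event& e = events[i];
        if (e.isAcquire) {
            open[e.caller].push_back(i);
        } else {
            auto it = open.find(e.caller);
            if (it != open.end() && !it->second.empty())
                it->second.pop_back();  // a release with no partner is ignored
        }
    }
    std::vector<std::size_t> result;
    for (const auto& entry : open)
        result.insert(result.end(), entry.second.begin(), entry.second.end());
    std::sort(result.begin(), result.end());
    return result;
}
```

a reference that is stored in one function and cleared in another never matches under this rule, which hints at why matching on real backtraces has to be fuzzy and why the results still need manual inspection.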

2. instrument the uno::Reference class so that every acquire()/release()
call is accompanied by a dummy memory allocation/release, so that
standard tools like valgrind/address sanitizer can detect the leaked
Reference

(this was suggested on the valgrind mailing list by Philippe Waroquiers
some years ago)

this was also a bit of work, because the uno::Reference has dozens of
weird constructors, set methods, SAL_NO_ACQUIRE, and worst of all, can
be created from uno::Any...

https://gerrit.libreoffice.org/#/c/12054/

... is the gerrit patch; it can detect only uno::Reference leaks, and
further work is required to detect rtl::Reference, uno::Any (and maybe
uno::Sequence and whatever other weird things?) too.

the obvious drawback is that effectively this requires a full rebuild
due to the changes in cppu headers.

there is also a bit of runtime overhead here, so i'm not sure if it's a
good idea to always turn it on in --enable-dbgutil...

but the advantage is that searching for "hack_acquire" in valgrind's
output very quickly finds the actual leak.
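
the idea behind method 2 can be sketched roughly as follows. this is not the code from the gerrit patch: only the name hack_acquire is borrowed from the text above, while Counted, TrackedRef and hack_release are hypothetical, and the real uno::Reference has far more constructors and setters to cover:

```cpp
#include <cstdlib>
#include <utility>

// minimal intrusive ref-counted base, standing in for a UNO object
struct Counted {
    int refs = 0;
    void acquire() { ++refs; }
    void release() { if (--refs == 0) delete this; }
    virtual ~Counted() {}
};

// one dummy heap block per held reference; if the reference is never
// dropped, the block is never freed and a leak checker can report it
inline void* hack_acquire() { return std::malloc(1); }
inline void hack_release(void* marker) { std::free(marker); }

template <typename T>
class TrackedRef {
    T* p_ = nullptr;
    void* marker_ = nullptr;  // dummy allocation paired with the acquire()

public:
    TrackedRef() = default;
    explicit TrackedRef(T* p) : p_(p) {
        if (p_) {
            p_->acquire();
            marker_ = hack_acquire();
        }
    }
    TrackedRef(const TrackedRef& other) : TrackedRef(other.p_) {}
    TrackedRef& operator=(TrackedRef other) {  // copy-and-swap
        std::swap(p_, other.p_);
        std::swap(marker_, other.marker_);
        return *this;
    }
    ~TrackedRef() {
        if (p_) {
            hack_release(marker_);
            p_->release();
        }
    }
    T* get() const { return p_; }
};
```

when a TrackedRef lives inside an object that is itself leaked, its destructor never runs, so the marker block should show up in the leak report with the stack of the code that took the reference.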

***

overall i think that the second approach is probably better, since it
shouldn't need much manual interpretation of the results.