[systemd-devel] How to debug occasional hashmap corruption?
Vito Caputo
vcaputo at pengaru.com
Tue Nov 6 18:11:32 UTC 2018
On Tue, Nov 06, 2018 at 02:30:19PM +0200, juice wrote:
> Lennart Poettering wrote on 2018-11-06 12:27:
> > On Tue, 06.11.18 11:57, juice (juice at swagman.org) wrote:
> >
> > >
> > > Hi,
> > >
> > > During the past half year I have seen systemd dump core three times due
> > > to what I suspect is a hashmap corruption or race.
> > > Each time it looks a bit different and is triggered by different things,
> > > but it somehow centers on hashmap operations.
> > >
> > > What would be the preferred way to debug this? I cannot add huge logging,
> > > as this is something that happens once in a blue moon and always on
> > > different compute nodes.
> > > Is there some way I could easily test it by increasing the chance of such
> > > corruption/race happening?
> >
> > This looks very much like a memory corruption of some sort, and
> > valgrind should be the tool of choice to track that down.
> >
> > Lennart
>
> Thanks for the prompt reply, Lennart.
>
> I agree; using valgrind was indeed something already considered. However, I
> suspect it might add some overhead to systemd's operation?
>
> The question here was more along the lines of how to trigger the problem.
> It is quite rare: it seems to occur about once every two months on our QL3
> test pool, which contains hundreds of VM guests...
> It would be impractical to build and deploy a release which contains systemd
> running under valgrind on every node! :)
>
In scenarios like this, where valgrind's overhead is impractical, I'd give
AddressSanitizer a try:
https://clang.llvm.org/docs/AddressSanitizer.html
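
As a rough illustration of the kind of bug AddressSanitizer reports, here is a
minimal sketch of a use-after-free on a heap-allocated entry. The struct and
function names are hypothetical stand-ins made up for this example; nothing
here is taken from systemd's hashmap code, and the actual corruption in the
reported crashes may be of a different kind.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a hashmap entry; not systemd's real layout. */
struct entry {
    char key[16];
    int value;
};

/*
 * Allocates an entry, reads its value, and frees it.
 * If trigger_bug is nonzero, the value is read again after free(),
 * which is a heap-use-after-free that an ASan-instrumented build
 * should catch at the point of the bad read.
 */
int lookup_after_free(int trigger_bug)
{
    struct entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;

    strcpy(e->key, "unit");
    e->value = 42;

    int v = e->value;   /* valid read: the entry is still allocated */
    free(e);

    if (trigger_bug)
        v = e->value;   /* invalid read: the entry was freed above */

    return v;
}
```

Built with something like `clang -g -fsanitize=address -fno-omit-frame-pointer`,
a call to lookup_after_free(1) should make the program abort with a
heap-use-after-free report that includes the stacks of the bad access, the
free, and the original allocation, while lookup_after_free(0) just returns 42.
For a meson-based project such as systemd, meson's generic
`-Db_sanitize=address` option is the usual way to enable this; whether PID 1
runs cleanly in such a build is something to check rather than assume.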
Regards,
Vito Caputo