[systemd-devel] How to debug occasional hashmap corruption?

Tue Nov 6 12:30:19 UTC 2018

Lennart Poettering wrote on 2018-11-06 12:27:
> On Tue, 06.11.18 11:57, juice (juice at swagman.org) wrote:
> 
>> 
>> Hi,
>> 
>> During the past half year I have seen systemd dump core three times due
>> to what I suspect is a hashmap corruption or race.
>> Each time it looks a bit different and is triggered by different things
>> but it somehow centers on hashmap operations.
>> 
>> What would be the preferred way to debug this? I cannot add huge logging
>> as this is something that happens once in a blue moon and always on
>> different compute nodes.
>> Is there some way I could easily test it by increasing the chance of such
>> corruption/race happening?
> 
> This looks very much like a memory corruption of some sort, and
> valgrind should be the tool of choice to track that down.
> 
> Lennart

Thanks for the prompt reply, Lennart.

I agree; using valgrind was indeed already considered, but I suspect it
might add some overhead to systemd's operation?

The question here was more along the lines of how to trigger the problem.
It is quite rare: it seems to occur about once every two months on our QL3
test pool, which contains hundreds of VM guests...
It would be impractical to build and deploy a release that runs systemd
under valgrind on every node! :)
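
To put rough numbers on that, here is a back-of-the-envelope sketch of how
long one might wait for a crash to land on an instrumented node if only part
of the pool ran under valgrind. It assumes crashes are spread uniformly and
independently across nodes, and that instrumentation does not change the
crash rate (valgrind's slowdown may well alter timing, so this is only an
assumption). The pool size of 300 is a hypothetical stand-in for "hundreds",
and the function name is made up for illustration.

```python
def expected_months_to_catch(pool_size, instrumented, months_per_crash=2.0):
    """Expected wait (in months) until a crash hits an instrumented node.

    months_per_crash is the pool-wide interval between crashes
    (about two months in the case described above).
    Assumes crashes are uniform and independent across nodes.
    """
    if not 0 < instrumented <= pool_size:
        raise ValueError("instrumented must be between 1 and pool_size")
    # Only a fraction instrumented/pool_size of crashes are observable,
    # so the expected wait scales by the inverse of that fraction.
    return months_per_crash * pool_size / instrumented


# pool_size=300 is a hypothetical value, not a measured one.
for k in (1, 10, 30, 300):
    print(f"{k:>3} instrumented nodes: "
          f"~{expected_months_to_catch(300, k):.0f} months")
```

Under these assumptions, catching the problem within a couple of months
means instrumenting essentially the whole pool, which is why a reliable way
to provoke the corruption would be more useful.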

-- 
    - Juice -