[systemd-devel] How to debug occasional hashmap corruption?

Tue Nov 6 16:42:23 UTC 2018

On 11/6/18 9:57 UTC, juice wrote:

> During the past half year I have seen systemd dump core three times due
> to what I suspect a hashmap corruption or race.
> Each time it looks a bit different and is triggered by different things
> but it somehow centers on hashmap operations.

Three intermittent hardware failures in one year on 10,000 boxes is normal.
Keep good records.  If the same box appears twice, then physically destroy it.

Meanwhile, log all events to a circular buffer that just keeps rotating:
date+time (32 bits, 1 microsecond precision), caller (return address),
argument summary (fixed format: string prefixes or hash).  Analyze the dump.
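
A minimal sketch of such a ring buffer, as one way to read the
description above rather than an existing systemd facility.  The names
evlog_record, EVLOG, EVLOG_SLOTS and EVLOG_ARG_BYTES are hypothetical,
as is the slot count.  A 32-bit microsecond counter wraps roughly every
71.6 minutes, so the timestamp is only good for ordering recent events.
The sketch is not thread-safe and relies on the GCC/Clang builtin
__builtin_return_address.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define EVLOG_SLOTS 4096u        /* hypothetical size; must be a power of two */
#define EVLOG_ARG_BYTES 16       /* fixed-width argument summary */

static_assert((EVLOG_SLOTS & (EVLOG_SLOTS - 1)) == 0, "slot count must be a power of two");

struct evlog_entry {
    uint32_t usec;               /* wrapping microsecond timestamp, 32 bits */
    const void *caller;          /* return address of the instrumented call */
    char arg[EVLOG_ARG_BYTES];   /* NUL-terminated prefix of the key/argument */
};

static struct evlog_entry evlog[EVLOG_SLOTS];
static uint32_t evlog_next;      /* total events recorded so far */

/* Low 32 bits of the monotonic clock, in microseconds. */
static uint32_t evlog_now_usec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint32_t)((uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u);
}

/* Overwrite the oldest slot with a new event; the buffer just keeps rotating. */
static void evlog_record(const void *caller, const char *arg) {
    struct evlog_entry *e = &evlog[evlog_next++ & (EVLOG_SLOTS - 1)];
    e->usec = evlog_now_usec();
    e->caller = caller;
    memset(e->arg, 0, sizeof e->arg);
    for (size_t i = 0; arg && arg[i] && i < EVLOG_ARG_BYTES - 1; i++)
        e->arg[i] = arg[i];
}

/* Place at the top of each instrumented function to log who called it. */
#define EVLOG(arg) evlog_record(__builtin_return_address(0), (arg))
```

Because the buffer is a static array, it ends up in the core dump and
can be read back from there, with evlog_next pointing just past the
most recent entry.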

Lock each hashmap operation to ensure single-threaded operation;
prevent even multiple [supposedly] read-only accesses.
Lock each signal handler: only one instance of a given signal at a time.
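
A rough sketch of both locks, under stated assumptions.  struct toy_map
is a hypothetical one-slot stand-in for the real hashmap, and map_enter,
map_leave and install_serialized_handler are made-up names.  The lock is
a C11 atomic_flag spinlock that also counts how often it was already
held, since any contention in a supposedly single-threaded process is
itself evidence.  For the signal side, sigaction() without SA_NODEFER
already keeps the delivered signal blocked while its handler runs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-slot stand-in for the real hashmap, so the sketch compiles. */
struct toy_map { const char *key; void *value; };

static atomic_flag map_busy = ATOMIC_FLAG_INIT;
static atomic_ulong map_contended;   /* times the lock was already held on entry */

static void map_enter(void) {
    if (!atomic_flag_test_and_set(&map_busy))
        return;                              /* uncontended: the expected case */
    atomic_fetch_add(&map_contended, 1);     /* overlapping access detected */
    while (atomic_flag_test_and_set(&map_busy))
        ;                                    /* spin until the holder leaves */
}

static void map_leave(void) { atomic_flag_clear(&map_busy); }

static void toy_map_set(struct toy_map *m, const char *key, void *value) {
    assert(m != NULL);
    map_enter();
    m->key = key;
    m->value = value;
    map_leave();
}

/* Reads take the same lock as writes. */
static void *toy_map_get(struct toy_map *m) {
    assert(m != NULL);
    map_enter();
    void *v = m->value;
    map_leave();
    return v;
}

static volatile sig_atomic_t handler_runs;

static void on_signal(int signo) { (void)signo; handler_runs++; }

/* sa_flags without SA_NODEFER: signo stays blocked while on_signal runs. */
static int install_serialized_handler(int signo) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(signo, &sa, NULL);
}
```

A spinlock like this never finishes if a signal handler on the same
thread interrupts the lock holder and then calls map_enter(), so a real
version would have to block signals around the critical section or
abort on contention instead of spinning.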