[systemd-devel] How to debug occasional hashmap corruption?

juice juice at swagman.org
Tue Nov 6 09:57:00 UTC 2018


Hi,

During the past half year I have seen systemd dump core three times due
to what I suspect a hashmap corruption or race.
Each time it looks a bit different and is triggered by different things
but it somehow centers on hashmap operations.

What would be the prefered way to debug this? I cannot add huge logging
as this is something that happens once in a blue moon and always in
different compute nodes.
Is there some way I could easily test it by increasing the chance of 
such
corruption/race happening?

The systemd version we currently have is 238 and the linux kernel is 
4.14.69

Here is the latest coredump trace of the problem.

[root at mn-0 robot]#
[root at mn-0 robot]# gdb /lib/systemd/systemd 
./core.systemd.0.6db5fc1f8d114369ae625ab03660c46e.1437.1539266612000000
GNU gdb (GDB) Fedora 8.1-14.wf29
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show 
copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.                        
                                                                          
                                        |
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /lib/systemd/systemd...Reading symbols from 
/usr/lib/debug/usr/lib/systemd/systemd-238-10.wf29.x86_64.debug...done.
done.
[New LWP 1437]
[New LWP 1]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/lib/systemd/systemd --switched-root --system 
--deserialize 16'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f13d7e2e1c7 in kill () from /lib64/libc.so.6
[Current thread is 1 (LWP 1437)]
Missing separate debuginfos, use: dnf debuginfo-install 
audit-libs-2.8.3-4.wf29.x86_64 glibc-2.27-15.wf29.x86_64 
kmod-libs-25-2.wf29.x86_64 libacl-2.2.52-20.wf29.x86_64 libattr-2.4.47-2
3.wf29.x86_64 libblkid-2.32-5.wf29.x86_64 libcap-2.25-10.wf29.x86_64 
libcap-ng-0.7.8-9.wf29.x86_64 libgcc-7.3.1-5.wf29.x86_64 
libmount-2.32-5.wf29.x86_64 libuuid-2.32-5.wf29.x86_64 lz4-
libs-1.8.1.2-4.wf29.x86_64 pam-1.3.0-13.wf29.x86_64 
xz-libs-5.2.3-6.wf29.x86_64 zlib-1.2.11-5.wf29.x86_64
(gdb)
(gdb) bt
#0  0x00007f13d7e2e1c7 in kill () from /lib64/libc.so.6
#1  0x00005607e9a93887 in crash (sig=6) at ../src/core/main.c:196
#2  <signal handler called>
#3  0x00007f13d7e2dedb in raise () from /lib64/libc.so.6
#4  0x00007f13d7e2f291 in abort () from /lib64/libc.so.6
#5  0x00007f13d937739a in log_assert_failed_realm 
(realm=realm at entry=LOG_REALM_SYSTEMD, text=text at entry=0x7f13d9423042 
"x",
     file=file at entry=0x7f13d9435e08 
"../src/libsystemd/sd-bus/bus-objects.c", line=line at entry=1618,
     func=func at entry=0x7f13d94366d0 <__PRETTY_FUNCTION__.13175> 
"vtable_member_compare_func") at ../src/basic/log.c:826
#6  0x00007f13d93bdca1 in vtable_member_compare_func (a=<optimized out>, 
b=<optimized out>) at ../src/libsystemd/sd-bus/bus-objects.c:1618
#7  0x00007f13d936d4bb in base_bucket_scan (h=h at entry=0x7f13d98f9740, 
idx=412, idx at entry=411, key=key at entry=0x5607eb85b700) at 
../src/basic/hashmap.c:1251
#8  0x00007f13d936ea11 in hashmap_put (h=0x7f13d98f9740, 
key=key at entry=0x5607eb85b700, value=value at entry=0x5607eb85b700) at 
../src/basic/hashmap.c:1268
#9  0x00007f13d93c0066 in add_object_vtable_internal 
(bus=bus at entry=0x5607eb7b2e00, slot=slot at entry=0x0, 
path=path at entry=0x5607e9b51fc8 "/org/freedesktop/systemd1/unit",
     interface=<optimized out>, interface at entry=0x7f13d9418b30 
"org.freedesktop.systemd1.Service", vtable=<optimized out>, 
fallback=fallback at entry=true,
     find=0x5607e9aa1080 <bus_unit_interface_find>, 
userdata=0x5607eb7273c0) at ../src/libsystemd/sd-bus/bus-objects.c:1739
#10 0x00007f13d93c0515 in sd_bus_add_fallback_vtable 
(bus=bus at entry=0x5607eb7b2e00, slot=slot at entry=0x0, 
prefix=prefix at entry=0x5607e9b51fc8 "/org/freedesktop/systemd1/unit",
     interface=interface at entry=0x7f13d9418b30 
"org.freedesktop.systemd1.Service", vtable=<optimized out>, 
find=find at entry=0x5607e9aa1080 <bus_unit_interface_find>,
     userdata=0x5607eb7273c0) at 
../src/libsystemd/sd-bus/bus-objects.c:1849
#11 0x00005607e9aa08be in bus_setup_api_vtables 
(m=m at entry=0x5607eb7273c0, bus=0x5607eb7b2e00) at ../src/core/dbus.c:573
#12 0x00005607e9aa1fc2 in bus_on_connection (s=<optimized out>, 
fd=<optimized out>, revents=<optimized out>, userdata=0x5607eb7273c0) at 
../src/core/dbus.c:711
#13 0x00007f13d93f35f8 in source_dispatch (s=s at entry=0x5607eb8162c0) at 
../src/libsystemd/sd-event/sd-event.c:2307
#14 0x00007f13d93f4dfa in sd_event_dispatch (e=e at entry=0x5607eb727910) 
at ../src/libsystemd/sd-event/sd-event.c:2667
#15 0x00007f13d93f4f89 in sd_event_run (e=0x5607eb727910, 
timeout=15000000) at ../src/libsystemd/sd-event/sd-event.c:2727
#16 0x00005607e9ad0df4 in manager_loop (m=0x5607eb7273c0) at 
../src/core/manager.c:2615
#17 0x00005607e9a8f5a5 in invoke_main_loop 
(ret_error_message=0x7ffe0ed707a8, ret_switch_root_init=<synthetic 
pointer>, ret_switch_root_dir=<synthetic pointer>,
     ret_fds=0x7ffe0ed707b8, ret_shutdown_verb=<synthetic pointer>, 
ret_retval=<synthetic pointer>, ret_reexecute=<synthetic pointer>, 
m=0x5607eb7273c0) at ../src/core/main.c:1651
#18 main (argc=5, argv=0x7ffe0ed70a98) at ../src/core/main.c:2430
(gdb)


-- 
    - Juice -


More information about the systemd-devel mailing list