[systemd-devel] systemctl hangs with 249.7 systemd in yocto Honister release

Heyi Guo guoheyi at linux.alibaba.com
Wed Jan 4 11:13:59 UTC 2023


Hi Michal,

Actually, we have upgraded systemd to version 250.5, but the issue still 
happens.

Looking through the journal around the time the error message is first 
printed, I found a segmentation fault (SEGV) in systemd-udevd:

Jan 04 16:10:40 ali2600 systemd[1]: Created slice Slice 
/system/systemd-coredump.
Jan 04 16:10:40 ali2600 systemd[1]: Started Process Core Dump (PID 
7507/UID 0).
Jan 04 16:10:42 ali2600 systemd-coredump[7508]: elfutils disabled, 
parsing ELF objects not supported
Jan 04 16:10:42 ali2600 systemd-coredump[7508]: [LNK] Process 173 
(systemd-udevd) of user 0 dumped core.
Jan 04 16:10:42 ali2600 systemd[1]: systemd-udevd.service: Main process 
exited, code=dumped, status=11/SEGV
Jan 04 16:10:42 ali2600 systemd[1]: systemd-udevd.service: Killing 
process 7503 (systemd-udevd) with signal SIGKILL.
Jan 04 16:10:42 ali2600 systemd[1]: systemd-udevd.service: Killing 
process 7503 (systemd-udevd) with signal SIGKILL.
Jan 04 16:10:42 ali2600 systemd[1]: systemd-udevd.service: Failed with 
result 'core-dump'.
Jan 04 16:10:42 ali2600 systemd[1]: systemd-udevd.service: Scheduled 
restart job, restart counter is at 1.
Jan 04 16:10:42 ali2600 systemd[1]: Stopped Rule-based Manager for 
Device Events and Files.
Jan 04 16:10:42 ali2600 systemd[1]: Starting Rule-based Manager for 
Device Events and Files...
Jan 04 16:10:42 ali2600 systemd[1]: systemd-coredump@0-7507-0.service: 
Deactivated successfully.
Jan 04 16:10:42 ali2600 systemd-udevd[7510]: corrupted size vs. prev_size

......

Jan 04 16:10:57 ali2600 systemd-coredump[7517]: elfutils disabled, 
parsing ELF objects not supported
Jan 04 16:10:57 ali2600 systemd-coredump[7517]: [LNK] Process 7516 
(systemd) of user 0 dumped core.
Jan 04 16:10:57 ali2600 phosphor-dump-manager[356]: *** stack smashing 
detected ***: terminated
Jan 04 16:10:57 ali2600 phosphor-dump-monitor[280]: Failed to create 
dump: sd_bus_call noreply: org.freedesktop.DBus.Error.NoReply: Remote 
peer disconnected
Jan 04 16:10:57 ali2600 systemd[1]: Caught <SEGV>, dumped core as pid 7516.
Jan 04 16:10:57 ali2600 systemd[1]: Freezing execution.
Jan 04 16:10:57 ali2600 phosphor-dump-manager[7536]: Failed to list 
units: Transport endpoint is not connected

Could this be the reason systemctl stops working? The log shows systemd 
(PID 1) catching a SEGV and then "Freezing execution".
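
For reference, here is a minimal sketch of how the crash-related lines could be picked out of a saved journal excerpt. It assumes the log has been exported to plain text with the wrapped lines rejoined. The marker strings are copied from the messages above; the names find_crash_events and pid1_froze are hypothetical and only for illustration.

```python
import re

# Marker strings copied from the journal excerpt above; the list is an
# illustrative assumption, not an exhaustive set of systemd crash messages.
CRASH_MARKERS = (
    "code=dumped",
    "dumped core",
    "Caught <SEGV>",
    "Freezing execution",
)

# Matches the "name[pid]:" part of a journal line, e.g. "systemd[1]:".
UNIT_RE = re.compile(r"\s(?P<name>[\w.@-]+)\[(?P<pid>\d+)\]:")


def find_crash_events(lines):
    """Return (process name, pid, line) for every line with a crash marker."""
    events = []
    for line in lines:
        if not any(marker in line for marker in CRASH_MARKERS):
            continue
        match = UNIT_RE.search(line)
        if match:
            events.append((match.group("name"), int(match.group("pid")), line))
    return events


def pid1_froze(lines):
    """True if PID 1 itself logged that it is freezing execution."""
    return any(
        name == "systemd" and pid == 1 and "Freezing execution" in line
        for name, pid, line in find_crash_events(lines)
    )


if __name__ == "__main__":
    sample = [
        "Jan 04 16:10:42 ali2600 systemd[1]: systemd-udevd.service: "
        "Main process exited, code=dumped, status=11/SEGV",
        "Jan 04 16:10:57 ali2600 systemd[1]: Caught <SEGV>, dumped core as pid 7516.",
        "Jan 04 16:10:57 ali2600 systemd[1]: Freezing execution.",
    ]
    for name, pid, line in find_crash_events(sample):
        print(f"{name}[{pid}]: {line}")
    print("PID 1 froze:", pid1_froze(sample))
```

Run over the full log, this would separate the systemd-udevd crash at 16:10:42 from the later point where PID 1 itself reports a crash.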

Thanks,

Heyi


On 2023/1/4 6:48 PM, Michal Koutný wrote:
> On Wed, Jan 04, 2023 at 04:51:22PM +0800, Heyi Guo <guoheyi at linux.alibaba.com> wrote:
>> The issue happened again, but the /proc/1/stack and
>> /proc/$pid_of_dbus-broker/stack are both empty on our platform.
> (You previously reported that the version was v249, which is behind the
> last two upstream versions, so it may be a good idea to raise the issue
> with your distro.)
>
>> I checked the kernel config and confirmed that CONFIG_STACKTRACE is enabled:
>>
>> zcat /proc/config.gz | grep CONFIG_STACKTRACE
>> CONFIG_STACKTRACE_SUPPORT=y
>> # CONFIG_STACKTRACE_BUILD_ID is not set
>> CONFIG_STACKTRACE=y
>>
>> Is there any other config that is missing?
> I don't think so (the file wouldn't be present otherwise).
>
> If there are no kernel stacks, the tasks are executing in userspace and,
> given that they stay stuck indefinitely, they're likely looping somewhere
> (or you were unlucky and missed a syscall), which should show up in their
> CPU consumption.
>
> The userspace stack may be of interest then, e.g.
> `gdb -ex "bt" --batch -p 1`
>
> (That is for PID 1; debuginfo for the involved binaries must be present to
> obtain useful info.)
>
> Michal
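
The CPU-consumption check suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: cpu_ticks and cpu_share are hypothetical helper names, and the field positions assume the Linux /proc/<pid>/stat layout described in proc(5), where utime and stime are fields 14 and 15.

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name (field 2) is parenthesised and may contain spaces,
    # so split after the last ')' before indexing the remaining fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def cpu_share(pid, interval=1.0):
    """Approximate fraction of one CPU used by `pid` over `interval` seconds."""
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = cpu_ticks(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = cpu_ticks(f.read())
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    return (after - before) / (ticks_per_second * interval)
```

Calling cpu_share(1) samples PID 1 over one second. A value close to 1.0 would be consistent with a busy loop in userspace, while a value near 0 would suggest the process is idle or blocked rather than spinning.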

