[systemd-devel] systemctl hangs with 249.7 systemd in yocto Honister release
Heyi Guo
guoheyi at linux.alibaba.com
Wed Jan 4 11:13:59 UTC 2023
Hi Michal,
Actually, we have since upgraded systemd to 250.5, but the issue still
happens.
Looking at the journal around the time the error message was first
printed, I found that systemd-udevd crashed with a SEGV:
Jan 04 16:10:40 ali2600 systemd[1]: Created slice Slice
/system/systemd-coredump.
Jan 04 16:10:40 ali2600 systemd[1]: Started Process Core Dump (PID
7507/UID 0).
Jan 04 16:10:42 ali2600 systemd-coredump[7508]: elfutils disabled,
parsing ELF objects not supported
Jan 04 16:10:42 ali2600 systemd-coredump[7508]: [LNK] Process 173
(systemd-udevd) of user 0 dumped core.
Jan 04 16:10:42 ali2600 systemd[1]: systemd-udevd.service: Main process
exited, code=dumped, status=11/SEGV
Jan 04 16:10:42 ali2600 systemd[1]: systemd-udevd.service: Killing
process 7503 (systemd-udevd) with signal SIGKILL.
Jan 04 16:10:42 ali2600 systemd[1]: systemd-udevd.service: Killing
process 7503 (systemd-udevd) with signal SIGKILL.
Jan 04 16:10:42 ali2600 systemd[1]: systemd-udevd.service: Failed with
result 'core-dump'.
Jan 04 16:10:42 ali2600 systemd[1]: systemd-udevd.service: Scheduled
restart job, restart counter is at 1.
Jan 04 16:10:42 ali2600 systemd[1]: Stopped Rule-based Manager for
Device Events and Files.
Jan 04 16:10:42 ali2600 systemd[1]: Starting Rule-based Manager for
Device Events and Files...
Jan 04 16:10:42 ali2600 systemd[1]: systemd-coredump@0-7507-0.service:
Deactivated successfully.
Jan 04 16:10:42 ali2600 systemd-udevd[7510]: corrupted size vs. prev_size
......
Jan 04 16:10:57 ali2600 systemd-coredump[7517]: elfutils disabled,
parsing ELF objects not supported
Jan 04 16:10:57 ali2600 systemd-coredump[7517]: [LNK] Process 7516
(systemd) of user 0 dumped core.
Jan 04 16:10:57 ali2600 phosphor-dump-manager[356]: *** stack smashing
detected ***: terminated
Jan 04 16:10:57 ali2600 phosphor-dump-monitor[280]: Failed to create
dump: sd_bus_call noreply: org.freedesktop.DBus.Error.NoReply: Remote
peer disconnected
Jan 04 16:10:57 ali2600 systemd[1]: Caught <SEGV>, dumped core as pid 7516.
Jan 04 16:10:57 ali2600 systemd[1]: Freezing execution.
Jan 04 16:10:57 ali2600 phosphor-dump-manager[7536]: Failed to list
units: Transport endpoint is not connected
Is this the reason systemctl stops working? The log says that systemd
(PID 1) is "Freezing execution" after catching the SEGV.
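To make this easier to check on other affected boards, here is a rough
sketch of how the relevant lines could be pulled out of the journal. The
function names crash_events and pid1_froze are made up for illustration,
and the patterns are simply copied from the messages in the log above,
so they may need adjusting for other systemd versions.

```sh
#!/bin/sh
# Hypothetical helper: print the crash-related lines from journal text
# read on stdin. The patterns come from the messages quoted in this mail.
crash_events() {
    grep -E 'dumped core|code=dumped|Freezing execution|stack smashing|corrupted size'
}

# Hypothetical helper: exit 0 if PID 1 logged that it froze, 1 otherwise.
pid1_froze() {
    grep -q 'systemd\[1\]: Freezing execution'
}

# Example usage on a live system (needs permission to read the journal):
#   journalctl -b --no-pager | crash_events
#   journalctl -b --no-pager | pid1_froze && echo "PID 1 froze in this boot"
```

If pid1_froze succeeds for the current boot, the time of the first line
printed by crash_events should show which crash came first.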
Thanks,
Heyi
On 2023/1/4 6:48 PM, Michal Koutný wrote:
> On Wed, Jan 04, 2023 at 04:51:22PM +0800, Heyi Guo <guoheyi at linux.alibaba.com> wrote:
>> The issue happened again, but the /proc/1/stack and
>> /proc/$pid_of_dbus-broker/stack are both empty on our platform.
> (You reported previously the version was v249 (which is behind the last
> two upstream versions, so it may be a good idea to raise the issue with
> your distro.))
>
>> I checked kernel config and confirmed that CONFIG_STACKTRACE is enabled:
>>
>> zcat /proc/config.gz | grep CONFIG_STACKTRACE
>> CONFIG_STACKTRACE_SUPPORT=y
>> # CONFIG_STACKTRACE_BUILD_ID is not set
>> CONFIG_STACKTRACE=y
>>
>> Is there any other config that is missing?
> I don't think so (the file wouldn't be present otherwise).
>
> If there are no kernel stacks, the tasks execute in userspace and given
> the indefinite stuckage, they're likely looping somewhere (or you must
> have been unlucky to miss a syscall), which should manifest in their CPU
> consumption.
>
> The userspace stack may be of interest then, e.g.
> `gdb -ex "bt" --batch -p 1`
>
> (for PID 1 and debuginfo for involved binaries must be present to obtain
> useful info).
>
> Michal