jmoellers at suse.de
Thu Apr 18 13:46:26 UTC 2019
On 18.04.19 15:35, Lennart Poettering wrote:
> On Do, 18.04.19 14:21, Josef Moellers (jmoellers at suse.de) wrote:
>> We're currently working on a bug which afaict is due to a race condition:
>> 1) systemd starts xenstored.service
>> 2) /etc/xen/scripts/launch-xenstore does its work (starts
>> 3) /etc/xen/scripts/launch-xenstore runs "systemd-notify --ready"
>> 4) "systemd-notify --ready" sends a UDP-message to systemd
>> 5) /etc/xen/scripts/launch-xenstore exits
>> 6) systemd gets SIGCHLD and removes the PID from watch_pids
>> 7) systemd receives the UDP message, but
>> a) the process is gone
>> b) the PID is not in watch_pids any more.
>> 8) "Cannot find unit for notify message of PID..."
>> 9) No start of depending units.
>> I see no proper way to get out of this but to make the systemd-notify
>> synchronous rather than fire-and-forget and expect it to wait for a
>> response from systemd.
> Hmm, the sd_notify() message handling actually runs at a higher
> priority than the SIGCHLD handling, so that when both happen at the
> same time we should always process the sd_notify message first, and
> the SIGCHLD message second, precisely to avoid this problem.
> Which systemd version is this? Is this reproducible in current
> upstream versions?
It's SUSE systemd-v234+suse.381.g98de7a236
We're seeing this on one of our openQA SLES15 Virtualization-Milestone
testing systems, haven't checked with the most recent upstream version
yet. I'm not sure if we can easily switch to a different systemd
version. Obviously, I accept your reservation.
But what you wrote earlier about the ordering: although I am pretty
convinced that the scenario is what happens, I'll re-check whether maybe
the current upstream version has modifications that haven't found their
way into our version.
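
For reference, a minimal sketch of what the notification in steps 3) and 4) amounts to on the sending side, assuming the usual sd_notify() convention: a single datagram with the text "READY=1" is sent to the AF_UNIX socket named by $NOTIFY_SOCKET. The function name notify_ready and its optional argument are hypothetical and only there for illustration; this is not the code of systemd-notify itself.

```python
import os
import socket


def notify_ready(notify_socket=None):
    """Send READY=1 in the style of sd_notify(); return True if a datagram was sent.

    notify_socket is a hypothetical override for testing; by default the
    path is taken from the NOTIFY_SOCKET environment variable.
    """
    path = notify_socket or os.environ.get("NOTIFY_SOCKET")
    if not path:
        # Not started by a service manager that expects notifications.
        return False
    if path.startswith("@"):
        # A leading "@" denotes a socket in the Linux abstract namespace.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        # sendto() returns once the datagram is queued; no reply is awaited.
        sock.sendto(b"READY=1", path)
    return True
```

Nothing here waits for an acknowledgement, so the sender is free to exit immediately after the call, which is the fire-and-forget behaviour the scenario above depends on.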
SUSE Linux GmbH
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)