[systemd-devel] systemd tries to terminate a process that seems to have exited

Tue May 10 16:32:11 UTC 2022

> The hint about non-empty cgroup + gap in PID sequence [1] suggest that
> the parent and child are not the only two processes of the service.

The gap in PIDs can be explained by many processes starting at the
same moment. In this particular case systemd forked PIDs 310 through
313 within the same second:

```
May 09 17:52:47 cb6d1c84f84e systemd[106]: gnome-keyring.service: About to execute /usr/local/bin/gnome-keyring-daemon --start --components pkcs11,secrets
May 09 17:52:47 cb6d1c84f84e systemd[106]: gnome-keyring.service: Forked /usr/local/bin/gnome-keyring-daemon as 310
...
May 09 17:52:47 cb6d1c84f84e systemd[106]: gnome-remote-desktop.service: About to execute /usr/libexec/gnome-remote-desktop-daemon
May 09 17:52:47 cb6d1c84f84e systemd[106]: gnome-remote-desktop.service: Forked /usr/libexec/gnome-remote-desktop-daemon as 311
...
May 09 17:52:47 cb6d1c84f84e systemd[106]: gnome-session-monitor.service: About to execute /usr/libexec/gnome-session-ctl --monitor
May 09 17:52:47 cb6d1c84f84e systemd[106]: gnome-session-monitor.service: Forked /usr/libexec/gnome-session-ctl as 312
...
May 09 17:52:47 cb6d1c84f84e systemd[106]: session-migration.service: About to execute /usr/bin/session-migration
May 09 17:52:47 cb6d1c84f84e systemd[106]: session-migration.service: Forked /usr/bin/session-migration as 313
```
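To make the PID argument concrete, here is a rough Python sketch that pulls the "Forked ... as PID" entries out of such a log and lists which PIDs went to other units. The sample lines are copied from the log above (unwrapped); the names `forks` and `pids_taken_by_others` are made up for illustration, and the regular expression only assumes the message format visible in these lines.

```python
import re

# Sample lines copied from the log above, one journal entry per line.
LOG = """\
May 09 17:52:47 cb6d1c84f84e systemd[106]: gnome-keyring.service: Forked /usr/local/bin/gnome-keyring-daemon as 310
May 09 17:52:47 cb6d1c84f84e systemd[106]: gnome-remote-desktop.service: Forked /usr/libexec/gnome-remote-desktop-daemon as 311
May 09 17:52:47 cb6d1c84f84e systemd[106]: gnome-session-monitor.service: Forked /usr/libexec/gnome-session-ctl as 312
May 09 17:52:47 cb6d1c84f84e systemd[106]: session-migration.service: Forked /usr/bin/session-migration as 313
"""

# Matches the "<unit>: Forked <executable> as <pid>" debug message.
FORK_RE = re.compile(
    r"systemd\[\d+\]: (?P<unit>\S+): Forked (?P<exe>\S+) as (?P<pid>\d+)"
)


def forks(log_text):
    """Return (pid, unit) pairs for every 'Forked' line, in log order."""
    return [(int(m["pid"]), m["unit"]) for m in FORK_RE.finditer(log_text)]


def pids_taken_by_others(log_text, unit):
    """Return the sorted PIDs that systemd handed to units other than `unit`."""
    return sorted(pid for pid, u in forks(log_text) if u != unit)
```

Calling `pids_taken_by_others(LOG, "gnome-keyring.service")` on these lines shows that the PIDs right after 310 went to other services, so a child forked by gnome-keyring-daemon would necessarily get a later PID.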

I'm not sure about the non-empty cgroup. The status is:

```
● gnome-keyring.service - Start gnome-keyring for the Secrets Service, and PKCS #11
     Loaded: loaded (/usr/lib/systemd/user/gnome-keyring.service; enabled; vendor preset: enabled)
     Active: deactivating (stop-sigterm)
    Process: 310 ExecStart=/usr/local/bin/gnome-keyring-daemon --start --components pkcs11,secrets (code=exited, status=0/SUCCESS)
   Main PID: 310 (code=exited, status=0/SUCCESS)
     CGroup: /docker/df654b46027c96861325528cba8f18aa65fb8c77986ffe7ce575a582334aff17/user.slice/user-1000.slice/user@1000.service/app.slice/gnome-keyring.service
```

When it times out, the status changes to:

```
× gnome-keyring.service - Start gnome-keyring for the Secrets Service, and PKCS #11
     Loaded: loaded (/usr/lib/systemd/user/gnome-keyring.service; enabled; vendor preset: enabled)
     Active: failed (Result: timeout) since Tue 2022-05-10 15:19:33 UTC; 315ms ago
    Process: 310 ExecStart=/usr/local/bin/gnome-keyring-daemon --start --components pkcs11,secrets (code=exited, status=0/SUCCESS)
   Main PID: 310 (code=exited, status=0/SUCCESS)
```
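One way to check whether the cgroup really is non-empty would be to read its `cgroup.procs` files directly, since the status output lists no processes under the CGroup line. Below is a minimal Python sketch, assuming the cgroup directory is reachable from where the script runs. On cgroup v1 the systemd hierarchy is conventionally mounted at /sys/fs/cgroup/systemd, but how the path from the status output maps onto that mount inside a Docker container is an assumption I have not verified. The function names are made up for illustration.

```python
from pathlib import Path


def cgroup_pids(cgroup_dir):
    """Collect PIDs from cgroup.procs in cgroup_dir and in all sub-cgroups."""
    pids = []
    # rglob also matches cgroup.procs directly inside cgroup_dir itself.
    for procs in sorted(Path(cgroup_dir).rglob("cgroup.procs")):
        pids.extend(int(token) for token in procs.read_text().split())
    return pids


def is_empty(cgroup_dir):
    """True if no process is listed anywhere below cgroup_dir."""
    return not cgroup_pids(cgroup_dir)
```

If `is_empty()` returns True for the service's cgroup directory while the unit is still in "deactivating (stop-sigterm)", that would suggest systemd is waiting for processes that are no longer there.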

> [1] Can be parent's threads or concurrently spawned processes elsewhere in
> the system.

As for the processes that are related in one way or another, there is
the gnome-keyring service, which spawns one child (both exit):

    gnome-keyring-daemon --start --components pkcs11,secrets

The org.freedesktop.secrets service (activated via dbus):

    gnome-keyring-daemon --start --foreground --components=secrets

And the gnome-keyring-ssh service:

    gnome-keyring-daemon --start --components ssh

I'm not a gnome-keyring expert, far from it.

> That's very old. As far as most of the Debian project is concerned,
> Debian 8 reached EOL in mid 2018.

Yep, I'd like to know what's happening mainly out of curiosity (and
maybe to learn something new), because in my view it behaves really
weirdly (like, "a process finishes, but it doesn't").

> To my knowledge Docker is not capable of running a proper
> systemd-based userspace as a container. I.e. it does not implement
> this:

> https://systemd.io/CONTAINER_INTERFACE

Well, I've managed to run GNOME in a docker container and connect to
it over VNC:

https://gist.github.com/x-yuri/dc6a9ce59ca823102903033da0143304

Although there's at least one major issue I haven't investigated yet.

And there's also:

https://hub.docker.com/r/jrei/systemd-ubuntu

Which more or less worked for me under docker for another project.

> As I understand, they are not interested in this, think this is out of
> focus. Which is certainly their right. But if you want to run systemd
> as container payload, then bettr use a different container manager,
> like podman, lxc, systemd-spawn. They all are a lot more open to
> supporting systemd as payload in a way that just works.

Thanks for the suggestion, I'm considering them too. But I'd like to
first find out what's happening here.

> Docker is particularly borked when it comes to the way cgroups are set
> up. And given that they are stuck on cgroupsv1 (or did that change?) i
> see no perspective there.

At least Docker 20.10.10 doesn't support it.

> My educated guess: you are running in cgroupsv1 mode. cgroup empty
> notifications do not work reliably in containers on cgroupsv1.

Yep, in this case I'm using cgroupv1. Can this all be explained by
"cgroup empty notifications do not work reliably in containers on
cgroupsv1"? Adding `sleep 5` seems to resolve the issue, but I'm not
sure if that's reliable.

Let's put it this way, can the described behavior be explained like
this? With cgroupv1 "empty cgroup" notifications in containers don't
always reach systemd. As a result, if systemd doesn't receive an
"empty cgroup" notification, it thinks some processes are still
running (although there are none left), tries to kill them, and
eventually times out. Does that sound correct?
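
To double-check which cgroup mode a system is in, something like the following Python sketch could be run inside the container. It is only a heuristic based on the usual mount layout: a `cgroup.controllers` file at the top of /sys/fs/cgroup indicates the unified (v2) hierarchy, the same file under /sys/fs/cgroup/unified indicates systemd's hybrid setup, and a /sys/fs/cgroup/systemd directory without either indicates pure cgroup v1. The function name and the returned labels are my own choice, and it does not reproduce systemd's own detection logic.

```python
from pathlib import Path


def cgroup_mode(root="/sys/fs/cgroup"):
    """Guess the cgroup setup from the files present under `root`."""
    root = Path(root)
    # cgroup v2 exposes cgroup.controllers at the top of its hierarchy.
    if (root / "cgroup.controllers").is_file():
        return "unified"
    # Hybrid: v1 controllers plus a v2 tree mounted at <root>/unified.
    if (root / "unified" / "cgroup.controllers").is_file():
        return "hybrid"
    # Pure v1: systemd keeps its own named hierarchy at <root>/systemd.
    if (root / "systemd").is_dir():
        return "legacy"
    return "unknown"
```

A result of "legacy" would match the cgroup v1 situation described above, where the "empty cgroup" notification is the part that may not reach systemd.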

On Tue, May 10, 2022 at 4:22 PM Lennart Poettering
<lennart at poettering.net> wrote:
>
> On Di, 10.05.22 08:44, Yuri Kanivetsky (yuri.kanivetsky at gmail.com) wrote:
>
> > The one that produces the messages is 249.11 (that is running in a
> > docker container):
> >
> > https://packages.ubuntu.com/jammy/systemd
> >
> > The one running on the host is 215-17 (Debian 8).
>
> that's ancient... i figure this then also means you are stuck with
> cgroupv1. Which means cgroup empty notifications in containers
> typically don't work.
>
> Lennart
>
> --
> Lennart Poettering, Berlin