[systemd-devel] unable to attach pid to service delegated directory in unified mode after restart

Mon Feb 21 17:07:58 UTC 2022

>
> Hmm? Hard requirement of what? Not following?
>
>
The hard requirement of my project is that processes must stay alive even
if the daemon that forked them dies.
It works roughly like a batch scheduler: a controller sends my daemon a
request to launch a process on behalf of a user, and my daemon fork-execs
it. At some point my daemon may be stopped, restarted, upgraded, or
whatever, but the forked processes must stay alive because they are still
doing their work. We are talking about the HPC world here.
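
To make the launch pattern concrete, here is a minimal sketch in Python of
the process-level side of it. The helper name launch_detached is made up
for illustration and is not taken from my daemon. Note that this only
detaches the child from the launcher's session; it does nothing about
cgroups, so with systemd's default KillMode=control-group the child would
still be killed when the service is stopped.

```python
import os
import subprocess
import sys


def launch_detached(argv):
    """Fork-exec argv in its own session and return the child's PID.

    Hypothetical helper: the child is detached from the launcher's
    session, so it does not share its controlling terminal or session
    leader. It still lives in the launcher's cgroup.
    """
    proc = subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # the child calls setsid() before exec
        close_fds=True,
    )
    return proc.pid


if __name__ == "__main__":
    pid = launch_detached([sys.executable, "-c", "import time; time.sleep(2)"])
    # The child is the leader of its own session.
    print(pid, os.getsid(pid) == pid)
```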

> You are leaving processes around when your service dies/restarts?
>

Yes.

> That's a bad idea typically, and generally a hack: the unit should
> probably be split up differently, i.e. the processes that shall stick
> around on restart should probably be in their own unit, i.e. another
> service or scope unit.
>

So, if I understand correctly, you are suggesting that every forked
process must be started through a new systemd unit?
If that's the case it seems inconvenient, because we're talking about a
job scheduler that may sometimes have thousands of forked processes
launched in quick succession, and where performance is key.
Managing one unit per process will probably not perform well enough in
this situation.

The other option I can imagine is to have my daemon start a new unit of
Type=forking that stays around until I decide to clean it up, even if it
has no processes inside.
Then I could put my processes in that unit's cgroup instead of the main
daemon's cgroup. Would that make sense?
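
By "put my processes in the associated cgroup" I mean something like the
sketch below: writing the PID to the cgroup.procs file of the target
cgroup, which on the unified hierarchy migrates the whole process. The
function name attach_pid and the directory used in the example are
placeholders, not real paths from my setup.

```python
import tempfile
from pathlib import Path


def attach_pid(cgroup_dir, pid):
    """Move pid into the cgroup at cgroup_dir (cgroup v2).

    cgroup_dir is assumed to be a cgroup directory the caller is
    allowed to write to, e.g. a delegated subtree.
    """
    procs = Path(cgroup_dir) / "cgroup.procs"
    # One PID per write; the kernel migrates the process on write.
    with open(procs, "w") as f:
        f.write(f"{pid}\n")


if __name__ == "__main__":
    # Demonstration against a scratch directory, not a real cgroup.
    scratch = tempfile.mkdtemp()
    attach_pid(scratch, 1234)
    print((Path(scratch) / "cgroup.procs").read_text())
```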

The issue here is that to create the new unit my daemon would need to
depend on systemd libraries, or fork-exec systemd commands and parse
their output.
I am trying to keep dependencies to a minimum, so I'd love to have an
alternative.
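
For reference, the "fork-exec a systemd command" variant would look
roughly like the sketch below, which only builds the argument vector for
systemd-run with a transient scope unit. The helper scope_command and the
unit name in the example are invented for illustration; whether this is
fast enough for thousands of launches is exactly what I am unsure about.

```python
def scope_command(unit_name, argv, slice_name=None):
    """Build an argv that runs argv inside a transient scope unit.

    Only constructs the command line; nothing is executed here.
    unit_name and slice_name are caller-chosen, hypothetical names.
    """
    cmd = ["systemd-run", "--scope", "--quiet", f"--unit={unit_name}"]
    if slice_name is not None:
        cmd.append(f"--slice={slice_name}")
    # "--" ends option parsing so the job's own flags are left alone.
    cmd.append("--")
    cmd.extend(argv)
    return cmd


if __name__ == "__main__":
    print(scope_command("job-42.scope", ["/bin/sleep", "60"]))
```

The resulting list could be handed to an exec call in the forked child in
place of the job's own argv.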

> That's not supported. You may only create your own cgroups where you
> turned on delegation, otherwise all bets are off. If you put stuff in
> /sys/fs/cgroup/user-stuff it's as if you placed stuff in systemd's
> "-.slice" without telling it so, and things will break sooner or
> later, and often in non-obvious ways.
>

Yeah, I know and understand it is not supported, but I am more interested
in the technical side of how things would break.
I see in systemd/src/core/cgroup.c that it often distinguishes a cgroup
with delegation from one without it (!unit_cgroup_delegate(u)), but it's
hard for me to find out how or where exactly this will interfere with a
cgroup created outside of systemd. I'd appreciate it if you could shed
some light on why/when/where things will break in practice, or just give
an example.

I am also aware of the single-writer policy described in systemd's
documentation, and that what I am asking about falls outside it, but I'd
like to understand exactly what can happen.

Thanks for your help & time :)