[systemd-devel] BindsTo and parameterized instance units

Simon Mullis simon at mullis.co.uk
Thu Apr 13 13:52:21 UTC 2023


Hi All

I have a fairly complex (at least to me) setup of a master target spawning
multiple services and groups of instance services that are chained in a
specific order. I use systemd to manage all of the sockets that allow data
to flow between these different stages.
I use a master target (foo.target) to manage the services' state, so
I can easily stop and restart everything.
The first service (bar.service) is a oneshot script that starts multiple
groups of instance services (the number of spawned services depends on CPU
cores and queue sizes among other things). I have ExecStart and ExecStop
scripts in the unit file.
For example: bar.service - This is the oneshot that spawns "n" baz@.service
and "n" qux@.service.  There are a lot of dependencies and so far systemd
has done everything I need.

What do I want?
If there is any failure or issue with any of the child processes spawned
from any of the instance units then I would like the whole fragile house of
cards to be torn down and restarted. i.e. the whole foo.target system state
to be restarted, not just the individual instance service (and subsequent
process) itself.

What are my observations?
This all works well except for the instance units. When I add the
instance units to "BindsTo" in the target, I get additional
processes and services launched that I do not expect.

I have simplified the whole thing to two services, a target and a very
simple script. This demonstrates exactly the same thing that I see in the
much more complex version.

The "master service". This is a oneshot that spawns the instance units.
foo.service
[Unit]
Description=Foo service
BindsTo=foo.target
[Service]
Type=oneshot
ExecStart=some-path-somewhere/foo-start.sh
[Install]
WantedBy=foo.target

The instance unit that does the actual work. In this case we have a
placeholder to show the problem. In my real example I have a long chain of
services like this that uses systemd managed sockets to pass data along.
bar@.service
[Unit]
Description=Test service Bar instance %i
BindsTo=foo.target
[Service]
Type=simple
ExecStart=sh -c 'while true; do echo Bar %i is alive; sleep 3; done'
[Install]
WantedBy=foo.target

The target that allows me to stop and restart everything easily:
foo.target
[Unit]
Description=Test Services
Requires=foo.service
[Install]
WantedBy=multi-user.target


And finally the script called in foo.service:
foo-start.sh
#!/usr/bin/bash
num=4
eval systemctl start bar@{1..${num}}.service
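
The eval in the script above is needed because bash performs brace expansion before variable expansion, so {1..${num}} is not expanded as a sequence on its own. A minimal sketch of an equivalent that avoids eval, assuming bash; the helper name instance_units is hypothetical and not part of the original setup:

```shell
#!/usr/bin/bash
# Hypothetical helper: print bar@1.service .. bar@N.service, one per line.
instance_units() {
    local num=$1 i
    for ((i = 1; i <= num; i++)); do
        printf 'bar@%d.service\n' "$i"
    done
}

# Intended use inside foo-start.sh (left commented out here so the sketch
# can be run without touching any units):
#   num=4
#   systemctl start $(instance_units "$num")
```

The command substitution is deliberately left unquoted in the usage line so that each unit name becomes a separate argument to systemctl.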

In order to tightly couple the processes and services I use BindsTo. But I
am getting inconsistent behavior when trying to apply this to the instance
units from the target.

Scenario A:
WITHOUT BindsTo for the instance units in the target:
- Everything stops and starts with the target.
- I get the correct number of processes.
- If I kill one of the PIDs below, systemd only restarts that process -
which of course is what most use-cases would require.
# ps -ef | grep [Bb]ar
root       17878       1  0 15:25 ?        00:00:00 sh -c while true; do echo Bar 1 is alive; sleep 3; done
root       17880       1  0 15:25 ?        00:00:00 sh -c while true; do echo Bar 2 is alive; sleep 3; done
root       17882       1  0 15:25 ?        00:00:00 sh -c while true; do echo Bar 3 is alive; sleep 3; done
root       17887       1  0 15:25 ?        00:00:00 sh -c while true; do echo Bar 4 is alive; sleep 3; done

 # systemctl list-units bar@\*.service
  UNIT            LOAD   ACTIVE SUB     DESCRIPTION
  bar@1.service   loaded active running Test service Bar instance 1
  bar@2.service   loaded active running Test service Bar instance 2
  bar@3.service   loaded active running Test service Bar instance 3
  bar@4.service   loaded active running Test service Bar instance 4

Scenario B:
WITH BindsTo in the unit instance file (BindsTo=bar@%i.service or
BindsTo=bar@%N.service):
- Everything stops and starts with the target.
- I get EXTRA PROCESSES.
- If I kill one of the PIDs below, everything restarts properly (i.e. the
whole target) and I get the behavior I am looking for.
root       29250       1  0 16:08 ?        00:00:00 sh -c while true; do echo Bar foo is alive; sleep 3; done  #<<<< What's this guy doing here?
root       29256       1  0 16:08 ?        00:00:00 sh -c while true; do echo Bar 1 is alive; sleep 3; done
root       29258       1  0 16:08 ?        00:00:00 sh -c while true; do echo Bar 2 is alive; sleep 3; done
root       29260       1  0 16:08 ?        00:00:00 sh -c while true; do echo Bar 3 is alive; sleep 3; done
root       29262       1  0 16:08 ?        00:00:00 sh -c while true; do echo Bar 4 is alive; sleep 3; done

So, however systemd is expanding the variables %i or %N, it's including an
additional service.

 # systemctl list-units bar@\*.service
  UNIT            LOAD   ACTIVE SUB     DESCRIPTION
  bar@1.service   loaded active running Test service Bar instance 1
  bar@2.service   loaded active running Test service Bar instance 2
  bar@3.service   loaded active running Test service Bar instance 3
  bar@4.service   loaded active running Test service Bar instance 4
  bar@foo.service loaded active running Test service Bar instance foo
#<<<< Here he is again! ????
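
One possible reading of the extra bar@foo.service, assuming the BindsTo= line was added to foo.target rather than to bar@.service: per systemd.unit(5), specifiers are resolved against the unit that contains them, so inside foo.target %N is "foo" and %i is empty. The sketch below only imitates that substitution in bash for these two specifiers (no escaping is handled); expand_specifiers is a hypothetical helper, not a systemd tool.

```shell
#!/usr/bin/bash
# Hypothetical helper: expand %N and %i in TEMPLATE as they would resolve
# inside the unit named UNIT (simplified, no unit-name escaping).
expand_specifiers() {
    local unit=$1 template=$2
    local full=${unit%.*}       # %N: unit name with the type suffix removed
    local instance=""           # %i: empty for non-instantiated units
    if [[ $full == *@* ]]; then
        instance=${full#*@}     # text after the first "@"
    fi
    template=${template//\%N/$full}
    template=${template//\%i/$instance}
    printf '%s\n' "$template"
}

expand_specifiers foo.target 'bar@%N.service'      # prints bar@foo.service
expand_specifiers bar@2.service 'bar@%i.service'   # prints bar@2.service
```

If that assumption holds, the target itself would be naming bar@foo.service as a dependency, which would match the extra instance in the listing above.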

Does anyone have any suggestions?  Is there a more elegant way to connect
the processes to the whole target for restart purposes?
Maybe this is a bug in my version of systemd but more likely I'm doing
something wrong.

Version info:
# systemctl --version
systemd 247 (247.3-7+deb11u1)
+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP
+GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2
-IDN +PCRE2 default-hierarchy=unified

Thank you for reading this far and thank you also in advance for any
suggestions.

Cheers!