[systemd-devel] Ordering services issue. Trying to start ptp4l in bonding setup fails as bonding appears to take a while.

Brian Hutchinson b.hutchman at gmail.com
Wed Dec 1 20:27:43 UTC 2021


Hey James,

Thanks!  Responses below

On Wed, Dec 1, 2021 at 1:12 PM James Feeney <james at nurealm.net> wrote:

> On 12/1/21 07:20, Brian Hutchinson wrote:
> > ...
> > In the .service file I tried everything I know to ensure the required
> > interfaces were created before starting ptp4l, in an attempt to give
> > bonding enough time to finish, but binding to things like
> > sys-subsystem-net-devices-bond1.device wasn't enough.
> >
> > Is it also possible to use carrier state in a .service file?
> >
> > I see /sys/devices/virtual/net/bond1/carrier but I'm not sure how to
> > start my ptp4l service only after the carrier state is "1".
> >
> > I welcome your ideas and suggestions on how to start a service after a
> > bond interface is really up.
>
> With systemd, the proper way to set up network bonding is to establish
> ordering with the use of "target" files, which can be added to
> /etc/systemd/system.
>
> The target files themselves need not contain anything, though mine
> contain simply:
>
> [Unit]
> Documentation= man:systemd.target(5)
>
> My configuration provides automatic bonding and bridging for
> removable/pluggable and fixed hardwired, wireless, and virtual interfaces,
> using hardlinked template files and a separate network configuration file,
> /etc/conf.d/network, though you are only looking for bonding here.  The
> big advantage of using systemd as the network configuration system,
> compared to alternatives, is that it "just works", and doesn't break after
> someone else's "upgrade".
>

Your hardware situation is certainly more interesting than mine with
hotplug stuff ... in the old days I had to write udev rules for stuff like
that, but with this project I decided to finally go with systemd.  For the
most part it does "just work" until it "doesn't", and I've run into that
quite a few times now; this is one of those cases (note I'm on
linux-fslc-imx 5.10.69, and I understand some bonding issues have been fixed
in 5.11, but I don't think that fix pertains to what I'm seeing here).
There are hooks to guarantee the "network is online" before going on ...
and they don't work right in this case.  You can see from my serial console
log that bond1 isn't up until after the login prompt and all the systemd
targets have finished!  systemd was told not to start ptp4l until after the
network is up, yet you can clearly see it being started before bond1 is up.


> The essential idea with configuring virtual network interfaces using
> systemd target files derives from noting that network service clients and
> servers must run After bridge and bond master interfaces are working, which
> implies After configuration of their respective slave interfaces, and that
> hardware devices can only be enslaved After the master interfaces have been
> created.  These constraints imply the following ordering:
>
> 1) master interfaces
> 2) enslaved interfaces
> 3) network services
>
> The systemd target files are then inferred between these three stages:
>
> a) master interfaces
> b) "go.target"
> c) enslaved interfaces
> d) "ll.target"
> e) network services
>
> The target file naming is arbitrary, of course.  I chose these names from
> the point of view of the template file that configures each slave device
> for its master, which finally runs "ip link set %P master %I".
>
> You could use the terminology "director" and "executive", from corporate
> structure lingo, instead of "master" and "slave", if preferred, but the ip
> command still uses the terms "master" and "slave".
>
> A hardware network device Requires go.target and the master interface
> service file "master@.service" runs Before go.target:
>
> Requires= go.target
> Before= go.target
>
> Plugging network hardware, then, will trigger the entire chain of
> configuration events.
>
> BindsTo= sys-subsystem-net-devices-%i.device
>
> Similarly, for the enslaved interface service file "enslaved@.service":
>
> Requires= go.target
> After= go.target
> Before= ll.target
>
> And finally, for the various network services service template files:
>
> PartOf= ll.target
> Requires= ll.target
> After= ll.target
>
> That's the basic idea.  Of course, there are plenty of "housekeeping"
> details in practice.  In particular, "Requisite" fails to recognize device
> units, and instead,
>
> ConditionPathExists= /sys/class/net/%I
>
> is necessary.  This appears to me to be an unjustified bug with
> "Requisite", but - you know - Lennart.
>
> Altogether, to trigger configuration of both master and slave devices from
> "enslaved@.service":
>
> BindsTo= sys-subsystem-net-devices-%p.device
> ConditionPathExists= /sys/class/net/%P
> BindsTo= sys-subsystem-net-devices-%i.device
>
> It is useful to impose an arbitrary but strict naming convention with
> these files, to allow use of systemd specifiers and template files.  In
> your case, you might simply hard-code what you want, if you are not looking
> for a generic solution, and all you want is bonding on a couple of
> interfaces.
>
> Still, when properly setup, you can individually "start" and "stop" any of
> the target units or network service units and get correct behavior.
>
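
The three-stage ordering described above comes down to a set of Before=/After= edges, and any start order systemd uses has to be a topological order of that graph. The toy model below is a sketch, not how systemd's job engine actually works; it only checks that the quoted directives place the master before go.target, the enslaved interfaces between go.target and ll.target, and a network service (here ptp4l.service) after ll.target. The instance names (master@bond1.service, enslaved@lan1.service, enslaved@lan2.service) are hypothetical. Requires=, BindsTo= and PartOf= are left out because they pull units in or propagate stops, but do not order anything.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical unit names for one bond (bond1) with two slaves (lan1, lan2).
# Only the ordering directives quoted above are modeled.
UNITS = {
    "master@bond1.service": {"Before": ["go.target"]},
    "enslaved@lan1.service": {"After": ["go.target"], "Before": ["ll.target"]},
    "enslaved@lan2.service": {"After": ["go.target"], "Before": ["ll.target"]},
    "ptp4l.service": {"After": ["ll.target"]},
}


def start_order(units):
    """Return one start order that satisfies every Before=/After= edge."""
    predecessors = {}
    for unit, deps in units.items():
        predecessors.setdefault(unit, set())
        for other in deps.get("After", []):
            # "unit After=other": unit's start-up waits until other has started.
            predecessors[unit].add(other)
            predecessors.setdefault(other, set())
        for other in deps.get("Before", []):
            # "unit Before=other" is the same edge seen from the other side.
            predecessors.setdefault(other, set()).add(unit)
    return list(TopologicalSorter(predecessors).static_order())
```

On a real system, `systemctl list-dependencies --after ptp4l.service` or `systemd-analyze critical-chain ptp4l.service` shows the ordering systemd actually computed.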

Maybe I'm missing something here but I don't see any way for me to "add
targets" to this problem to solve it unless I abandon the systemd way of
setting up the bond and wrap my "command line" way of creating the bond
(with echo and ip commands) with .target and .service files ... which is
going back to init scripts basically.

I guess I could make a .service that calls an ExecCondition= script that
could see if /sys/devices/virtual/net/bond1/carrier = 1
AND (/sys/bus/i2c/devices/0-005f/net/lan1/carrier = 1 OR
/sys/bus/i2c/devices/0-005f/net/lan2/carrier = 1)

... and start my ptp4l service after that.

But even that would probably need Restart=on-failure, like I have now, in
case those interfaces aren't up yet.
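
One caveat if the carrier check is wired in as ExecCondition=: systemd.service(5) treats an exit status of 1 through 254 from that command as "skip the unit", not as a failure, so Restart=on-failure would not retry it. Polling with a timeout from ExecStartPre= avoids that, because a non-zero exit there does fail the unit. A minimal sketch of such a check, assuming python3 is available on the image; it encodes the condition above with the same sysfs paths, while the script name and the 30-second timeout are hypothetical:

```python
import sys
import time

# Paths taken from the condition above; adjust for other boards.
BOND_CARRIER = "/sys/devices/virtual/net/bond1/carrier"
SLAVE_CARRIERS = (
    "/sys/bus/i2c/devices/0-005f/net/lan1/carrier",
    "/sys/bus/i2c/devices/0-005f/net/lan2/carrier",
)


def has_carrier(path):
    """True if the sysfs carrier file reads "1"; False if "0" or unreadable."""
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        # The interface does not exist yet, or it is administratively down
        # (reading carrier then fails with EINVAL).
        return False


def links_ready(bond=BOND_CARRIER, slaves=SLAVE_CARRIERS):
    """bond1 has carrier AND at least one of lan1/lan2 has carrier."""
    return has_carrier(bond) and any(has_carrier(p) for p in slaves)


def wait_for_links(timeout, interval=0.5, bond=BOND_CARRIER, slaves=SLAVE_CARRIERS):
    """Poll until links_ready() is true; return 0 on success, 1 on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if links_ready(bond, slaves):
            return 0
        if time.monotonic() >= deadline:
            return 1
        time.sleep(interval)


# Runs only when given a timeout, e.g. "wait-bond1-carrier.py 30" (hypothetical name).
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(wait_for_links(float(sys.argv[1])))
```

It could then be hooked in with something like `ExecStartPre=/usr/bin/python3 /usr/local/bin/wait-bond1-carrier.py 30`; a 30 s wait stays inside systemd's default 90 s start timeout.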

I guess I'm just having a bit of buyer's remorse for believing I could rely
on network-online.target before going on ... and I can't.
https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/ "If you
are a developer, instead of wondering what to do about network.target,
please just fix your program to be friendly to dynamically changing network
configuration. That way you will make your users happy because things just
start to work, and you will get fewer bug reports as your stuff is just
rock solid. You also make the boot faster for your users, as they don't
have to delay arbitrary services for the network anymore" ... I guess the
systemd bonding implementor didn't abide by all that. ;)

I've read more, and I've overridden the default behavior of
systemd-networkd-wait-online.service to be more specific about which
interfaces to wait on and what states they need to be in before moving on:

#  SPDX-License-Identifier: LGPL-2.1+
#
#  This file is part of systemd.
#
#  systemd is free software; you can redistribute it and/or modify it
#  under the terms of the GNU Lesser General Public License as published by
#  the Free Software Foundation; either version 2.1 of the License, or
#  (at your option) any later version.

[Unit]
Description=Wait for Network to be Configured
Documentation=man:systemd-networkd-wait-online.service(8)
DefaultDependencies=no
Conflicts=shutdown.target
Requires=systemd-networkd.service
After=systemd-networkd.service
Before=network-online.target shutdown.target

[Service]
Type=oneshot
ExecStart=/lib/systemd/systemd-networkd-wait-online --interface bond1:degraded-carrier:carrier --interface lan1:carrier
RemainAfterExit=yes

[Install]
WantedBy=network-online.target

... and it still doesn't work.  You can clearly see "Sync Microchip PHC
with PTP Grand Master Clock" (my ptp4l.service) being started before
bond1 is online, which doesn't happen until after the login prompt:

[  OK  ] Reached target Network.
[    4.096782] imx-sdma 302c0000.dma-controller: firmware found.
[    4.104764] imx-sdma 302c0000.dma-controller: loaded firmware 4.5
[  OK  ] Reached target Network is Online.
[    4.109828] caam-snvs 30370000.caam-snvs: violation handlers armed - init state

[  OK  ] Reached target Host and Network Name Lookups.
        Starting Avahi mDNS/DNS-SD Stack...
        Starting Enable ksz9567...
        Starting The NGINX HTTP and reverse proxy server...
[    4.189072] imx-sdma 302b0000.dma-controller: firmware found.
        Starting Sync Microchip PH…with PTP Grand Master Clock...
[  OK  ] Started Enable ksz9567.
[FAILED] Failed to start Sync Micro…C with PTP Grand Master Clock.
See 'systemctl status ptp4l.service' for details.
[  OK  ] Started The NGINX HTTP and reverse proxy server.
[    4.254479] imx-sdma 30bd0000.dma-controller: firmware found.
[  OK  ] Started Avahi mDNS/DNS-SD Stack.
[    4.413378] ksz9477-switch 0-005f lan1: configuring for phy/gmii link mode
[    4.427011] bond1: (slave lan1): Enslaving as a backup interface with a down link
[    4.501283] ksz9477-switch 0-005f lan2: configuring for phy/gmii link mode
[    4.511903] bond1: (slave lan2): Enslaving as a backup interface with a down link
        Starting Save/Restore Sound Card State...
[  OK  ] Started Save/Restore Sound Card State.
[  OK  ] Reached target Sound Card.
[    5.009993] random: crng init done
[    5.013414] random: 7 urandom warning(s) missed due to ratelimiting
[  OK  ] Started Load/Save Random Seed.
[  OK  ] Started System Logger Daemon "default" instance.
[  OK  ] Reached target Multi-User System.
        Starting Update UTMP about System Runlevel Changes...
[  OK  ] Started Update UTMP about System Runlevel Changes.

Poky (Yocto Project Reference Distro) 3.1.7 imx8mmevk ttymxc1

imx8mmevk login:
[    7.531146] ksz9477-switch 0-005f lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[    8.873069] bond1: (slave lan1): link status definitely up, 1000 Mbps full duplex
[    8.882016] bond1: (slave lan1): making interface the new active one
[    8.892488] device eth0 entered promiscuous mode
[    8.897180] audit: type=1700 audit(1600598644.664:2): dev=eth0 prom=256 old_prom=0 auid=4294967295 uid=0 gid=0 ses=4294967295
[    8.913688] bond1: active interface up!
[    8.917595] IPv6: ADDRCONF(NETDEV_CHANGE): bond1: link becomes ready

systemctl status ptp4l

* ptp4l.service - Sync Microchip PHC with PTP Grand Master Clock
    Loaded: loaded (/etc/systemd/system/ptp4l.service; enabled; vendor preset: disabled)
    Active: failed (Result: exit-code) since Sun 2020-09-20 10:44:01 UTC; 40s ago
   Process: 332 ExecStart=/usr/bin/ptp4l -f /etc/linuxptp/ptp4l.conf_e2e_one_step_g8275.2 -s -i bond1 (code=exited, status=255/EXCEPTION)
  Main PID: 332 (code=exited, status=255/EXCEPTION)

Sep 20 10:44:01 imx8mmevk systemd[1]: Starting Sync Microchip PHC with PTP Grand Master Clock...
Sep 20 10:44:01 imx8mmevk ptp4l[332]: [5.601] interface 'bond1' does not support requested timestamping mode
Sep 20 10:44:01 imx8mmevk ptp4l[332]: failed to create a clock
Sep 20 10:44:01 imx8mmevk systemd[1]: ptp4l.service: Main process exited, code=exited, status=255/EXCEPTION
Sep 20 10:44:01 imx8mmevk systemd[1]: ptp4l.service: Failed with result 'exit-code'.
Sep 20 10:44:01 imx8mmevk systemd[1]: Failed to start Sync Microchip PHC with PTP Grand Master Clock.

cat ptp4l.service
[Unit]
Description=Sync Microchip PHC with PTP Grand Master Clock
#Requires=network-online.target multi-user.target
#BindsTo=sys-subsystem-net-devices-bond1.device sys-subsystem-net-devices-lan1.device sys-subsystem-net-devices-lan2.device multi-user.target
#After=sys-subsystem-net-devices-bond1.device sys-subsystem-net-devices-lan1.device sys-subsystem-net-devices-lan2.device multi-user.target
After=network-online.target
Wants=network-online.target

[Service]
Type=exec
#NotifyAccess=all
ExecStart=/usr/bin/ptp4l -f /etc/linuxptp/ptp4l.conf_e2e_one_step_g8275.2 -s -i bond1
#Restart=on-failure
#RestartSec=1

[Install]
WantedBy=multi-user.target

...but after logging in and running systemctl restart ptp4l everything
works.  This is a straight up race condition during startup ... and I don't
know how to fix it the "systemd" way.  Am I doing something wrong or is
something in systemd bonding broken???

Regards,

Brian