[systemd-devel] network/openvswitch dependency loop/deadlock

Colin Guthrie gmane at colin.guthr.ie
Thu Feb 7 04:13:12 PST 2013


'Twas brillig, and Ian Pilcher at 06/02/13 22:27 did gyre and gimble:
> Recently, Fedora shipped an update which starts the Open vSwitch service
> on demand -- whenever an Open vSwitch bridge or port is "ifup'ed".  In
> theory, I should now be able to simply write traditional ifcfg-* files
> for all of my interfaces and use the "network" service to start them.
> 
> Unfortunately, my interfaces are not starting properly.  I believe that
> the sequence of events is as follows:
> 
> * systemd starts the network service (/etc/rc.d/init.d/network)
> 
> * The network tries to start an interface (ifup eth0)
> 
> * ifup reads "TYPE=ovs" from ifcfg-eth0 and executes ifup-ovs
> 
> * ifup-ovs sees that the Open vSwitch daemon is not running
>   (/var/lock/subsys/openvswitch does not exist) and executes "service
>   openvswitch start"
> 
> * /usr/sbin/service executes "systemctl start openvswitch.service"
> 
> * systemd sees "Before=... network.target" in openvswitch.service and
>   waits for the network service to complete -- which will never happen,
>   because the network service is waiting for the openvswitch service to
>   start.
> 
> * DEADLOCK!

This last step shouldn't (in theory) be a problem, as far as I understand
it. Before=network.target doesn't imply that it has to wait for
network.service to complete - it should only imply that
network.service and openvswitch.service both have to start before
network.target is considered reached. If it said After=network.target
then I would see an obvious deadlock, but with both saying Before= they
should be able to work fine.

I'm guessing that the job simply isn't scheduled because a current
transaction is running and the scheduler does not fiddle with it while
it's running. It just queues another job for later, after it's done
(although this also doesn't seem correct: if that were the case, why
would --ignore-dependencies work?).

> At this point, everything grinds to a halt.  The "systemctl start
> openvswitch.service" process hangs, and the network service sits there
> waiting for it to complete.
> 
> After 5 minutes, systemd kills the network service.  Since the network
> service is no longer running, systemd considers that the network.target
> has been reached and starts the Open vSwitch daemon.
> 
> Various bits of information about my system are posted here:
> 
>   https://bugzilla.redhat.com/show_bug.cgi?id=818754#c21
> 
> All of this is background to my question -- is this dependency loop/
> deadlock the expected behavior in this case?
> 
> Assuming that the answer is yes, what is the best way to work around
> this?
> 
> * Removing network.target from the Before=... line in
>   openvswitch.service is not an option.  See comment #1 of that bug.
> 
> * Changing the network startup script (ifup-ovs) to use "systemctl
>   --ignore-dependencies start openvswitch.service" appears to work, but
>   the man page discourages its use for anything but debugging.

Depending on how the daemon is used, it might make more sense to use
--no-block. This will return control to the command line straight away,
but obviously the daemon may not be "ready" for communications yet and
the script may fail.
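
A rough sketch of what that might look like in a script, in case it
helps. The function name, the SYSTEMCTL override variable and the
10 second default are all made up for illustration and are not part of
ifup-ovs. It also assumes that "is-active" succeeding means the daemon
is ready to talk to, which depends on how the unit is written. If the
start job really is held back until network.service finishes, this will
simply time out rather than fix anything.

```shell
#!/bin/sh
# Hypothetical helper: queue the start job without blocking, then poll
# until the unit reports active or a timeout expires.
# SYSTEMCTL may be overridden (e.g. for testing); defaults to systemctl.
start_ovs_nonblocking() {
    son_systemctl=${SYSTEMCTL:-systemctl}
    son_timeout=${1:-10}    # seconds to wait; 10 is an arbitrary choice

    # --no-block enqueues the job and returns straight away.
    "$son_systemctl" --no-block start openvswitch.service || return 1

    son_waited=0
    while [ "$son_waited" -lt "$son_timeout" ]; do
        # is-active exits 0 once the unit is active.
        if "$son_systemctl" is-active --quiet openvswitch.service; then
            return 0
        fi
        sleep 1
        son_waited=$((son_waited + 1))
    done
    return 1    # not active within the timeout
}
```

A caller would do something like "start_ovs_nonblocking 30 || exit 1"
before trying to talk to the daemon.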

I'm not familiar with the daemon or what it does and how any IPC may
work (i.e. how you talk to the daemon).

To me, it would make more sense to make it socket or D-Bus activatable if
that's what it uses, but that by itself will be unlikely to
fundamentally solve the transactional problems here (especially if
two-way communication is required during the ifup-ovs execution).

> It would be very nice to not have to create another unit file just to
> ignore this single dependency in this single circumstance.

I get the feeling I'm perhaps misinterpreting something. I think the
real reason for the deadlock would be good to track down. It could be
that it is being artificially held back from completing or some other
dep is causing the problem.

Perhaps a "systemctl list-jobs" when it's stuck would help? Also, the
"systemctl show" output for network.service, openvswitch.service and
network.target is likely useful too, to track down which dep may be
causing problems.
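
Something along these lines, run from another terminal while the
network service is hung, should collect all of that in one go. The
function name and the SYSTEMCTL override are just illustrative, and the
choice of properties to print is my guess at what is most relevant;
plain "systemctl show <unit>" prints everything.

```shell
#!/bin/sh
# Hypothetical helper: dump the job queue and the ordering/requirement
# properties of the three units involved in the suspected deadlock.
# SYSTEMCTL may be overridden (e.g. for testing); defaults to systemctl.
dump_deadlock_state() {
    dds_systemctl=${SYSTEMCTL:-systemctl}

    # Jobs currently queued or running in the systemd manager.
    "$dds_systemctl" list-jobs

    for dds_unit in network.service openvswitch.service network.target; do
        echo "== $dds_unit =="
        # -p limits the output to the named properties.
        "$dds_systemctl" show -p Before -p After -p Requires -p Wants "$dds_unit"
    done
}
```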

It's just odd that removing "Before=network.target" solves the
problem.


Also re the initscripts tweaks and the if statement proposed in the bug,
there is a SYSTEMCTL_IGNORE_DEPENDENCIES=1 env var you can export that
will make "service openvswitch start" pass the --ignore-dependencies
argument if it redirects to systemctl. That's likely cleaner than the if
[ -x /usr/bin/systemctl ] check. Obviously as this is arguably not the
right fix anyway, it's perhaps a moot point.
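
For what it's worth, a minimal sketch of using the env var from a
script. The function name and the SERVICE override are made up for
illustration; the subshell is only there so the variable doesn't leak
into the rest of the script.

```shell
#!/bin/sh
# Hypothetical helper: start openvswitch via "service" with
# SYSTEMCTL_IGNORE_DEPENDENCIES exported for that one call only.
# SERVICE may be overridden (e.g. for testing); defaults to service.
start_ovs_ignoring_deps() {
    sid_service=${SERVICE:-service}
    (
        # Exported inside a subshell, so the caller's environment
        # is left untouched once the call returns.
        export SYSTEMCTL_IGNORE_DEPENDENCIES=1
        "$sid_service" openvswitch start
    )
}
```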

Col

-- 

Colin Guthrie
gmane(at)colin.guthr.ie
http://colin.guthr.ie/

Day Job:
  Tribalogic Limited http://www.tribalogic.net/
Open Source:
  Mageia Contributor http://www.mageia.org/
  PulseAudio Hacker http://www.pulseaudio.org/
  Trac Hacker http://trac.edgewall.org/

