[systemd-devel] [dm-devel] multipath breaks with recent udev/systemd

Benjamin Marzinski bmarzins at redhat.com
Thu Dec 18 14:04:05 PST 2014

On Wed, Dec 17, 2014 at 01:04:54PM +0100, Hannes Reinecke wrote:
> On 12/16/2014 11:18 PM, Benjamin Marzinski wrote:
> > On Tue, Dec 16, 2014 at 04:10:44PM -0600, Benjamin Marzinski wrote:
> >> On Mon, Dec 15, 2014 at 10:31:44AM +0100, Hannes Reinecke wrote:
> [ .. ]
> >>> So during bootup it's anyone's guess who's first, multipath or udev.
> >>> And depending on the timing either multipath will fail to setup
> >>> the device-mapper device or udev will simply ignore the device.
> >>> Neither of those is good, but the first is an absolute killer for
> >>> modern systems which _rely_ on udev to configure devices.
> >>>
> >>> So how is this supposed to work?
> >>> Why does udev ignore the entire event if it can't get the lock?
> >>> Shouldn't it rather be retried?
> >>> What is the supposed recovery here?
> >>
> >> Hannes, are you against the idea that Alexander mentioned in his first
> >> email, of just locking a file in /var/lock?  Multipathd doesn't create
> >> devices in parallel. Multipath doesn't create files in parallel.  We are
> >> explicitly trying to avoid multipath and multipathd creating files at
> >> the same time. So, we should only need a single file to lock, and
> >> /run/lock should always be there.
> > 
> > O.k. So if we want to keep our current nonblocking behavior, we'll need
> > more lockfiles, either one per path or one per wwid.  This still seems
> > like a reasonable idea, if there is a good reason for systemd doing what
> > it's doing.
> > 
> The problem is as follows:
> When multipathd is running we simply _cannot_ guarantee that no udev
> events are currently running. This currently hits us especially bad
> during system startup when device probing is still running during
> multipathd startup.
> Multipathd will then enumerate all block devices to setup the
> initial topology.
> But in doing so it might trip over devices which are still being processed
> by udev (or, worse still, _not yet_ processed by udev).
> (Yes, I know, libudev_enumerate should protect against this.
>  But it doesn't. )

But we start waiting for events before the initial multipath device
configuration, and don't process them until after that configuration
is complete, so if there is ever a case where the initial configuration
is accessing the device too early, aren't we guaranteed to get an event
afterwards, assuming that udev doesn't drop it?
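
To make that ordering argument concrete, here is a minimal sketch of
"subscribe first, scan second, replay third". It is a toy model, not
multipathd code: EventSource, start_daemon and the ("add", "sdb") event
tuples are hypothetical names standing in for the udev monitor and the
initial path discovery.

```python
from collections import deque


class EventSource:
    """Hypothetical stand-in for a udev monitor; not a real libudev API."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self):
        # Every event emitted after this call is queued for the subscriber.
        queue = deque()
        self.subscribers.append(queue)
        return queue

    def emit(self, event):
        for queue in self.subscribers:
            queue.append(event)


def start_daemon(source, scan):
    # 1. Start listening before looking at any device.
    pending = source.subscribe()
    # 2. Initial configuration; events may arrive while this runs.
    configured = set(scan())
    # 3. Replay everything that was queued during the scan.
    while pending:
        action, dev = pending.popleft()
        if action == "add":
            configured.add(dev)
        elif action == "remove":
            configured.discard(dev)
    return configured
```

A device that shows up mid-scan is missed by the scan itself but is
still picked up in step 3, provided the event was actually delivered.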

> So it's anyone's guess what'll happen now; either multipath trips over
> the lock from udev when calling 'lock_multipath' (and consequently
> failing to setup the multipath device), or udev
> tripping over the lock from multipath and ignoring the event,
> leaving us with a non-functioning device.

But my point above is that if we use a lockfile instead of locking the
path device itself, there won't be any lock contention, and udev won't
drop the events.
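
As a rough illustration of the lockfile idea, the sketch below takes a
nonblocking exclusive flock() on one file per wwid instead of on the
path device. The function names and the "multipath-<wwid>.lock" naming
are assumptions made up for this example; only the use of a lock
directory such as /run/lock comes from the discussion above.

```python
import fcntl
import os


def try_lock_wwid(lock_dir, wwid):
    """Try to take an exclusive, nonblocking lock on a per-wwid lockfile.

    Returns an open file descriptor on success, or None if the lock is
    already held through another open file description.
    """
    # Hypothetical naming scheme; lock_dir would be e.g. /run/lock.
    path = os.path.join(lock_dir, "multipath-%s.lock" % wwid)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def unlock_wwid(fd):
    """Release the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

Because the block device node is never locked, udev has nothing to
contend on there, while multipath and multipathd still exclude each
other per wwid.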

> We can fixup the startup sequence (which we need to do anyway, given
> the libudev enumerate bug) to just re-trigger all block device
> events, but this still doesn't fix the actual issue.
> Point is, there might be _several_ events for the same device being
> queued (think of a flaky path with several PATH_FAILED /
> PATH_REINSTATED events in a row), and so multipathd might be
> processing one event for the device while udev is processing the
> next event for the same device.
> For this to work we need some synchronization with udev; _if_ there
> would be a libudev callout for 'is there an event for this device
> running' we can easily fail the 'lock_multipath' operation, knowing
> that we'll be getting another event shortly for the same device.

But if we can avoid the lock contention, then eventually all these
events will make it to multipathd, and we will be up to date, right?
Or am I missing something here?


> Alternatively we can call flock(LOCK_EX) on that device, but that
> will only work if udev would _not_ abort event handling for that
> device, but rather issues a retry.
> After all, there _is_ a device timeout in udev. It should be
> relatively easy to retry the event and let it run into a timeout if
> the lock won't be released.
> Cheers,
> Hannes
> -- 
> Dr. Hannes Reinecke		               zSeries & Storage
> hare at suse.de			               +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)
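
For reference, the "retry until a timeout" behavior suggested above for
a contended flock() could look roughly like this. It is only a sketch
of the general pattern; it does not describe what udev does today, and
lock_with_retry, timeout and interval are hypothetical names.

```python
import fcntl
import time


def lock_with_retry(fd, timeout, interval=0.05):
    """Retry a nonblocking exclusive flock() until it succeeds or times out.

    Returns True once the lock is held, False if `timeout` seconds pass
    without getting it.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except BlockingIOError:
            # Someone else holds the lock; give up only after the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

The caller treats a False return the same way it would treat any other
event timeout, rather than silently dropping the event.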
