[systemd-devel] [dm-devel] multipath breaks with recent udev/systemd

Benjamin Marzinski bmarzins at redhat.com
Tue Dec 16 14:18:56 PST 2014


On Tue, Dec 16, 2014 at 04:10:44PM -0600, Benjamin Marzinski wrote:
> On Mon, Dec 15, 2014 at 10:31:44AM +0100, Hannes Reinecke wrote:
> > Hi all,
> > 
> > in commit 3ebdb81ef088afd3b4c72b516beb5610f8c93a0d
> > (udev: serialize/synchronize block device event handling with file
> > locks) udev started using flock() on the device node, supposedly to
> > synchronize with an ominous 'block event handling'.
> > 
> > The code looks like this:
> > 
> >                 if (d) {
> >                         fd_lock = open(udev_device_get_devnode(d), O_RDONLY|O_CLOEXEC|O_NOFOLLOW|O_NONBLOCK);
> >                         if (fd_lock >= 0 && flock(fd_lock, LOCK_SH|LOCK_NB) < 0) {
> >                                 log_debug("Unable to flock(%s), skipping event handling: %m", udev_device_get_devnode(d));
> >                                 err = -EWOULDBLOCK;
> >                                 goto skip;
> >                         }
> >                 }
> > 
> > However, multipath has for several years been using a similar
> > construct to lock all devices belonging to a multipath device table
> > before creating a multipath dm-device:
> > 
> > 	vector_foreach_slot (mpp->pg, pgp, i) {
> > 		if (!pgp->paths)
> > 			continue;
> > 		vector_foreach_slot(pgp->paths, pp, j) {
> > 			if (lock && flock(pp->fd, LOCK_EX | LOCK_NB) &&
> > 			    errno == EWOULDBLOCK)
> > 				goto fail;
> > 			else if (!lock)
> > 				flock(pp->fd, LOCK_UN);
> > 		}
> > 	}
> > 
> > So during bootup it's anyone's guess who's first, multipath or udev.
> > And depending on the timing, either multipath will fail to set up
> > the device-mapper device or udev will simply ignore the device.
> > Neither of those is good, but the first is an absolute killer for
> > modern systems which _rely_ on udev to configure devices.
> > 
> > So how is this supposed to work?
> > Why does udev ignore the entire event if it can't get the lock?
> > Shouldn't it rather be retried?
> > What is the supposed recovery here?
> 
> Hannes, are you against the idea that Alexander mentioned in his first
> email, of just locking a file in /var/lock?  Multipathd doesn't create
> devices in parallel. Multipath doesn't create files in parallel.  We are
> explicitly trying to avoid multipath and multipathd creating files at
> the same time. So, we should only need a single file to lock, and
> /run/lock should always be there.

O.k. So if we want to keep our current nonblocking behavior, we'll need
more lockfiles, either one per path or one per wwid.  This still seems
like a reasonable idea, if there is a good reason for systemd doing what
it's doing.

-Ben

> 
> -Ben
> 
> > 
> > Cheers,
> > 
> > Hannes
> > -- 
> > Dr. Hannes Reinecke		               zSeries & Storage
> > hare at suse.de			               +49 911 74053 688
> > SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> > GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> > HRB 21284 (AG Nürnberg)