[systemd-devel] [libvirt] How to make udev not touch my device?
Daniel P. Berrange
berrange at redhat.com
Mon Nov 7 12:20:26 UTC 2016
On Mon, Nov 07, 2016 at 01:11:14PM +0100, Michal Privoznik wrote:
> On 07.11.2016 10:17, Daniel P. Berrange wrote:
> > On Fri, Nov 04, 2016 at 08:47:34AM +0100, Michal Privoznik wrote:
> >> Hey udev developers,
> >>
> >> I'm a libvirt developer and I've been facing an interesting issue
> >> recently. Libvirt is a library for managing virtual machines and as such
> >> allows basically any device to be exposed to a virtual machine. For
> >> instance, a virtual machine can use /dev/sdX as its own disk. For
> >> security reasons we allow users to configure their VMs to run under
> >> a different UID/GID and SELinux context. That means that whenever a
> >> VM is started, libvirtd (our daemon) relabels all the necessary paths
> >> that the QEMU process (representing the VM) can touch.
> >> However, I'm facing an issue that I don't know how to fix. In some cases
> >> QEMU can close & reopen a block device. However, closing a block device
> >> triggers an event and hence if there is a rule that sets a security
> >> label on a device the QEMU process is unable to reopen the device again.
> >>
> >> My question is, what can we do to prevent udev from meddling with the
> >> security labels that we've set on the devices?
> >>
> >> One of the ideas our lead developer had was for libvirt to set some kind
> >> of udev label on devices managed by libvirt (when setting up security
> >> labels) and then whenever udev sees such labelled device it won't touch
> >> it at all (this could be achieved by a rule perhaps?). Later, when
> >> domain is shutting down libvirt removes that label. But I don't think
> >> setting an arbitrary label on devices is supported, is it?
> >
> > Having thought about this over the weekend, I'm strongly inclined to
> > just take udev out of the equation by starting a new mount namespace
> > for each QEMU we launch and setting up a custom /dev containing just
> > the devices we need. This will both improve security and avoid the
> > udev races, requires no complex code in libvirt, and will work for
> > libvirt all the way back to RHEL6.
>
> How would this work with device hotplug, i.e. I start a domain with some
> set of devices. Then I bring up an iSCSI target (which appears under
> /dev) and how does one 'transfer' the device into the new namespace?
> BTW: can you elaborate more on udev-namespace relations? Doesn't udev
> run in the namespaces too?

A single process can only ever be in a single namespace of each type at
any point in time, and udev only ever runs in the initial namespaces. When
running containers you never have udev inside them, and udev certainly
doesn't interact with arbitrary namespaces created by other applications
for their own purposes.

So if libvirt creates a private mount namespace for each QEMU and mounts
a custom /dev there, this is invisible to udev, and thus udev cannot
interfere with the permissions we set in our private /dev.

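As a rough illustration of the private /dev idea (a sketch only, not
libvirt code): the child that is about to exec QEMU could unshare its
mount namespace, mount a fresh tmpfs over /dev and recreate only the
nodes the guest needs. The helper names describe_node() and
enter_private_dev() are hypothetical, the constants are copied from the
Linux headers, and the privileged part assumes CAP_SYS_ADMIN and
CAP_MKNOD.

```python
import ctypes
import os
import stat

CLONE_NEWNS = 0x00020000  # from <sched.h>
MS_REC = 0x4000           # from <sys/mount.h>
MS_PRIVATE = 1 << 18      # from <sys/mount.h>


def describe_node(path):
    """Return (mode, rdev) of an existing device node, for a later mknod()."""
    st = os.stat(path)
    if not (stat.S_ISCHR(st.st_mode) or stat.S_ISBLK(st.st_mode)):
        raise ValueError(f"{path} is not a device node")
    return st.st_mode, st.st_rdev


def _check(ret, what):
    """Turn a non-zero libc return value into an OSError carrying errno."""
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, f"{what}: {os.strerror(err)}")


def enter_private_dev(devices, uid, gid):
    """Hypothetical helper, meant to run in the child before exec'ing QEMU."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_ulong, ctypes.c_char_p]

    # Collect type and major/minor while the host /dev is still visible.
    nodes = [(p, *describe_node(p)) for p in devices]

    _check(libc.unshare(CLONE_NEWNS), "unshare")
    # Keep our mounts from propagating back into the host namespace.
    _check(libc.mount(b"none", b"/", None, MS_REC | MS_PRIVATE, None),
           "mount MS_PRIVATE")
    # An empty tmpfs now hides the udev-managed /dev in this namespace.
    _check(libc.mount(b"devfs", b"/dev", b"tmpfs", 0, b"mode=755"),
           "mount tmpfs")

    for path, mode, rdev in nodes:
        os.makedirs(os.path.dirname(path), exist_ok=True)
        os.mknod(path, stat.S_IFMT(mode) | 0o600, rdev)
        os.chown(path, uid, gid)
```

A real /dev would also need entries such as /dev/null and /dev/pts, and
SELinux labels would have to be applied to the new nodes; both are left
out here to keep the sketch short.
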
For hotplug, the libvirt QEMU driver would do the same as the libvirt LXC
driver currently does. It would fork and setns() into the QEMU mount
namespace and run mknod()+chmod() there, before doing the rest of its
normal hotplug logic. See lxcDomainAttachDeviceMknodHelper() for what LXC
does.

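The fork + setns() + mknod() + chmod() sequence could look roughly like
the following. This is a sketch under assumptions, not a transcription
of lxcDomainAttachDeviceMknodHelper(): the function name
mknod_in_mount_ns() and its parameters are hypothetical, and setns() is
reached through ctypes so the example does not depend on a recent
Python. The fork keeps the daemon itself in the host namespace, and the
kernel does not let a multi-threaded process switch mount namespaces
anyway.

```python
import ctypes
import os
import stat

CLONE_NEWNS = 0x00020000  # from <sched.h>


def mknod_in_mount_ns(qemu_pid, path, mode, rdev, perms=0o600):
    """Create device node `path` inside the mount namespace of qemu_pid.

    Returns True if the forked helper succeeded, False otherwise.
    """
    child = os.fork()
    if child == 0:
        # Child: single-threaded after fork(), so joining a mount
        # namespace is permitted (given CAP_SYS_ADMIN).
        status = 1
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            fd = os.open(f"/proc/{qemu_pid}/ns/mnt", os.O_RDONLY)
            if libc.setns(fd, CLONE_NEWNS) != 0:
                raise OSError(ctypes.get_errno(), "setns failed")
            os.mknod(path, stat.S_IFMT(mode) | perms, rdev)
            # mknod() is subject to the umask; chmod() sets the exact bits.
            os.chmod(path, perms)
            status = 0
        except OSError:
            pass
        finally:
            os._exit(status)  # never return into the parent's code path
    # Parent: stays in the host namespace and only collects the result.
    _, wstatus = os.waitpid(child, 0)
    return os.WIFEXITED(wstatus) and os.WEXITSTATUS(wstatus) == 0
```

The caller would take mode and rdev from a stat() of the newly appeared
host device (for example the iSCSI disk) and, on success, continue with
the usual hotplug steps.
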
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|