[PATCH 08/10] vfio-iommufd: Support iommufd for emulated VFIO devices

Jason Gunthorpe jgg at nvidia.com
Tue Nov 1 12:49:21 UTC 2022


On Tue, Nov 01, 2022 at 08:37:39AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg at nvidia.com>
> > Sent: Wednesday, October 26, 2022 2:51 AM
> > 
> > Emulated VFIO devices are calling vfio_register_emulated_iommu_dev() and
> > consist of all the mdev drivers.
> > 
> > Like the physical drivers, support for iommufd is provided by the driver
> > supplying the correct correct standard ops. Provide ops from the core that
> > duplicate what vfio_register_emulated_iommu_dev() does.
> > 
> > Emulated drivers are where it is more likely to see variation in the
> > iommfd support ops. For instance IDXD will probably need to setup both a
> > iommfd_device context linked to a PASID and an iommufd_access context to
> > support all their mdev operations.
> > 
> > Signed-off-by: Jason Gunthorpe <jgg at nvidia.com>
> > ---
> >  drivers/gpu/drm/i915/gvt/kvmgt.c  |   3 +
> >  drivers/s390/cio/vfio_ccw_ops.c   |   3 +
> >  drivers/s390/crypto/vfio_ap_ops.c |   3 +
> >  drivers/vfio/container.c          | 108 ++++++-----------------------
> >  drivers/vfio/iommufd.c            |  57 ++++++++++++++++
> >  drivers/vfio/vfio.h               |  10 ++-
> >  drivers/vfio/vfio_main.c          | 110 +++++++++++++++++++++++++++++-
> >  include/linux/vfio.h              |  14 ++++
> >  8 files changed, 217 insertions(+), 91 deletions(-)
> 
> mtty, mdpy and mbochs?

They don't call rw or pin_pages, so they don't need to do
anything:


	/*
	 * If the driver doesn't provide this op then it means the device does
	 * not do DMA at all. So nothing to do.
	 */
	if (!vdev->ops->bind_iommufd)
		return 0;

> > +int vfio_container_pin_pages(struct vfio_container *container,
> > +			     struct iommu_group *iommu_group, dma_addr_t
> > iova,
> > +			     int npage, int prot, struct page **pages)
> >  {
> > -	struct vfio_container *container;
> > -	struct vfio_group *group = device->group;
> > -	struct vfio_iommu_driver *driver;
> > -	int ret;
> > -
> > -	if (!pages || !npage || !vfio_assert_device_open(device))
> > -		return -EINVAL;
> > +	/* group->container cannot change while a vfio device is open */
> > +	struct vfio_iommu_driver *driver = container->iommu_driver;
> > 
> >  	if (npage > VFIO_PIN_PAGES_MAX_ENTRIES)
> >  		return -E2BIG;
> > 
> >  	/* group->container cannot change while a vfio device is open */
> > -	container = group->container;
> >  	driver = container->iommu_driver;
> 
> duplicated comment and assignment.
> 
> Actually, I'm not sure whether the comment should be put within this
> container helper and other two. There is no group reference in these
> helpers then it sounds like the comment makes more sense to be in the
> caller side?

Yeah, that is better

> > +void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int
> > npage)
> > +{
> > +	if (WARN_ON(!vfio_assert_device_open(device)))
> > +		return;
> > +
> > +	if (device->group->container) {
> > +		vfio_container_unpin_pages(device->group->container, iova,
> > +					   npage);
> > +	} else if (device->iommufd_access) {
> 
> be consistent with other two helpers i.e. if-if instead of if-else

Done

> > +		if (WARN_ON(iova > ULONG_MAX))
> > +			return;
> 
> Is there a reason why this is a WARN_ON only in unpin but not in pin?

This is how it has always been. I suppose someone once thought it
would be OK for the driver to do racy stuff during pin - but clearly
that is not the case. Lets fix it while we are here.

> > +int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova, void *data,
> > +		size_t len, bool write)
> > +{
> > +	if (!data || len <= 0 || !vfio_assert_device_open(device))
> > +		return -EINVAL;
> > +
> > +	if (device->group->container)
> > +		return vfio_container_dma_rw(device->group->container,
> > iova,
> > +					     data, len, write);
> > +
> > +	if (device->iommufd_access) {
> > +		unsigned int flags = 0;
> > +
> > +		if (iova > ULONG_MAX)
> > +			return -EINVAL;
> > +
> > +		/* VFIO historically tries to auto-detect a kthread */
> > +		if (!current->mm)
> > +			flags |= IOMMUFD_ACCESS_RW_KTHREAD;
> 
> Can you elaborate why this cannot be put in iommufd as the default
> policy similar to what vfio container does?

Snooping in kernel structs to try to guess the calling execution
context is bad design. The caller should know its own context and it
should declare positively what it is. Someday this should be lifted
out of VFIO as well and into the drivers.

Jason



More information about the dri-devel mailing list