[Intel-gfx] [PATCH v4 2/9] vfio-iommufd: Create iommufd_access for noiommu devices

Liu, Yi L yi.l.liu at intel.com
Wed May 3 09:57:54 UTC 2023


> From: Jason Gunthorpe <jgg at nvidia.com>
> Sent: Wednesday, May 3, 2023 2:22 AM
> 
> On Sat, Apr 29, 2023 at 12:13:39AM +0800, Yi Liu wrote:
> 
> > > Whoa, noiommu is inherently unsafe and only meant to expose the vfio
> > > device interface for userspace drivers that are going to do unsafe
> > > things regardless.  Enabling noiommu to work with mdev, pin pages, or
> > > anything else should not be on our agenda.  Userspaces relying on noiommu
> > > get the minimum viable interface and must impose a minuscule
> > > incremental maintenance burden.  The only reason we're spending so much
> > > effort on it here is to make iommufd noiommu support equivalent to
> > > group/container noiommu support.  We should stop at that.  Thanks,
> >
> > btw. I asked a question in [1] to check if we should allow attach/detach
> > on noiommu devices. Jason has replied to it. If in the future noiommu
> > userspace can pin pages, then such userspace will need to attach/detach
> > an ioas. So I made the cdev series [2] allow attaching an ioas on noiommu
> > devices. Supporting it from cdev day-1 may avoid probing whether
> > attach/detach is supported for specific devices when adding page pinning
> > for noiommu userspace.
> >
> > But now I think such support is not planned, is it? If so, would it
> > be better to disallow attach/detach on noiommu devices in patch [2]?
> >
> > [1] https://lore.kernel.org/kvm/ZEa+khH0tUFStRMW@nvidia.com/
> > [2] https://lore.kernel.org/kvm/20230426150321.454465-21-yi.l.liu@intel.com/
>
> If we block it then userspace has to act quite differently, I think we
> should keep it.

Maybe the kernel can simply fail attach/detach when it happens on noiommu
devices, and noiommu userspace should just know it would fail. @Alex,
what is your opinion?

> My general idea to complete the no-iommu feature is to add a new IOCTL
> to VFIO that is 'pin iova and return dma addr' that no-iommu userspace
> would call instead of trying to abuse mlock and /proc/ to do it. That
> ioctl would use the IOAS attached to the access just like a mdev would
> do, so it has a real IOVA, but it is not a mdev.

This new ioctl could be an IOMMUFD ioctl, since its input is the IOAS and
an address, with nothing related to the device. Is that right?

> unmap callback just does nothing, as Alex says it is all still totally
> unsafe.

Sure. That's also why I added a noiommu check to avoid calling the
unmap callback, although it does not seem possible to have an unmap
callback there, since it is mdev drivers that would implement it.

> 
> This just allows it to use the mm a little more properly and safely (eg
> mlock() doesn't set things like page_maybe_dma_pinned(), proc doesn't
> reject things like DAX and it currently doesn't make an adjustment for
> the PCI offset stuff..) So it would make DPDK a little more robust,
> portable and make the whole VFIO no-iommu feature much easier to use.

Thanks for the explanation.
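
For reference, a minimal sketch of the /proc lookup that no-iommu
userspace relies on today, assuming the documented /proc/<pid>/pagemap
entry layout (bits 0-54 hold the PFN, bit 63 is the present flag). The
helper names pagemap_offset() and pagemap_entry_to_phys() are made up
for illustration and are not taken from DPDK or the kernel. Reading the
file itself is left out; only the offset and decode arithmetic is shown.

```c
#include <assert.h>
#include <stdint.h>

/* One 64-bit pagemap entry per virtual page (assumed layout, see lead-in). */
#define PM_PFN_MASK ((UINT64_C(1) << 55) - 1) /* bits 0-54: page frame number */
#define PM_PRESENT  (UINT64_C(1) << 63)       /* bit 63: page present in RAM */

/* Byte offset in /proc/<pid>/pagemap of the entry describing vaddr. */
static uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
	assert(page_size != 0);
	return (vaddr / page_size) * sizeof(uint64_t);
}

/*
 * Turn a pagemap entry into a physical address.
 * Returns 0 on success, -1 if the page is not present or the PFN is
 * hidden (the kernel reports PFN 0 to callers without CAP_SYS_ADMIN).
 */
static int pagemap_entry_to_phys(uint64_t entry, uint64_t vaddr,
				 uint64_t page_size, uint64_t *phys)
{
	uint64_t pfn = entry & PM_PFN_MASK;

	if (!(entry & PM_PRESENT) || pfn == 0)
		return -1;

	/* Physical page base plus the offset of vaddr inside its page. */
	*phys = pfn * page_size + (vaddr % page_size);
	return 0;
}
```

Nothing in this lookup pins the page or rejects DAX, and the result is a
CPU physical address with no PCI offset adjustment, which are the gaps
mentioned above.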

> To do that we need an iommufd access, an access ID and we need to link
> the current IOAS to the special access, like mdev, but in any mdev
> code paths.
> 
> That creating the access ID solves the reset problem as well is a nice
> side effect and is the only part of this you should focus on for now..

Yes, I get this part. So far we only need the access ID to fix the
noiommu gap in hot-reset.

Regards,
Yi Liu
 

