[PATCH v2 00/11] Connect VFIO to IOMMUFD

Jason Gunthorpe jgg at nvidia.com
Mon Nov 14 14:23:35 UTC 2022


On Thu, Nov 10, 2022 at 10:01:13PM -0500, Matthew Rosato wrote:
> On 11/7/22 7:52 PM, Jason Gunthorpe wrote:
> > This series provides an alternative container layer for VFIO implemented
> > using iommufd. This is optional; if CONFIG_IOMMUFD is not set then it will
> > not be compiled in.
> > 
> > At this point iommufd can be injected by passing in an iommufd FD to
> > VFIO_GROUP_SET_CONTAINER which will use the VFIO compat layer in iommufd
> > to obtain the compat IOAS and then connect up all the VFIO drivers as
> > appropriate.
> > 
> > This is a temporary stopping point; a following series will provide a way to
> > directly open a VFIO device FD and directly connect it to IOMMUFD using
> > native ioctls that can expose the IOMMUFD features like hwpt, future
> > vPASID and dynamic attachment.
> > 
> > This series, in compat mode, has passed all the qemu tests we have
> > available, including the test suites for the Intel GVT mdev. Aside from
> > the temporary limitation with P2P memory, this is believed to be fully
> > compatible with VFIO.
> 
> AFAICT there is no equivalent means to specify
> vfio_iommu_type1.dma_entry_limit when using iommufd; looks like
> we'll just always get the default 65535.

No, there is no arbitrary limit on iommufd.
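
The compat flow from the cover letter is just the usual container setup
with the iommufd FD swapped in, and there is no dma_entry_limit
analogue to configure. A minimal userspace sketch, with paths and error
handling simplified and the function name purely illustrative:

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    int attach_group_compat(const char *group_path)
    {
            /* the iommufd char device stands in for a type1 container */
            int iommufd = open("/dev/iommu", O_RDWR);
            int group = open(group_path, O_RDWR);

            /* iommufd's VFIO compat layer accepts the iommufd FD here
             * and connects the group to the compat IOAS */
            if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &iommufd))
                    return -1;
            return group;
    }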

> Was this because you envision the limit not being applicable for
> iommufd (limits will be enforced via other means and eventually we
> won't want to) or was it an oversight?

The limit here is primarily about limiting userspace abuse of the
interface.

iommufd is using GFP_KERNEL_ACCOUNT, which shifts the responsibility to
cgroups, which is similar to how KVM works.
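
In code terms the difference is just the allocation flag:
GFP_KERNEL_ACCOUNT is GFP_KERNEL | __GFP_ACCOUNT, so memcg charges the
allocation to the current task's cgroup. An illustrative fragment, not
the exact iommufd code:

    /* per-mapping bookkeeping charged to the caller's memory cgroup */
    struct iopt_area *area;

    area = kzalloc(sizeof(*area), GFP_KERNEL_ACCOUNT);
    if (!area)
            return -ENOMEM;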

So, for a VM sandbox you'd set a cgroup limit, and if a hostile
userspace in the sandbox decides to try to OOM the system it will hit
that limit, regardless of which kernel APIs it tries to abuse.
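
A minimal sketch of that setup, assuming cgroup v2 mounted at
/sys/fs/cgroup and a hypothetical "vm-sandbox" cgroup for the VM
process:

    #include <stdio.h>
    #include <sys/types.h>

    /* cap the sandbox and move the VM into it; equivalent to echoing
     * into memory.max and cgroup.procs by hand */
    int limit_sandbox(pid_t vm_pid)
    {
            FILE *f = fopen("/sys/fs/cgroup/vm-sandbox/memory.max", "w");

            if (!f)
                    return -1;
            fprintf(f, "4G\n");
            fclose(f);

            f = fopen("/sys/fs/cgroup/vm-sandbox/cgroup.procs", "w");
            if (!f)
                    return -1;
            fprintf(f, "%d\n", vm_pid);
            fclose(f);
            return 0;
    }

Once the process is in the cgroup, its accounted allocations - iommufd's
included - count against memory.max.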

This work is not entirely complete as we also need the iommu driver to
use GFP_KERNEL_ACCOUNT for allocations connected to the iommu_domain,
particularly for allocations of the IO page tables themselves - which
can be quite big.
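
Illustratively, the missing piece is again just the flag on the
page-table page allocations; hypothetical helper, not a real driver
function:

    /* one zeroed page of IO PTEs, charged to the attaching process's
     * cgroup via the __GFP_ACCOUNT in GFP_KERNEL_ACCOUNT */
    static u64 *domain_alloc_pt_page(void)
    {
            return (u64 *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
    }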

Jason

