[PATCH v2 10/11] vfio: Make vfio_container optionally compiled

Jason Gunthorpe jgg at nvidia.com
Wed Nov 9 19:52:55 UTC 2022


On Wed, Nov 09, 2022 at 10:18:09AM -0700, Alex Williamson wrote:

> DPDK supports no-iommu mode.

Er? Huh? How? I thought no-iommu was for applications that didn't do
DMA? How is DPDK getting packets in/out without DMA? I guess it snoops
in /proc/ or something to learn PFNs of mlock'd memory? <shudder>

> I agree that it's very useful for testing, I'm certainly not suggesting
> to give up, but I'm not sure where no-iommu lives when iommufd owns
> /dev/vfio/vfio.  Given the unsafe interrupts discussion, it doesn't
> seem like the type of thing that would be a priority for iommufd.

Ah, I think the bit below will do the job, I'm not sure without doing
some testing (and I don't think I have the necessary Intel NIC for
DPDK). The basic point is no-iommu literally means 'do not use iommufd
at all, do not create an iommufd_device or an iommufd_access'. VFIO
can easially do that on its own.

The only slightly messy bit is that uAPI requires the SET_CONTAINER
which we can just NOP in vfio_compat. With more checks it could have
higher fidelity of error cases, but not sure it is worth it.

When we talk about the device cdev flow then I suggest that no-iommu
simply requires -1 for the iommufd during bind - ie no iommufd is
used or accepted and that is how the userspace knows/confirms it is in
no-iommu mode.

> We're on a path where vfio accepts an iommufd as a container, and
> ultimately iommufd becomes the container provider, supplanting the
> IOMMU driver registration aspect of vfio.  I absolutely want type1 and
> spapr backends to get replaced by iommufd, but reluctance to support
> aspects of vfio "legacy" behavior doesn't give me warm fuzzies about a
> wholesale hand-off of the container to a different subsystem, for
> example vs an iommufd shim spoofing type1 support.

Well, I will agree to do everything required, just let's go through the
process to understand the security situation and ensure we are still
doing things the right way.

> Unfortunately we no longer have a CONFIG_EXPERIMENTAL option to hide
> behind for disabling VFIO_CONTAINER, so regardless of our intentions
> that a transition is some time off, it may become an issue sooner than
> we expect.

We can add kconfig text discouraging that?

diff --git a/drivers/iommu/iommufd/vfio_compat.c b/drivers/iommu/iommufd/vfio_compat.c
index dbef3274803336..81f7594cfff8e0 100644
--- a/drivers/iommu/iommufd/vfio_compat.c
+++ b/drivers/iommu/iommufd/vfio_compat.c
@@ -259,6 +259,14 @@ static int iommufd_vfio_set_iommu(struct iommufd_ctx *ictx, unsigned long type)
 	struct iommufd_ioas *ioas = NULL;
 	int rc = 0;
 
+	/*
+	 * Emulation for NOIMMU is imperfect in that VFIO blocks almost all
+	 * other ioctls. We let them keep working but they mostly fail since no
+	 * IOAS should exist.
+	 */
+	if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) && type == VFIO_NOIOMMU_IOMMU)
+		return 0;
+
 	if (type != VFIO_TYPE1_IOMMU && type != VFIO_TYPE1v2_IOMMU)
 		return -EINVAL;
 
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 595c7b2146f88c..64a336ebc99b9f 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -18,6 +18,10 @@ int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
 
 	lockdep_assert_held(&vdev->dev_set->lock);
 
+	if (IS_ENABLED(CONFIG_VFIO_NO_IOMMU) &&
+	    vdev->group->type == VFIO_NO_IOMMU)
+		return 0;
+
 	/*
 	 * If the driver doesn't provide this op then it means the device does
 	 * not do DMA at all. So nothing to do.
@@ -53,6 +57,10 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
 {
 	lockdep_assert_held(&vdev->dev_set->lock);
 
+	if (IS_ENABLED(CONFIG_VFIO_NO_IOMMU) &&
+	    vdev->group->type == VFIO_NO_IOMMU)
+		return;
+
 	if (vdev->ops->unbind_iommufd)
 		vdev->ops->unbind_iommufd(vdev);
 }
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index f3c48b8c45627d..08c5b05a129c2c 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -749,10 +749,13 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
 	if (!IS_ERR(iommufd)) {
 		u32 ioas_id;
 
-		ret = iommufd_vfio_compat_ioas_id(iommufd, &ioas_id);
-		if (ret) {
-			iommufd_ctx_put(group->iommufd);
-			goto out_unlock;
+		if (!IS_ENABLED(CONFIG_VFIO_NO_IOMMU) ||
+		    group->type != VFIO_NO_IOMMU) {
+			ret = iommufd_vfio_compat_ioas_id(iommufd, &ioas_id);
+			if (ret) {
+				iommufd_ctx_put(group->iommufd);
+				goto out_unlock;
+			}
 		}
 
 		group->iommufd = iommufd;


More information about the dri-devel mailing list