[Intel-gfx] [PATCH v2 10/11] vfio: Make vfio_container optionally compiled
Jason Gunthorpe
jgg at nvidia.com
Wed Nov 9 19:52:55 UTC 2022
On Wed, Nov 09, 2022 at 10:18:09AM -0700, Alex Williamson wrote:
> DPDK supports no-iommu mode.
Er? Huh? How? I thought no-iommu was for applications that didn't do
DMA? How is DPDK getting packets in/out without DMA? I guess it snoops
in /proc/ or something to learn PFNs of mlock'd memory? <shudder>
> I agree that it's very useful for testing, I'm certainly not suggesting
> to give up, but I'm not sure where no-iommu lives when iommufd owns
> /dev/vfio/vfio. Given the unsafe interrupts discussion, it doesn't
> seem like the type of thing that would be a priority for iommufd.
Ah, I think the bit below will do the job, I'm not sure without doing
some testing (and I don't think I have the necessary Intel NIC for
DPDK). The basic point is no-iommu literally means 'do not use iommufd
at all, do not create an iommufd_device or an iommufd_access'. VFIO
can easially do that on its own.
The only slightly messy bit is that uAPI requires the SET_CONTAINER
which we can just NOP in vfio_compat. With more checks it could have
higher fidelity of error cases, but not sure it is worth it.
When we talk about the device cdev flow then I suggest that no-iommu
simply requires -1 for the iommufd during bind - ie no iommufd is
used or accepted and that is how the userspace knows/confirms it is in
no-iommu mode.
> We're on a path where vfio accepts an iommufd as a container, and
> ultimately iommufd becomes the container provider, supplanting the
> IOMMU driver registration aspect of vfio. I absolutely want type1 and
> spapr backends to get replaced by iommufd, but reluctance to support
> aspects of vfio "legacy" behavior doesn't give me warm fuzzies about a
> wholesale hand-off of the container to a different subsystem, for
> example vs an iommufd shim spoofing type1 support.
Well, I will agree to do everything required, just let's go through the
process to understand the security situation and ensure we are still
doing things the right way.
> Unfortunately we no longer have a CONFIG_EXPERIMENTAL option to hide
> behind for disabling VFIO_CONTAINER, so regardless of our intentions
> that a transition is some time off, it may become an issue sooner than
> we expect.
We can add kconfig text discouraging that?
diff --git a/drivers/iommu/iommufd/vfio_compat.c b/drivers/iommu/iommufd/vfio_compat.c
index dbef3274803336..81f7594cfff8e0 100644
--- a/drivers/iommu/iommufd/vfio_compat.c
+++ b/drivers/iommu/iommufd/vfio_compat.c
@@ -259,6 +259,14 @@ static int iommufd_vfio_set_iommu(struct iommufd_ctx *ictx, unsigned long type)
struct iommufd_ioas *ioas = NULL;
int rc = 0;
+ /*
+ * Emulation for NOIMMU is imperfect in that VFIO blocks almost all
+ * other ioctls. We let them keep working but they mostly fail since no
+ * IOAS should exist.
+ */
+ if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) && type == VFIO_NOIOMMU_IOMMU)
+ return 0;
+
if (type != VFIO_TYPE1_IOMMU && type != VFIO_TYPE1v2_IOMMU)
return -EINVAL;
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 595c7b2146f88c..64a336ebc99b9f 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -18,6 +18,10 @@ int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
lockdep_assert_held(&vdev->dev_set->lock);
+ if (IS_ENABLED(CONFIG_VFIO_NO_IOMMU) &&
+ vdev->group->type == VFIO_NO_IOMMU)
+ return 0;
+
/*
* If the driver doesn't provide this op then it means the device does
* not do DMA at all. So nothing to do.
@@ -53,6 +57,10 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
{
lockdep_assert_held(&vdev->dev_set->lock);
+ if (IS_ENABLED(CONFIG_VFIO_NO_IOMMU) &&
+ vdev->group->type == VFIO_NO_IOMMU)
+ return;
+
if (vdev->ops->unbind_iommufd)
vdev->ops->unbind_iommufd(vdev);
}
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index f3c48b8c45627d..08c5b05a129c2c 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -749,10 +749,13 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
if (!IS_ERR(iommufd)) {
u32 ioas_id;
- ret = iommufd_vfio_compat_ioas_id(iommufd, &ioas_id);
- if (ret) {
- iommufd_ctx_put(group->iommufd);
- goto out_unlock;
+ if (!IS_ENABLED(CONFIG_VFIO_NO_IOMMU) ||
+ group->type != VFIO_NO_IOMMU) {
+ ret = iommufd_vfio_compat_ioas_id(iommufd, &ioas_id);
+ if (ret) {
+ iommufd_ctx_put(group->iommufd);
+ goto out_unlock;
+ }
}
group->iommufd = iommufd;
More information about the Intel-gfx
mailing list