[PATCH v2 10/11] vfio: Make vfio_container optionally compiled

Thu Nov 10 06:57:57 UTC 2022

> From: Jason Gunthorpe <jgg at nvidia.com>
> Sent: Thursday, November 10, 2022 3:53 AM
> 
> On Wed, Nov 09, 2022 at 10:18:09AM -0700, Alex Williamson wrote:
> 
> > DPDK supports no-iommu mode.
> 
> Er? Huh? How? I thought no-iommu was for applications that didn't do
> DMA? How is DPDK getting packets in/out without DMA? I guess it snoops
> in /proc/ or something to learn PFNs of mlock'd memory? <shudder>

iirc dpdk started with UIO plus various tricks (root privilege, hugepage, etc.)
to lock and learn PFN's from pagemap. Then when migrating it to vfio the
no-iommu option was introduced to provide UIO compatibility.

> 
> > I agree that it's very useful for testing, I'm certainly not suggesting
> > to give up, but I'm not sure where no-iommu lives when iommufd owns
> > /dev/vfio/vfio.  Given the unsafe interrupts discussion, it doesn't
> > seem like the type of thing that would be a priority for iommufd.
> 
> Ah, I think the bit below will do the job, I'm not sure without doing
> some testing (and I don't think I have the necessary Intel NIC for
> DPDK). The basic point is no-iommu literally means 'do not use iommufd
> at all, do not create an iommufd_device or an iommufd_access'. VFIO
> can easially do that on its own.
> 
> The only slightly messy bit is that uAPI requires the SET_CONTAINER
> which we can just NOP in vfio_compat. With more checks it could have
> higher fidelity of error cases, but not sure it is worth it.
> 
> When we talk about the device cdev flow then I suggest that no-iommu
> simply requires -1 for the iommufd during bind - ie no iommufd is
> used or accepted and that is how the userspace knows/confirms it is in
> no-iommu mode.
> 
> > We're on a path where vfio accepts an iommufd as a container, and
> > ultimately iommufd becomes the container provider, supplanting the
> > IOMMU driver registration aspect of vfio.  I absolutely want type1 and
> > spapr backends to get replaced by iommufd, but reluctance to support
> > aspects of vfio "legacy" behavior doesn't give me warm fuzzies about a
> > wholesale hand-off of the container to a different subsystem, for
> > example vs an iommufd shim spoofing type1 support.
> 
> Well, I will agree to do everything required, just let's go through the
> process to understand the security situation and ensure we are still
> doing things the right way.
> 
> > Unfortunately we no longer have a CONFIG_EXPERIMENTAL option to hide
> > behind for disabling VFIO_CONTAINER, so regardless of our intentions
> > that a transition is some time off, it may become an issue sooner than
> > we expect.
> 
> We can add kconfig text discouraging that?
> 
> diff --git a/drivers/iommu/iommufd/vfio_compat.c
> b/drivers/iommu/iommufd/vfio_compat.c
> index dbef3274803336..81f7594cfff8e0 100644
> --- a/drivers/iommu/iommufd/vfio_compat.c
> +++ b/drivers/iommu/iommufd/vfio_compat.c
> @@ -259,6 +259,14 @@ static int iommufd_vfio_set_iommu(struct
> iommufd_ctx *ictx, unsigned long type)
>  	struct iommufd_ioas *ioas = NULL;
>  	int rc = 0;
> 
> +	/*
> +	 * Emulation for NOIMMU is imperfect in that VFIO blocks almost all
> +	 * other ioctls. We let them keep working but they mostly fail since no
> +	 * IOAS should exist.
> +	 */
> +	if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) && type ==
> VFIO_NOIOMMU_IOMMU)
> +		return 0;
> +
>  	if (type != VFIO_TYPE1_IOMMU && type != VFIO_TYPE1v2_IOMMU)
>  		return -EINVAL;
> 

also need a check in iommufd_vfio_check_extension() so only
VFIO_NOIOMMU_IOMMU is supported in no-iommu mode.

> diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
> index 595c7b2146f88c..64a336ebc99b9f 100644
> --- a/drivers/vfio/iommufd.c
> +++ b/drivers/vfio/iommufd.c
> @@ -18,6 +18,10 @@ int vfio_iommufd_bind(struct vfio_device *vdev,
> struct iommufd_ctx *ictx)
> 
>  	lockdep_assert_held(&vdev->dev_set->lock);
> 
> +	if (IS_ENABLED(CONFIG_VFIO_NO_IOMMU) &&
> +	    vdev->group->type == VFIO_NO_IOMMU)
> +		return 0;
> +
>  	/*
>  	 * If the driver doesn't provide this op then it means the device does
>  	 * not do DMA at all. So nothing to do.
> @@ -53,6 +57,10 @@ void vfio_iommufd_unbind(struct vfio_device *vdev)
>  {
>  	lockdep_assert_held(&vdev->dev_set->lock);
> 
> +	if (IS_ENABLED(CONFIG_VFIO_NO_IOMMU) &&
> +	    vdev->group->type == VFIO_NO_IOMMU)
> +		return;
> +
>  	if (vdev->ops->unbind_iommufd)
>  		vdev->ops->unbind_iommufd(vdev);
>  }
> diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
> index f3c48b8c45627d..08c5b05a129c2c 100644
> --- a/drivers/vfio/vfio_main.c
> +++ b/drivers/vfio/vfio_main.c
> @@ -749,10 +749,13 @@ static int vfio_group_ioctl_set_container(struct
> vfio_group *group,
>  	if (!IS_ERR(iommufd)) {
>  		u32 ioas_id;
> 
> -		ret = iommufd_vfio_compat_ioas_id(iommufd, &ioas_id);
> -		if (ret) {
> -			iommufd_ctx_put(group->iommufd);
> -			goto out_unlock;
> +		if (!IS_ENABLED(CONFIG_VFIO_NO_IOMMU) ||
> +		    group->type != VFIO_NO_IOMMU) {
> +			ret = iommufd_vfio_compat_ioas_id(iommufd,
> &ioas_id);
> +			if (ret) {
> +				iommufd_ctx_put(group->iommufd);
> +				goto out_unlock;
> +			}
>  		}

with above I suppose other ioctls (map/unmap/etc.) are implicitly blocked
given get_compat_ioas() will fail in those paths. this is good.

btw vfio container requires exact match between group->type and
container->noiommu, i.e. noiommu group can be only attached to noiommu
container. this is another thing to be paired up.