[RFC PATCH 08/12] vfio/pci: Create host unaccessible dma-buf for private device

Mon Jan 20 04:53:22 UTC 2025

On Fri, Jan 17, 2025 at 09:25:23AM -0400, Jason Gunthorpe wrote:
> On Fri, Jan 17, 2025 at 09:57:40AM +0800, Baolu Lu wrote:
> > On 1/15/25 21:01, Jason Gunthorpe wrote:
> > > On Wed, Jan 15, 2025 at 11:57:05PM +1100, Alexey Kardashevskiy wrote:
> > > > On 15/1/25 00:35, Jason Gunthorpe wrote:
> > > > > On Tue, Jun 18, 2024 at 07:28:43AM +0800, Xu Yilun wrote:
> > > > > 
> > > > > > > is needed so the secure world can prepare anything it needs prior to
> > > > > > > starting the VM.
> > > > > > OK. From Dan's patchset there are some touch point for vendor tsm
> > > > > > drivers to do secure world preparation. e.g. pci_tsm_ops::probe().
> > > > > > 
> > > > > > Maybe we could move to Dan's thread for discussion.
> > > > > > 
> > > > > > https://lore.kernel.org/linux-
> > > > > > coco/173343739517.1074769.13134786548545925484.stgit at dwillia2-
> > > > > > xfh.jf.intel.com/
> > > > > I think Dan's series is different, any uapi from that series should
> > > > > not be used in the VMM case. We need proper vfio APIs for the VMM to
> > > > > use. I would expect VFIO to be calling some of that infrastructure.
> > > > Something like this experiment?
> > > > 
> > > > https://github.com/aik/linux/commit/
> > > > ce052512fb8784e19745d4cb222e23cabc57792e
> > > Yeah, maybe, though I don't know which of vfio/iommufd/kvm should be
> > > hosting those APIs, the above does seem to be a reasonable direction.
> > > 
> > > When the various fds are closed I would expect the kernel to unbind
> > > and restore the device back.
> > 
> > I am curious about the value of tsm binding against an iomnufd_vdevice
> > instead of the physical iommufd_device.
> 
> Interesting question
>  
> > It is likely that the kvm pointer should be passed to iommufd during the
> > creation of a viommu object. 
> 
> Yes, I fully expect this
> 
> > If my recollection is correct, the arm
> > smmu-v3 needs it to obtain the vmid to setup the userspace event queue:
> 
> Right now it will use a VMID unrelated to KVM. BTM support on ARM will
> require syncing the VMID with KVM.
> 
> AMD and Intel may require the KVM for some reason as well.
> 
> For CC I'm expecting the KVM fd to be the handle for the cVM, so any
> RPCs that want to call into the secure world need the KVM FD to get
> the cVM's identifier. Ie a "bind to cVM" RPC will need the PCI
> information and the cVM's handle.

I also expect this.

> 
> From that perspective it does make sense that any cVM related APIs,
> like "bind to cVM" would be against the VDEVICE where we have a link
> to the VIOMMU which has the KVM. On the iommufd side the VIOMMU is
> part of the object hierarchy, but does not necessarily have to force a
> vIOMMU to appear in the cVM.
> 
> But it also seems to me that VFIO should be able to support putting
> the device into the RUN state

Firstly I think VFIO should support putting device into *LOCKED* state.
>From LOCKED to RUN, there are many evidence fetching and attestation
things that only guest cares. I don't think VFIO needs to opt-in.

But that doesn't impact this concern. I actually think VFIO should
provide 'bind' uAPI to support these device side configuration things
rather than iommufd uAPI. IIUC iommufd should only do the setup on
IOMMU side.

The switching of TDISP state to LOCKED involves device side
differences that should be awared by the device owner, VFIO driver.

E.g. as we previously mentioned, to check if all MMIOs are never mapped.

Another E.g. invalidate MMIOs when device is to be LOCKED, some Pseudo
Code:

@@ -1494,7 +1494,15 @@ static int vfio_pci_ioctl_tsm_bind(struct vfio_pci_core_device *vdev,
        if (!kvm)
                return -ENOENT;

+       down_write(&vdev->memory_lock);
+       vfio_pci_dma_buf_move(vdev, true);
+
        ret = pci_tsm_dev_bind(pdev, kvm, &bind.intf_id);
+
+       if (__vfio_pci_memory_enabled(vdev))
+               vfio_pci_dma_buf_move(vdev, false);
+       up_write(&vdev->memory_lock);


BTW, we may still need viommu/vdevice APIs during 'bind', if some IOMMU
side configurations are required by secure world. TDX does have some.

> without involving KVM or cVMs.

It may not be feasible for all vendors. I believe AMD would have one
firmware call that requires cVM handle *AND* move device into LOCKED
state. It really depends on firmware implementation.

So I'm expecting a coarse TSM verb pci_tsm_dev_bind() for vendors to do
any host side preparation and put device into LOCKED state.

> 
> > Intel TDX connect implementation also needs a reference to the kvm
> > pointer to obtain the secure EPT information. This is crucial because
> > the CPU's page table must be shared with the iommu. 
> 
> I thought kvm folks were NAKing this sharing entirely? Or is the

I believe this is still Based on the general EPT sharing idea, is it?

There are several major reasons for the objection. In general, KVM now
has many "page non-present" tricks in EPT, which are not applicable to
IOPT. If shared, KVM has to take IOPT concerns into account, which is
quite a burden for KVM maintaining.

> secure EPT in the secure world and not directly managed by Linux?

Yes, the secure EPT is in the secure world and managed by TDX firmware.
Now a SW Mirror Secure EPT is introduced in KVM and managed by KVM
directly, and KVM will finally use firmware calls to propagate Mirror
Secure EPT changes to secure EPT.

Secure EPT are controlled by TDX module, basically KVM cannot play any
of the tricks. And TDX firmware should ensure any SEPT setting would be
applicable for Secure IOPT. I hope this could remove most of the
concerns.

I remember we've talked about SEPT sharing architechture for TDX TIO
before, but didn't get information back from KVM folks. Not sure how
things will go. Maybe will find out when we have some patches posted.

Thanks,
Yilun

> 
> AFAIK AMD is going to mirror the iommu page table like today.
> 
> ARM, I suspect, will not have an "EPT" under Linux control, so
> whatever happens will be hidden in their secure world.
> 
> Jason