[Intel-gfx] [PATCH 1/9] vfio: Make vfio_(un)register_notifier accept a vfio_device

Wed Apr 13 23:12:15 UTC 2022

On Wed, Apr 13, 2022 at 09:08:40PM +0000, Wang, Zhi A wrote:
> On 4/13/22 8:04 PM, Jason Gunthorpe wrote:
> > On Wed, Apr 13, 2022 at 07:17:52PM +0000, Wang, Zhi A wrote:
> >> On 4/13/22 5:37 PM, Jason Gunthorpe wrote:
> >>> On Wed, Apr 13, 2022 at 06:29:46PM +0200, Christoph Hellwig wrote:
> >>>> On Wed, Apr 13, 2022 at 01:18:14PM -0300, Jason Gunthorpe wrote:
> >>>>> Yeah, I was thinking about that too, but on the other hand I think it
> >>>>> is completely wrong that gvt requires kvm at all. A vfio_device is not
> >>>>> supposed to be tightly linked to KVM - the only exception possibly
> >>>>> being s390..
> >>>>
> >>>> So i915/gvt uses it for:
> >>>>
> >>>>  - poking into the KVM GFN translations
> >>>>  - using the KVM page track notifier
> >>>>
> >>>> No idea how these could be solved in a more generic way.
> >>>
> >>> TBH I'm not sure how any of this works fully correctly..
> >>>
> >>> I see this code getting something it calls a GFN and then passing
> >>> them to vfio - which makes no sense. Either a value is a GFN - the
> >>> physical memory address of the VM, or it is an IOVA. VFIO only takes
> >>> in IOVA and kvm only takes in GFN. So these are probably IOVAs really..
> >>>
> >> Can you let me know the place? So that I can take a look.
> > 
> > Well, for instance:
> > 
> > static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
> > 		unsigned long size, struct page **page)
> > 
> > There is no way that is a GFN, it is an IOVA.
> > 
> I see. The name is vague. There is an promised 1:1 mapping between guest GFN
> and host IOVA when a PCI device is passed to a VM, I guess mdev is just
> leveraging it as they are sharing the same code path in QEMU.

That has never been true. It happens to be the case in some common scenarios.

> > So if the page table in the guest has IOVA addreses then why can you
> > use them as GFNs?
> 
> That's another problem. We don't support a guess enabling the guest IOMMU
> (aka virtual IOMMU). The guest/virtual IOMMU is implemented in QEMU, so
> does the translation between guest IOVA and GFN. For a mdev model
> implemented in the kernel, there isn't any mechanism so far to reach there.

And this is the uncommon scenario, there is no way for the mdev driver
to know if viommu is turned on, and AFAIK, no way to block it from VFIO.

> People were discussing it before. But none agreement was achieved. Is it
> possible to implement it in the kernel? Would like to discuss more about it
> if there is any good idea.

I don't know of anything, VFIO and kvm are not intended to be tightly
linked like this, they don't have the same view of the world.

Jason