[Intel-gfx] [Announcement] 2015-Q3 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

Wed Nov 18 10:12:21 PST 2015

[cc +qemu-devel, +paolo, +gerd]

On Tue, 2015-10-27 at 17:25 +0800, Jike Song wrote:
> Hi all,
> 
> We are pleased to announce another update of Intel GVT-g for Xen.
> 
> Intel GVT-g is a full GPU virtualization solution with mediated
> pass-through, starting from 4th generation Intel Core(TM) processors
> with Intel Graphics processors. A virtual GPU instance is maintained
> for each VM, with part of performance critical resources directly
> assigned. The capability of running native graphics driver inside a
> VM, without hypervisor intervention in performance critical paths,
> achieves a good balance among performance, feature, and sharing
> capability. Xen is currently supported on Intel Processor Graphics
> (a.k.a. XenGT); and the core logic can be easily ported to other
> hypervisors.
> 
> 
> Repositories
> 
>      Kernel: https://github.com/01org/igvtg-kernel (2015q3-3.18.0 branch)
>      Xen: https://github.com/01org/igvtg-xen (2015q3-4.5 branch)
>      Qemu: https://github.com/01org/igvtg-qemu (xengt_public2015q3 branch)
> 
> 
> This update consists of:
> 
>      - XenGT is now merged with KVMGT in unified repositories(kernel and qemu), but currently
>        different branches for qemu.  XenGT and KVMGT share same iGVT-g core logic.

Hi!

At redhat we've been thinking about how to support vGPUs from multiple
vendors in a common way within QEMU.  We want to enable code sharing
between vendors and give new vendors an easy path to add their own
support.  We also have the complication that not all vGPU vendors are as
open source friendly as Intel, so being able to abstract the device
mediation and access outside of QEMU is a big advantage.

The proposal I'd like to make is that a vGPU, whether it is from Intel
or another vendor, is predominantly a PCI(e) device.  We have an
interface in QEMU already for exposing arbitrary PCI devices, vfio-pci.
Currently vfio-pci uses the VFIO API to interact with "physical" devices
and system IOMMUs.  I highlight /physical/ there because some of these
physical devices are SR-IOV VFs, which is somewhat of a fuzzy concept,
somewhere between fixed hardware and a virtual device implemented in
software.  That software just happens to be running on the physical
endpoint.

vGPUs are similar, with the virtual device created at a different point,
host software.  They also rely on different IOMMU constructs, making use
of the MMU capabilities of the GPU (GTTs and such), but really having
similar requirements.

The proposal is therefore that GPU vendors can expose vGPUs to
userspace, and thus to QEMU, using the VFIO API.  For instance, vfio
supports modular bus drivers and IOMMU drivers.  An intel-vfio-gvt-d
module (or extension of i915) can register as a vfio bus driver, create
a struct device per vGPU, create an IOMMU group for that device, and
register that device with the vfio-core.  Since we don't rely on the
system IOMMU for GVT-d vGPU assignment, another vGPU vendor driver (or
extension of the same module) can register a "type1" compliant IOMMU
driver into vfio-core.  From the perspective of QEMU then, all of the
existing vfio-pci code is re-used, QEMU remains largely unaware of any
specifics of the vGPU being assigned, and the only necessary change so
far is how QEMU traverses sysfs to find the device and thus the IOMMU
group leading to the vfio group.

There are a few areas where we know we'll need to extend the VFIO API to
make this work, but it seems like they can all be done generically.  One
is that PCI BARs are described through the VFIO API as regions and each
region has a single flag describing whether mmap (ie. direct mapping) of
that region is possible.  We expect that vGPUs likely need finer
granularity, enabling some areas within a BAR to be trapped and fowarded
as a read or write access for the vGPU-vfio-device module to emulate,
while other regions, like framebuffers or texture regions, are directly
mapped.  I have prototype code to enable this already.

Another area is that we really don't want to proliferate each vGPU
needing a new IOMMU type within vfio.  The existing type1 IOMMU provides
potentially the most simple mapping and unmapping interface possible.
We'd therefore need to allow multiple "type1" IOMMU drivers for vfio,
making type1 be more of an interface specification rather than a single
implementation.  This is a trivial change to make within vfio and one
that I believe is compatible with the existing API.  Note that
implementing a type1-compliant vfio IOMMU does not imply pinning an
mapping every registered page.  A vGPU, with mediated device access, may
use this only to track the current HVA to GPA mappings for a VM.  Only
when a DMA is enabled for the vGPU instance is that HVA pinned and an
HPA to GPA translation programmed into the GPU MMU.

Another area of extension is how to expose a framebuffer to QEMU for
seamless integration into a SPICE/VNC channel.  For this I believe we
could use a new region, much like we've done to expose VGA access
through a vfio device file descriptor.  An area within this new
framebuffer region could be directly mappable in QEMU while a
non-mappable page, at a standard location with standardized format,
provides a description of framebuffer and potentially even a
communication channel to synchronize framebuffer captures.  This would
be new code for QEMU, but something we could share among all vGPU
implementations.

Another obvious area to be standardized would be how to discover,
create, and destroy vGPU instances.  SR-IOV has a standard mechanism to
create VFs in sysfs and I would propose that vGPU vendors try to
standardize on similar interfaces to enable libvirt to easily discover
the vGPU capabilities of a given GPU and manage the lifecycle of a vGPU
instance.

This is obviously a lot to digest, but I'd certainly be interested in
hearing feedback on this proposal as well as try to clarify anything
I've left out or misrepresented above.  Another benefit to this
mechanism is that direct GPU assignment and vGPU assignment use the same
code within QEMU and same API to the kernel, which should make debugging
and code support between the two easier.  I'd really like to start a
discussion around this proposal, and of course the first open source
implementation of this sort of model will really help to drive the
direction it takes.  Thanks!

Alex