[virglrenderer-devel] coherent memory access for virgl

Thu Oct 11 10:38:32 UTC 2018

  Hi,

> > In case of coherent buffers the bo will be mapped directly and the unmap
> > call is not needed to make the changes visible to the gpu.
> >
> > Correct?
> 
> Yes, though gbm doesn't have a way to signal coherency.  Maybe we
> could add gbm_bo_is_coherent(bo) on the host.

Hmm, how does mesa create coherent buffers if that isn't exposed by
libgbm?

> > So, yes, we could create a gbm_bo_map/unmap like interface at virtio
> > protocol level, so the guest would ...
> >
> >    (1) MAP
> >    (2) write to mapping
> >    (3) UNMAP
> 
> Yes.
> 
> > .. instead of ...
> >
> >    (1) ATTACH_BACKING
> >    (2) TRANSFER_TO_HOST
> >    (3) DETACH_BACKING
> >
> > I think the guest doesn't need to know which modifiers are used on the
> > host side then, because the host-side gbm_bo_map/gbm_bo_unmap calls will
> > tile/detile/compress/uncompress so it'll be transparent to the guest.
> 
> Depends on what guest userspace does -- if
> gbm_bo_create_with_modifiers is called and the wayland guest proxy
> needs modifiers, then we'll need to know modifiers.

The guest can create a linear resource, and it will if the virtio-gpu
kms driver doesn't advertise modifiers (besides LINEAR), even if the
guest calls gbm_bo_create_with_modifiers(), right?

That doesn't prevent the host from using modifiers for the bo's
nevertheless, correct?  That will of course need support for modifiers
in qemu, so a scanout resource with modifiers will be displayed
correctly.

But I still don't see why the guest needs to know the modifiers.

> > Alternatively we could map the tiled/compressed bo as-is into the guest,
> > then have gbm_bo_map/gbm_bo_unmap calls in the guest handle the
> > tile/detile/compress/uncompress.  Is it possible in the first place to
> > map the raw bo on all hardware?  Would that allow to skip the roundtrip
> > to the host for map/unmap?
> 
> The gbm backend on the guest is virgl.
> 
> https://cgit.freedesktop.org/mesa/mesa/tree/src/gbm/backends/dri/gbm_dri.c#n1268
> https://cgit.freedesktop.org/mesa/mesa/tree/include/GL/internal/dri_interface.h#n1588
> https://cgit.freedesktop.org/mesa/mesa/tree/src/gallium/state_trackers/dri/dri2.c#n1651
> 
> On the host, it'll be the host DRI backend.  Unless we can somehow get
> the host logic in the guest, we can't avoid the roundtrip for
> non-coherent buffers.

Documentation on modifiers isn't that good; Google finds me some
articles explaining the need for them but not the inner workings.

So, I've waded through the Intel driver's source code.  It seems Intel
hardware handles this via the GTT (as I understand it, GTTs are gpu
page tables, which apparently can not only map objects but also do
various conversions like tiling).

That is not something we can let the guest handle.  So,
tile/detile/compress/uncompress will be handled by map/unmap on the
host, anything else isn't going to fly.

> > > We can add even more flags to DRM_VIRTGPU_RESOURCE_INFO (i.e,
> > > TRANSFER_STRIDE_DIFFERENT) since for most host buffers map_stride ==
> > > compressed_stride, so we can avoid vm-exits associated with
> > > TRANSFER_FROM_HOST / TRANSFER_TO_HOST when we only need to mmap().  Or we
> > > can extend the protocol.
> >
> > Hmm, that assumes we have the guest's gbm_bo_map/gbm_bo_unmap handle
> > tile/detile/compress/uncompress, correct?
> 
> We will.  The flow is:
> 
> 1) Guest queries guest EGL, gets modifier.  wayland guest proxy
> somehow gets modifier from host KMS.
> 2) gbm_bo_create_with_modifiers -- we'll be using the virgl DRI
> interface, which will call essentially VIRTGPU_RESOURCE_CREATE2 (which
> passes down modifiers).
> 3) gbm_bo_create_with_modifiers on the host.

Hmm, that also doesn't answer the question why the guest needs to know
the modifiers.  We can let the host query the supported modifiers and
call gbm_bo_create_with_modifiers() without the guest knowing the list
of supported modifiers.

Of course we need to keep track on the host side of which resource
uses which modifier, ...

> 4) guest gbm buffer is imported to guest 3D driver, which is backed by
> a virglrenderer resource (which has been imported in host GL)
> 5) Texturing/Rendering ensues.
> 6) If the guest needs to read the contents of the buffer, we can do a
> gbm_bo_map on the host and put this into the PCI bar.
> 7) wayland guest proxy sends buffer to display, host proxy actually displays

... so the host proxy can look up the modifier used when passing on the
buffer to the host compositor.
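
A minimal sketch of that host-side bookkeeping, assuming a small
fixed-size table keyed by resource id.  All names here (res_track,
res_lookup, res_forget, MAX_RESOURCES) are made up for illustration;
they are not existing qemu or virglrenderer interfaces, and a real
implementation would more likely hang the modifier off the existing
resource struct:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side table: resource id -> modifier chosen by the host. */
#define MAX_RESOURCES 64

struct res_entry {
    bool     used;
    uint32_t res_id;
    uint64_t modifier;
};

static struct res_entry res_table[MAX_RESOURCES];

/* Record (or update) the modifier the host picked when creating the bo.
 * Returns false if the table is full. */
static bool res_track(uint32_t res_id, uint64_t modifier)
{
    struct res_entry *free_slot = NULL;

    for (size_t i = 0; i < MAX_RESOURCES; i++) {
        if (res_table[i].used && res_table[i].res_id == res_id) {
            res_table[i].modifier = modifier;
            return true;
        }
        if (!res_table[i].used && free_slot == NULL)
            free_slot = &res_table[i];
    }
    if (free_slot == NULL)
        return false;

    free_slot->used = true;
    free_slot->res_id = res_id;
    free_slot->modifier = modifier;
    return true;
}

/* Fetch the modifier for a resource, e.g. when the host proxy hands the
 * buffer to the host compositor.  Returns false for unknown resources. */
static bool res_lookup(uint32_t res_id, uint64_t *modifier)
{
    for (size_t i = 0; i < MAX_RESOURCES; i++) {
        if (res_table[i].used && res_table[i].res_id == res_id) {
            *modifier = res_table[i].modifier;
            return true;
        }
    }
    return false;
}

/* Drop the entry when the resource is destroyed. */
static void res_forget(uint32_t res_id)
{
    for (size_t i = 0; i < MAX_RESOURCES; i++) {
        if (res_table[i].used && res_table[i].res_id == res_id)
            res_table[i].used = false;
    }
}
```

The guest only ever sees the resource id; the modifier stays a host
detail that is looked up at the point where the buffer leaves for the
compositor.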

Also, on (6):  I'm not convinced yet that letting the guest access
the gbm_bo_map() mapping directly via the pci bar is actually a win.

The reason is that establishing the mapping has quite some overhead:

  (1) gbm_bo_map() on the host
  (2) qemu updating the guest address space, host kvm updating ept tables.
  (3) guest kernel mapping it into guest userspace.

The same dance in reverse order when tearing down the mapping.  And this
is not persistent, we'll have to do that every single time the guest
wants cpu access to the resource.

Except for coherent buffers of course, where we can establish this
mapping once, then run with it as long as the resource exists.
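
To illustrate the difference, here is a toy model (not real qemu or
guest kernel code; struct res_map and both functions are hypothetical)
that just counts how often the three-step setup above and its teardown
would run.  It assumes a non-coherent mapping is torn down after every
cpu access while a coherent one is kept for the resource's lifetime:

```c
#include <stdbool.h>

/* Hypothetical per-resource mapping state. */
struct res_map {
    bool     coherent;   /* host bo is coherent */
    bool     mapped;     /* mapping currently established */
    unsigned setups;     /* how often the setup path ran */
    unsigned teardowns;  /* how often the teardown path ran */
};

/* Guest wants cpu access: establish the mapping unless it already exists. */
static void cpu_access_begin(struct res_map *r)
{
    if (!r->mapped) {
        /* Stands in for: gbm_bo_map() on the host, qemu/kvm updating the
         * guest address space, guest kernel mapping into userspace. */
        r->setups++;
        r->mapped = true;
    }
}

/* Guest is done: only non-coherent mappings are torn down again. */
static void cpu_access_end(struct res_map *r)
{
    if (!r->coherent) {
        /* Stands in for the same dance in reverse order. */
        r->teardowns++;
        r->mapped = false;
    }
}
```

With three cpu accesses each, the coherent resource pays for one setup
and no teardown, while the non-coherent one pays for three of each.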

cheers,
  Gerd