[virglrenderer-devel] coherent memory access for virgl

Gerd Hoffmann kraxel at redhat.com
Fri Oct 12 08:03:52 UTC 2018


  Hi,

> > Hmm, how does mesa create coherent buffers if that isn't exposed by
> > libgbm?
> 
> Usually, drivers have some coherent pool of memory to allocate from.
> For example, most of our Intel devices have a HW mechanism (last level
> cache -- see I915_PARAM_HAS_LLC) that ensures GPU/CPU coherency:
> 
> https://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/drivers/dri/i965/brw_bufmgr.c#n640
> 
> gbm doesn't expose this at the API level, but any driver that exposes
> the Vulkan / GL coherent bits should have a mechanism for this.

Ah, so there is a driver-specific side path not using the gbm API.
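
For reference, the LLC check in question boils down to a simple
getparam ioctl, roughly like this (untested sketch; fd is a DRM
device fd):

  #include <xf86drm.h>
  #include <i915_drm.h>

  int has_llc = 0;
  struct drm_i915_getparam gp = {
      .param = I915_PARAM_HAS_LLC,
      .value = &has_llc,
  };
  drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);
  /* has_llc != 0 -> CPU and GPU share the last level cache,
   * so mappings are coherent without explicit flushing */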

> > That doesn't prevent the host from using modifiers for the bo's
> > nevertheless, correct?  That will of course need support for modifiers
> > in qemu, so a scanout resource with modifiers will be displayed
> > correctly.
> >
> > But I still don't see why the guest needs to know the modifiers.
> 
> It depends on where you want to inject KMS knowledge.  Your approach
> for allocating host-optimized external memory would be:
> 
> 1) gbm_bo_create() [using flags] on the guest
> 2) gbm_bo_create_with_modifiers() on the host.  We can convert
>    GBM_BO_USE_RENDERING to the list of modifiers supported by EGL,
>    and GBM_BO_USE_SCANOUT to the list of modifiers supported by KMS.

Yep, that looks much better to me.

I think right now we don't pass any usage hints for resources
(GBM_BO_USE_*) from guest to host, so that must be added.
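
On the host side that could then look roughly like this (untested
sketch; the modifier lists here are placeholders -- in practice they
would be queried from KMS via the IN_FORMATS plane property and from
EGL via eglQueryDmaBufModifiersEXT):

  #include <stdint.h>
  #include <gbm.h>
  #include <drm_fourcc.h>

  /* placeholder modifier lists, see note above */
  static const uint64_t scanout_mods[]   = { DRM_FORMAT_MOD_LINEAR };
  static const uint64_t rendering_mods[] = { DRM_FORMAT_MOD_LINEAR };

  struct gbm_bo *
  create_host_bo(struct gbm_device *gbm, uint32_t width, uint32_t height,
                 uint32_t format, uint32_t guest_usage)
  {
      /* translate the guest's GBM_BO_USE_* hint into a modifier list */
      const uint64_t *mods = (guest_usage & GBM_BO_USE_SCANOUT)
          ? scanout_mods : rendering_mods;

      return gbm_bo_create_with_modifiers(gbm, width, height,
                                          format, mods, 1);
  }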

We'll need some compatibility fluff in virglrenderer:  Using modifiers
must be off by default, with some switch to turn it on, so that qemu
(and other VMMs) can enable it once they are new enough to handle
modifiers.

We need a struct virgl_renderer_resource_info2 which includes the
modifier used for the given resource.
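
Something along these lines, say (sketch only; simply wrapping the
existing struct so the old entry points keep working):

  struct virgl_renderer_resource_info2 {
      struct virgl_renderer_resource_info info;
      uint64_t modifier;   /* DRM format modifier of the resource */
  };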

> We'll have to do something like this for Android since userspace has
> no concept of modifiers.  That means host wayland will have to talk to
> virglrenderer.

Well, I think for wayland interfacing it doesn't change much.  We have
to pass the dmabuf metadata (fourcc, stride, etc.) anyway, and the
modifier used is just another parameter in that list ...
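
For reference, the linux-dmabuf wayland protocol already carries it;
per plane the client sends something like:

  /* the modifier is split into two 32-bit halves on the wire */
  zwp_linux_buffer_params_v1_add(params, dmabuf_fd, plane_idx,
                                 offset, stride,
                                 modifier >> 32,           /* hi */
                                 modifier & 0xffffffff);   /* lo */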

> I'm fine with both approaches, as long as we allocate the optimal
> buffer for a given scenario.

Is there anything else (besides GBM_BO_USE_*) the host should know
about a resource to pick the best modifier?

> > > 6) If the guest needs to read the contents of the buffer, we can do a
> > > gbm_bo_map on the host and put this into the PCI bar.

> > Also, on (6):  I'm not convinced yet that letting the guest access
> > the gbm_bo_map() mapping directly via pci bar is actually a win.
> 
> The main added use case for the PCI bar (besides coherent buffers) is
> actually 1D GL buffers.

i.e. data.  vertex arrays and similar, I guess.
Filled by the guest (i.e. transfer_to_host), I assume?

> We'll have usage hints (PIPE_USAGE_STREAM / PIPE_USAGE_STAGING /
> PIPE_USAGE_DYNAMIC / PIPE_USAGE_DEFAULT -- unfortunately guest Mesa
> doesn't pass them down yet). For GL_STATIC_DRAW, it would be a
> definite win.

Hmm, why?  Especially when filling once and then using often, it
shouldn't matter that much how the data is transferred when filling
the resource.

I'd expect that the bigger the buffer is, the higher the chance that
it'll be an actual win despite the mapping overhead.  So, a handful of
pages with vertex data -- probably not.  A full HD YUV image -- much
more likely.  Guess that needs some benchmarks to figure out, once we
have something to play with.

One more question:  gbm_bo_map() allows mapping only a region of the
bo instead of the whole thing.  Is that important?
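
For reference, usage would look like this (sketch of mapping just a
sub-rectangle for readback):

  uint32_t stride;
  void *map_data = NULL;

  /* map only the region of interest, not the whole bo */
  void *ptr = gbm_bo_map(bo, x, y, width, height,
                         GBM_BO_TRANSFER_READ, &stride, &map_data);
  if (ptr) {
      /* ... read the pixels ... */
      gbm_bo_unmap(bo, map_data);
  }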

> Host RGBA external textures / render targets -- the secondary
> possible use case of the PCI bar -- are actually rarely mapped by
> guest user-space.  Android, for example, *very* rarely maps any
> buffers created with the usage bits
> AHARDWAREBUFFER_USAGE_GPU_COLOR_OUTPUT |
> AHARDWAREBUFFER_USAGE_GPU_SAMPLED_IMAGE (similar case with Chrome
> ozone-drm, and I'm pretty sure vanilla Linux).

Yep, typically you don't want to read it back, but to show it on the
display.

> > Reason is that we have quite some overhead to establish the mapping:
> >
> >   (1) gbm_bo_map() on the host
> >   (2) qemu updating the guest address space, host kvm updating ept tables.
> >   (3) guest kernel mapping it into guest userspace.
> 
> What's the QEMU function that puts host memory into the guest PCI
> configuration address space?  Is it just KVM_SET_USER_MEMORY_REGION?

Yes, that is what it boils down to once qemu has finished processing
the memory region update.
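
A minimal sketch of what that ioctl looks like (slot, gpa, size and
ptr are placeholders):

  #include <stdint.h>
  #include <linux/kvm.h>
  #include <sys/ioctl.h>

  /* map host memory (e.g. a gbm_bo_map() result) into the guest
   * physical address space backing the PCI bar */
  struct kvm_userspace_memory_region region = {
      .slot            = slot,           /* free memory slot */
      .flags           = 0,
      .guest_phys_addr = gpa,            /* page-aligned guest address */
      .memory_size     = size,           /* page-aligned size */
      .userspace_addr  = (uint64_t)ptr,  /* host virtual address */
  };
  ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);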

cheers,
  Gerd


