[virglrenderer-devel] coherent memory access for virgl

Gerd Hoffmann kraxel at redhat.com
Tue Oct 16 06:22:13 UTC 2018


  Hi,
 
> Okay, so that method can't be used to back objects created with
> DRM_VIRTGPU_RESOURCE_CREATE.  I want to back all multi-level textures
> with memfd pages, so we can use udmabuf and eliminate a copy.
> 
> Is it possible to add another TTM region in kernel driver backed by
> memfd pages?  Instead of initializing the region with TTM_PL_VRAM (as
> done in [1]), we could initialize it with TTM_PL_FLAG_PRIV.

Not sure it is a good idea to hack driver and device (and the API) to
work around limitations in crosvm ...

> If it is possible, is there any drawback with that method compared to
> backing all guest RAM with memfd?

Isn't it easier to complete memfd support in crosvm?

> > I wouldn't worry too much about this.  One advantage we have when the
> > guest is not involved in picking the modifiers is that we can
> > re-allocate the resource (unless it has an active mapping) with other
> > modifiers.  Also useful when switching qemu between fullscreen/windowed
> > (as discussed elsewhere in the thread).
> 
> The guest needs to be notified about buffer size changes, no?  It
> needs to redraw given the larger buffer size.

Ah, right, it'll re-create the buffer anyway.  But the problem that the
list of supported modifiers is fixed (so we can't hint to the guest that
fullscreen/windowed cases should use different modifiers) goes away if
the host picks the modifiers (instead of the guest).

> > Looking at the intel driver I see that it always maps the complete bo.
> > Doing the same for virtio would simplify things a lot.
> >
> > Related:  The intel driver also never unmaps.  It keeps the mappings
> > cached until the bo is destroyed.   So we could do the same in virtio,
> > to reduce the mapping overhead.  But probably not on all hardware.
> > I suspect that implementation detail is not exposed at gbm API level?
> 
> Yes, it's not exposed at the API level and no GEM drivers offer
> partial mappings.  The box is mainly for not copying the entire region
> when using a shadow buffer.

Yes, that makes sense.

> In minigbm, we tend to keep mappings around as well (though not in all cases):
> 
> https://chromium.googlesource.com/chromiumos/platform/minigbm/+/master/drv.c#525
> 
> It wouldn't be too much work to add an explicit gbm_bo_flush(..) or
> gbm_bo_invalidate(...) (especially since some drivers have explicit
> ioctls, like DRM_MSM_GEM_CPU_PREP / DRM_MSM_GEM_CPU_FINI) in minigbm
> for suitable drivers (most of them, since they mostly use
> uncached/write-combined system memory).

I suspect that'll be a bit more work ...

Basically this is changing the driver-level API.  gbm_bo_map() does map
+ start-transfer now, and likewise gbm_bo_unmap() does finish-transfer +
unmap.  We would split those, so you can do multiple transfers on the
same mapping, so ...

    gbm_bo_map(gbm_bo_transfer_flags, box)
    /* copy data */
    gbm_bo_unmap()

... would become ...

    gbm_bo_map2()
    gbm_bo_start(gbm_bo_transfer_flags, box)
    /* copy data */
    gbm_bo_finish()
    gbm_bo_unmap2()

... which then allows ...

    gbm_bo_map2()
    gbm_bo_start(gbm_bo_transfer_flags, box)
    /* copy data */
    gbm_bo_finish()
    gbm_bo_start(gbm_bo_transfer_flags, box)
    /* copy data again */
    gbm_bo_finish()
    gbm_bo_unmap2()
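
Signature-wise the split could look something like this (just a rough
sketch, modeled on the existing gbm_bo_map()/gbm_bo_unmap() parameters;
none of these names or signatures exist in libgbm today):

    /* Hypothetical prototypes, only to illustrate the split; the names
     * and parameters are made up and not part of the real libgbm API. */
    #include <stdint.h>
    struct gbm_bo;

    /* establish the (permanent) cpu mapping */
    void *gbm_bo_map2(struct gbm_bo *bo, void **map_data);
    /* start a transfer on a sub-region of the mapped bo */
    int   gbm_bo_start(struct gbm_bo *bo, void *map_data,
                       uint32_t x, uint32_t y,
                       uint32_t width, uint32_t height,
                       uint32_t transfer_flags, uint32_t *stride);
    /* finish the transfer (flush / invalidate as needed) */
    int   gbm_bo_finish(struct gbm_bo *bo, void *map_data);
    /* tear down the mapping */
    void  gbm_bo_unmap2(struct gbm_bo *bo, void *map_data);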

I'm wondering whether this actually is a typical access pattern though.
I'd expect people will try to use coherent buffers if they need
something like this.  Or, in other words, I'm trying to figure out what
kind of buffers we should support in virtio.

Right now we have just one kind of buffer, which is actually two
buffers.  One in the host, one in the guest.  TRANSFER will copy between
them.  Uses TTM_PL_TT.
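
For reference, the guest-side flow looks roughly like this today
(minimal sketch against the virtio-gpu UAPI in virtgpu_drm.h; error
handling and the DRM_IOCTL_VIRTGPU_MAP + mmap step are omitted, and the
target/format/bind values are left to the caller):

    #include <stdint.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <drm/virtgpu_drm.h>

    /* Create a resource (one guest bo + one host resource), then copy
     * the guest contents to the host side with an explicit TRANSFER. */
    static int create_and_transfer(int drm_fd, uint32_t target,
                                   uint32_t format, uint32_t bind,
                                   uint32_t width, uint32_t height)
    {
        struct drm_virtgpu_resource_create create;
        struct drm_virtgpu_3d_transfer_to_host xfer;

        memset(&create, 0, sizeof(create));
        create.target     = target;   /* e.g. a 2D texture target */
        create.format     = format;   /* a virgl format enum value */
        create.bind       = bind;
        create.width      = width;
        create.height     = height;
        create.depth      = 1;
        create.array_size = 1;
        if (ioctl(drm_fd, DRM_IOCTL_VIRTGPU_RESOURCE_CREATE, &create))
            return -1;

        /* ... mmap the bo via DRM_IOCTL_VIRTGPU_MAP and write data ... */

        memset(&xfer, 0, sizeof(xfer));
        xfer.bo_handle = create.bo_handle;
        xfer.box.w = width;
        xfer.box.h = height;
        xfer.box.d = 1;
        if (ioctl(drm_fd, DRM_IOCTL_VIRTGPU_TRANSFER_TO_HOST, &xfer))
            return -1;
        return 0;
    }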

My patches add a ZEROCOPY type (not sure anymore that this is a good
name).  That is a shared buffer.  Guest allocates.  Host uses udmabuf to
access it without copying.  Uses TTM_PL_TT.
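
The host-side udmabuf step is roughly this (minimal sketch against the
new /dev/udmabuf interface; error handling omitted, size must be
page-aligned):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/udmabuf.h>

    /* Wrap memfd-backed pages in a dma-buf so the host GPU stack can
     * import them directly, without copying.  The memfd must be sealed
     * against shrinking before /dev/udmabuf accepts it. */
    static int memfd_to_dmabuf(size_t size)
    {
        struct udmabuf_create create;
        int memfd, devfd, dmabuf;

        memfd = memfd_create("guest-pages", MFD_ALLOW_SEALING);
        ftruncate(memfd, size);
        fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

        devfd = open("/dev/udmabuf", O_RDWR);
        memset(&create, 0, sizeof(create));
        create.memfd  = memfd;
        create.flags  = UDMABUF_FLAGS_CLOEXEC;
        create.offset = 0;
        create.size   = size;
        dmabuf = ioctl(devfd, UDMABUF_CREATE, &create);

        close(devfd);
        return dmabuf;   /* a dma-buf fd, or -1 on error */
    }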

Also there is a COHERENT type (we should likewise pick a better name).
Shared buffer too.  Host allocates.  It gets mapped into the PCI BAR to
give the guest direct access.  We can permanently map it in the guest.
Uses TTM_PL_VRAM.

Adding these two looks straightforward.

Using the PCI BAR also for non-coherent buffers becomes tricky.  The
buffer is out of reach for the guest.  We can ask the host to
gbm_bo_map() the buffer and place it in the PCI BAR for access.  But
that mapping is not permanent (unless we can change libgbm as outlined
above).  I'm not sure how to model that best in TTM ...

cheers,
  Gerd


