[virglrenderer-devel] coherent memory access for virgl

Thu Sep 27 00:00:14 UTC 2018

On Wed, Sep 26, 2018 at 3:35 AM Gerd Hoffmann <kraxel at redhat.com> wrote:
>
>   Hi,
>
> > For host GL buffers, the copies are done in
> > {vrend_renderer_transfer_write_iov, vrend_renderer_transfer_send_iov}.
> > If there are N iovecs backing the guest resource, we will have N
> > copies (see vrend_write_to_iovec, vrend_read_from_iovec).
>
> Yes.
>
> > udmabuf could be helpful, since it bundles up the iovecs and it will
> > make the N small copies into one big copy.
>
> That isn't much of a win I guess, the amount of data copied over will be
> the same.
>
> > udmabuf could also eliminate some copies for textures completely.
>
> Yes, that'll be more interesting of course.
>
> > Right now, for most
> > textures, virglrenderer copies iovecs into a temporary buffer (see
> > read_transfer_data), and then calls glTexSubImage2D*.
>
> Is virglrenderer clever enough to skip the temporary buffer copy in case
> it finds niov == 1 ?

There is a fast path in read_transfer_data / write_transfer_data that
depends on send_size and various other parameters, but from what I've
seen stepping through it in gdb, it isn't taken most of the time.
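
To make the copy counting concrete, here is a rough sketch of the
gather step with a single-iovec fast path.  This is not the actual
virglrenderer code: linearize_iov and its parameters are made-up names,
and the real fast-path conditions involve more than niov.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical helper (not a virglrenderer function): return a pointer to
 * `size` contiguous bytes of the guest data described by `iov`.
 *
 * Fast path: a single iovec already covers the range, so nothing is copied.
 * Slow path: gather into `scratch` with one memcpy per iovec touched.
 * `*copies` reports how many memcpy calls were issued.
 * Returns NULL if the iovecs hold fewer than `size` bytes.
 */
static const void *linearize_iov(const struct iovec *iov, int niov,
                                 size_t size, void *scratch, int *copies)
{
    assert(iov != NULL && copies != NULL);
    *copies = 0;

    if (niov == 1 && iov[0].iov_len >= size)
        return iov[0].iov_base;

    size_t done = 0;
    for (int i = 0; i < niov && done < size; i++) {
        size_t n = iov[i].iov_len;
        if (n > size - done)
            n = size - done;
        memcpy((char *)scratch + done, iov[i].iov_base, n);
        done += n;
        (*copies)++;
    }
    return done == size ? scratch : NULL;
}
```

With N iovecs this issues N small copies; a udmabuf mapping would let
the same range be read through one contiguous pointer instead.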

>
> > Just mmapping the udmabuf and calling glTexSubImage2D* is a definite win.
>
> Can virglrenderer just import the dmabuf as texture instead of creating
> a new one and copying pixels with glTexSubImage2D?

Yes -- but only in a subset of cases, since external textures don't
work with various functions (TexImage2D, TexSubImage2D,
CopyTexImage2D) and don't support mip-maps (see
https://www.khronos.org/registry/OpenGL/extensions/OES/OES_EGL_image_external).

However, such cases are prominent in the Android / ChromeOS display
stacks (and often mapped in the guest), so that's why I'm interested
in making them backed by host memory and display/GPU optimized.  We'll
need a way of expressing modifiers to the guest, so this delves into
the earlier discussion of wayland host proxying.  Who should allocate
-- the host compositor, the VMM, virglrenderer?  The host compositor
seems like the most natural choice.

Are there any plans for a pure Linux guest to use host-optimized
buffers (with the layout communicated via modifiers)?

>
> > But making host memory guest visible will bring the worst-case buffer
> > copies from 3 to 1.  For textures, if we start counting when the GPU
> > buffer gets detiled, there will be 5 copies currently, 3 with udmabuf,
> > and 1 with host exposed memory.
>
> 5 copies?  verbose please.  I can see three:

It depends on when you start counting.  I started counting from
vrend_renderer_transfer_send_iov, which includes fetching the data
from the host and packing that data into the iovecs.  That is probably
a worst-case scenario.

>
>   (1) glTexSubImage in guest.
>   (2) the temporary buffer to linearize the iov.
>   (3) glTexSubImage again, this time in the host.
>
> udmabuf will kill (2) easily.  And possibly (3) with some more effort
> if virglrenderer gets support for importing dmabufs.

I agree udmabuf can definitely help in the general (non-external)
texture upload case.

>
> > i) virtio_gpu_resource_create_coherent -- for strictly coherent needs
> > (i.e., no unmap needed)
>
> Why no unmap?  Because they just show up in the pci bar, which the guest
> driver has permanently mapped anyway?

Isn't that GL's definition of coherent mappings (I imagine Vulkan's is similar):

"Coherent maps guarantee that the effect of writes to a buffer's data
store by either the client or server will eventually become visible to
the other without further intervention from the application. In the
absence of this bit, persistent mappings are not coherent and modified
ranges of the buffer store must be explicitly communicated to the GL,
either by unmapping the buffer, or through a call to
glFlushMappedBufferRange or glMemoryBarrier."

https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/glMapBufferRange.xhtml

>
> > ii) virtio_gpu_resource_create_3d  -- may or may not be host backed
> > (depends on the PCI bar size, platform-specific information -- guest
> > doesn't need to know)
>
> Hmm?  How can this work in a way which is transparent for the guest?

We already need to extend the DRM_VIRTGPU_RESOURCE_INFO ioctl, since
it doesn't return the stride and doesn't work for YUV buffers (see
crrev.com/c/1208591).  Maybe we can also add a bitmask that we
populate with memory info (e.g., HOST_BIT | COHERENT_BIT)?
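
Something along these lines, as a sketch only.  The struct, the flag
names and values, and the helper are all placeholders I made up for
illustration; none of this is existing UAPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; names and values are placeholders. */
#define RES_INFO_MEM_HOST_BIT     (1u << 0)  /* backed by host memory */
#define RES_INFO_MEM_COHERENT_BIT (1u << 1)  /* mapping is coherent */

/* Hypothetical extended reply; not the existing resource info struct. */
struct resource_info_ext {
    uint32_t bo_handle;
    uint32_t res_handle;
    uint32_t size;
    uint32_t stride;     /* bytes per row */
    uint32_t mem_flags;  /* RES_INFO_MEM_* bitmask */
};

/*
 * Guest userspace would only fall back to explicit transfers when the
 * resource is not both host backed and coherent.
 */
static bool needs_explicit_transfer(const struct resource_info_ext *info)
{
    const uint32_t want = RES_INFO_MEM_HOST_BIT | RES_INFO_MEM_COHERENT_BIT;

    return (info->mem_flags & want) != want;
}
```

That way the guest kernel interface stays the same whether or not the
host could place the resource in the PCI bar; userspace just checks the
bits after creation.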

>
> cheers,
>   Gerd
>