[virglrenderer-devel] coherent memory access for virgl

Gurchetan Singh gurchetansingh at chromium.org
Fri Sep 28 03:48:41 UTC 2018


On Thu, Sep 27, 2018 at 12:04 AM Gerd Hoffmann <kraxel at redhat.com> wrote:
>
> > > > Right now, for most
> > > > textures, virglrenderer copies iovecs into a temporary buffer (see
> > > > read_transfer_data), and then calls glTexSubImage2D*.
> > >
> > > Is virglrenderer clever enough to skip the temporary buffer copy
> > > in case it finds niov == 1?
> >
> > There is a fast path in read_transfer_data / write_transfer_data
> > depending on the send_size and various other parameters, but in my
> > experience with gdb it's not taken most of the time.
>
> Looking at the code, vrend_renderer_transfer_write_iov() seems to not
> call read_transfer_data() in the first place when num_iovs == 1.
>
> So, qemu could just pass in an iov with one element, and things would
> improve with current virglrenderer versions.

If that's possible, that'd be great.
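 For what it's worth, here's roughly what I'd expect that path to look
like -- a minimal sketch only, not the actual read_transfer_data /
vrend code:

    #include <string.h>
    #include <sys/uio.h>

    /* Illustrative sketch: pick the transfer source.  With a single
     * iovec covering the whole range, guest memory can be read
     * directly; otherwise the scattered guest pages have to be
     * gathered into a linear staging buffer first. */
    static const void *transfer_src(const struct iovec *iov,
                                    unsigned num_iovs, size_t offset,
                                    size_t size, void *staging)
    {
        if (num_iovs == 1 && offset + size <= iov[0].iov_len)
            return (const char *)iov[0].iov_base + offset; /* zero-copy */

        char *dst = staging;
        for (unsigned i = 0; i < num_iovs && size; i++) {
            if (offset >= iov[i].iov_len) {
                offset -= iov[i].iov_len;
                continue;
            }
            size_t n = iov[i].iov_len - offset;
            if (n > size)
                n = size;
            memcpy(dst, (const char *)iov[i].iov_base + offset, n);
            dst += n;
            size -= n;
            offset = 0;
        }
        return staging;
    }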

>  Newer virglrenderer
> versions could consume a dmabuf handle and mapping pointer instead
> (and import the dmabuf if possible).
>
> > However, such cases are prominent in the Android / ChromeOS display
> > stacks (and often mapped in the guest), so that's why I'm interested
> > in making them backed by host memory and display/GPU optimized.  We'll
> > need a way of expressing modifiers to the guest, so this delves into
> > the earlier discussion of wayland host proxying.  Who should allocate
> > -- the host compositor, the VMM, virglrenderer?  The host compositor
> > seems like the most natural choice.
>
> Why the host compositor?  Normal wayland clients don't ask the host
> compositor for buffers either, right?  They do EGL rendering using
> render nodes, export the front buffer as a dmabuf and pass it to the
> compositor for rendering ...
>
> I think virglrenderer should allocate the buffers.

virglrenderer allocating the buffers should work for now.  There was
some discussion about using modifiers with V4L2, but that didn't go
very far:

https://lists.freedesktop.org/archives/dri-devel/2017-August/150850.html

Modifiers are designed to support multiple consumer APIs, but I've only
seen EGL and KMS implementations.

If virglrenderer does the allocation, what about
virtio_gpu_resource_create_2d -- who calls that in guest userspace?
Should it ever be host-optimized, given that we're essentially talking
about single-level 2D textures, render targets, and scan-out buffers?
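
For reference, the existing command layout carries nothing about
placement or tiling, which is why the question comes up at all:

    /* From include/uapi/linux/virtio_gpu.h: the 2D create command
     * only names a resource id, a format enum, and a size -- nothing
     * about where the backing memory lives or how it is tiled. */
    struct virtio_gpu_resource_create_2d {
            struct virtio_gpu_ctrl_hdr hdr;
            __le32 resource_id;
            __le32 format;
            __le32 width;
            __le32 height;
    };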

>
> > Are there any plans of the guest using host-optimized buffers
> > (communicated via modifiers) in a purely Linux guest?
>
> I think that would imply the virgl mesa driver must be able to handle
> pretty much any vendor's compressed/tiled buffer format.  Hmm, no idea
> how difficult that would be.

It could actually be pretty easy, thanks to the Gallium abstraction.
We need to give Gallium a linear view into the texture -- and we can
always fall back to GL to do that.  Other items include:

1) Fix the resource info ioctl to actually return the stride and
format modifiers
2) Expose the modifiers the host supports (via
eglQueryDmaBufModifiersEXT) in the guest -- see the sketch below.  How
applicable this is depends on the guest userspace (Android won't use
it).
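
On the host side, 2) could look something like this (a sketch; error
handling trimmed, and it assumes EGL_EXT_image_dma_buf_import_modifiers
is available):

    #include <stdio.h>
    #include <EGL/egl.h>
    #include <EGL/eglext.h>

    /* Ask the host EGL stack which modifiers it can import for a
     * given fourcc format, so they can later be advertised to the
     * guest. */
    static void dump_host_modifiers(EGLDisplay dpy, EGLint fourcc)
    {
        PFNEGLQUERYDMABUFMODIFIERSEXTPROC query_mods =
            (PFNEGLQUERYDMABUFMODIFIERSEXTPROC)
            eglGetProcAddress("eglQueryDmaBufModifiersEXT");
        EGLint num = 0;

        if (!query_mods ||
            !query_mods(dpy, fourcc, 0, NULL, NULL, &num) || !num)
            return;

        EGLuint64KHR mods[num];
        EGLBoolean external_only[num];
        query_mods(dpy, fourcc, num, mods, external_only, &num);

        for (EGLint i = 0; i < num; i++)
            printf("modifier 0x%016llx%s\n",
                   (unsigned long long)mods[i],
                   external_only[i] ? " (sampling only)" : "");
    }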

> > > > But making host memory guest visible will bring the worst-case buffer
> > > > copies from 3 to 1.  For textures, if we start counting when the GPU
> > > > buffer gets detiled, there will be 5 copies currently, 3 with udmabuf,
> > > > and 1 with host exposed memory.
> > >
> > > 5 copies?  verbose please.  I can see three:
> >
> > It depends on when you start counting.  I started counting from
> > vrend_renderer_transfer_send_iov, which includes fetching the data
> > from the host and packing that data into the iovecs.  Probably a
> > worst-case scenario.
>
> Ah, you talk about the host -> guest path, not guest -> host (or both?).
>
> Is host -> guest transfer used that much?  I'd expect the guest just
> asks the host to display the rendered result instead of reading it back.

Both.  I'm not sure how common it is on Linux, but Android maps and
unmaps YUV buffers quite a bit, and those buffers are later used by GL.

>
> > > > ii) virtio_gpu_resource_create_3d  -- may or may not be host backed
> > > > (depends on the PCI bar size, platform-specific information -- guest
> > > > doesn't need to know)
> > >
> > > Hmm?  How can this work in a way which is transparent for the guest?
> >
> > We already need to extend the DRM_VIRTGPU_RESOURCE_INFO ioctl, since
> > it doesn't return the stride and doesn't work for YUV buffers (see
> > crrev.com/c/1208591).  Maybe we can also add a bitmask, which we can
> > populate with memory info (e.g., HOST_BIT | COHERENT_BIT)?
>
> Well, for userspace it can be transparent.  Userspace will just call
> mmap() and the kernel will sort things out transparently depending on
> the buffer allocation (userspace knowing how buffers are allocated is
> probably useful nevertheless).
>
> I was thinking about the kernel / vmm interface.  The virtio-gpu kms
> driver certainly needs to know about the buffer allocation ...
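
To make the RESOURCE_INFO idea above concrete, something like this is
what I had in mind -- field names are purely illustrative, nothing
here is upstream:

    #include <linux/types.h>

    /* Hypothetical extension of DRM_VIRTGPU_RESOURCE_INFO.  Today the
     * ioctl returns neither the stride nor anything usable for
     * multi-planar YUV buffers. */
    #define VIRTGPU_RES_INFO_HOST_BIT     (1 << 0) /* host-backed      */
    #define VIRTGPU_RES_INFO_COHERENT_BIT (1 << 1) /* coherent mapping */

    struct drm_virtgpu_resource_info_ext {
            __u32 bo_handle;
            __u32 res_handle;
            __u32 size;
            __u32 memory_flags;    /* VIRTGPU_RES_INFO_* bitmask */
            __u64 format_modifier; /* DRM_FORMAT_MOD_* */
            __u32 strides[4];      /* per-plane, for YUV */
            __u32 offsets[4];
    };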

The KMS part will be more difficult than the EGL part.

For example, on some ARM devices, AFBC can only be used on the (host)
primary KMS plane.  If a video running in QEMU is full screen, it's
advantageous to allocate an AFBC buffer and then scan it out.  But if
the QEMU window becomes smaller, the best option is to use a linear
strided buffer and schedule that as an overlay.  But the guest always
thinks it's fullscreen ...

How is the guest currently notified about size changes of its drawing
target?  Do buffers get reallocated?

Previously (see slide 25 of
https://www.x.org/wiki/Events/XDC2017/widawsky_fb_modifiers.pdf),
there was discussion about the compositor sending supported modifiers
to the client (QEMU) through some sort of protocol.  Does the
compositor notify the client of modifier changes if the window size
changes?  Perhaps the Wayland experts (Tomeu?) know.
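
For reference, zwp_linux_dmabuf_v1 version 3 already advertises
format+modifier pairs, though it does so once at bind time for the
whole device rather than per surface.  A sketch of the client side,
using the generated protocol header:

    #include <stdint.h>
    #include <stdio.h>
    #include "linux-dmabuf-unstable-v1-client-protocol.h"

    /* The compositor sends one modifier event per supported
     * format+modifier pair when the client binds the global. */
    static void handle_modifier(void *data,
                                struct zwp_linux_dmabuf_v1 *dmabuf,
                                uint32_t format,
                                uint32_t modifier_hi,
                                uint32_t modifier_lo)
    {
        uint64_t modifier = ((uint64_t)modifier_hi << 32) | modifier_lo;
        printf("format 0x%08x supports modifier 0x%016llx\n",
               format, (unsigned long long)modifier);
    }

    /* Pre-version-3 format announcement, without modifiers. */
    static void handle_format(void *data,
                              struct zwp_linux_dmabuf_v1 *dmabuf,
                              uint32_t format)
    {
    }

    static const struct zwp_linux_dmabuf_v1_listener dmabuf_listener = {
        .format = handle_format,
        .modifier = handle_modifier,
    };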

>
> cheers,
>   Gerd
>

