[virglrenderer-devel] coherent memory access for virgl

Tomeu Vizoso tomeu.vizoso at collabora.com
Thu Oct 4 07:04:42 UTC 2018


On 10/3/18 4:46 PM, Tomeu Vizoso wrote:
> On 9/28/18 5:48 AM, Gurchetan Singh wrote:
>> On Thu, Sep 27, 2018 at 12:04 AM Gerd Hoffmann <kraxel at redhat.com> wrote:
>>>
>>>>>> Right now, for most
>>>>>> textures, virglrenderer copies iovecs into a temporary buffer (see
>>>>>> read_transfer_data), and then calls glTexSubImage2D*.
>>>>>
>>>>> Is virglrenderer clever enough to skip the temporary buffer copy in 
>>>>> case
>>>>> it finds niov == 1?
>>>>
>>>> There is a fast-path in read_transfer_data / write_transfer_data
>>>> depending on the send_size and various other parameters, but in my gdb
>>>> experience it's not used most of the time.
>>>
>>> Looking at the code, vrend_renderer_transfer_write_iov() seems to not
>>> call read_transfer_data() in the first place in case num_iovs == 1.
>>>
>>> So, qemu could just pass in an iov with one element, and things would
>>> improve with current virglrenderer versions.
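A minimal sketch of the single-iov fast path under discussion (hypothetical code, not the actual vrend_renderer_transfer_write_iov() implementation; the struct and function names are made up for illustration): when exactly one iovec covers the whole transfer, its buffer can be handed to glTexSubImage2D directly, and the bounce-buffer gather is only needed for scattered data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in for struct iovec, to keep the sketch self-contained. */
struct iovec_s {
    void  *iov_base;
    size_t iov_len;
};

/* Returns a pointer GL can consume directly, or fills 'bounce' and
 * returns it when the data is scattered across several iovecs. */
static void *transfer_data_ptr(const struct iovec_s *iov, int niov,
                               size_t offset, size_t size, void *bounce)
{
    /* Fast path: one iovec that covers the whole transfer -> no copy. */
    if (niov == 1 && offset + size <= iov[0].iov_len)
        return (char *)iov[0].iov_base + offset;

    /* Slow path: gather the scattered iovecs into the bounce buffer. */
    size_t copied = 0, skip = offset;
    for (int i = 0; i < niov && copied < size; i++) {
        if (skip >= iov[i].iov_len) {
            skip -= iov[i].iov_len;
            continue;
        }
        size_t n = iov[i].iov_len - skip;
        if (n > size - copied)
            n = size - copied;
        memcpy((char *)bounce + copied,
               (char *)iov[i].iov_base + skip, n);
        copied += n;
        skip = 0;
    }
    return bounce;
}
```

With this shape, a qemu that always passes a single iovec would take the no-copy branch on every transfer.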
>>
>> If that's possible, that'd be great.
>>
>>>   Newer virglrenderer
>>> versions could consume dmabuf handle and mapping pointer instead (and
>>> import the dmabuf if possible).
>>>
>>>> However, such cases are prominent in the Android / ChromeOS display
>>>> stacks (and often mapped in the guest), so that's why I'm interested
>>>> in making them backed by host memory and display/GPU optimized.  We'll
>>>> need a way of expressing modifiers to the guest, so this delves into
>>>> the earlier discussion of wayland host proxying.  Who should allocate
>>>> -- the host compositor, the VMM, virglrenderer?  The host compositor
>>>> seems like the most natural choice.
>>>
>>> Why the host compositor?  Normal wayland clients don't ask the host
>>> compositor for buffers either, right?  They do egl rendering using
>>> render nodes, export the front buffer as dmabuf and pass them to the
>>> compositor for rendering ...
>>>
>>> I think virglrenderer should allocate the buffers.
>>
>> virglrenderer allocating the buffers should work for now.  There was
>> some discussion on using modifiers for v4l2, but that didn't go very
>> far:
>>
>> https://lists.freedesktop.org/archives/dri-devel/2017-August/150850.html
>>
>> Modifiers are designed to have multiple consumer apis, but I've only
>> seen EGL + KMS implementations.
>>
>> If virglrenderer will do the allocation, what about
>> virtio_gpu_resource_create_2d -- who calls that in guest userspace?
>> Should it ever be host-optimized (since we're essentially talking
>> about single-level 2D textures/render targets/scan-out buffers)?
>>
>>>
>>>> Are there any plans of the guest using host-optimized buffers
>>>> (communicated via modifiers) in a purely Linux guest?
>>>
>>> I think that would imply the virgl mesa driver must be able to handle
>>> pretty much any vendor's compressed/tiled buffer format.  Hmm, no idea
>>> how difficult that would be.
>>
>> It could actually be pretty easy, due to the Gallium abstraction.  We
>> need to give Gallium a linear view into the texture -- which we can
>> always fall back to GL to do.  Other items include:
>>
>> 1) Fix the resource info ioctl to actually return the stride + format 
>> modifiers
>> 2) Expose the modifiers the host supports (via
>> eglQueryDmaBufModifiersEXT) in the guest.  The applicability of this
>> depends on the userspace (Android won't use this).
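Once the host's modifier list (obtained via eglQueryDmaBufModifiersEXT) is forwarded to the guest, item 2 boils down to an intersection. A hedged sketch of that selection step, assuming both sides hand over plain uint64_t modifier arrays (the helper name is hypothetical; DRM_FORMAT_MOD_LINEAR matches drm_fourcc.h but is redefined here to stay self-contained):

```c
#include <assert.h>
#include <stdint.h>

/* Value matches drm_fourcc.h; defined locally for self-containment. */
#define DRM_FORMAT_MOD_LINEAR 0ULL

/* Pick the first modifier both the host (from
 * eglQueryDmaBufModifiersEXT) and the guest driver support, falling
 * back to LINEAR, which every implementation must handle. */
static uint64_t pick_modifier(const uint64_t *host_mods, int n_host,
                              const uint64_t *guest_mods, int n_guest)
{
    for (int i = 0; i < n_host; i++)
        for (int j = 0; j < n_guest; j++)
            if (host_mods[i] == guest_mods[j])
                return host_mods[i];
    return DRM_FORMAT_MOD_LINEAR;
}
```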
>>
>>>>>> But making host memory guest visible will bring the worst-case buffer
>>>>>> copies from 3 to 1.  For textures, if we start counting when the GPU
>>>>>> buffer gets detiled, there will be 5 copies currently, 3 with udmabuf,
>>>>>> and 1 with host exposed memory.
>>>>>
>>>>> 5 copies?  verbose please.  I can see three:
>>>>
>>>> It depends on when you start counting.  I started counting from
>>>> vrend_renderer_transfer_send_iov, which includes fetching the data
>>>> from the host and packing that data into the iovecs.  Probably a worst
>>>> case scenario.
>>>
>>> Ah, you talk about the host -> guest path, not guest -> host (or both?).
>>>
>>> Is host -> guest transfer used that much?  I'd expect the guest just
>>> asks the host to display the rendered result instead of reading it back.
>>
>> Both.  Not sure how common it is in Linux, but Android maps/unmaps YUV
>> buffers quite a bit, which are later used by GL.
>>
>>>
>>>>>> ii) virtio_gpu_resource_create_3d  -- may or may not be host backed
>>>>>> (depends on the PCI bar size, platform-specific information -- guest
>>>>>> doesn't need to know)
>>>>>
>>>>> Hmm?  How can this work in a way which is transparent for the guest?
>>>>
>>>> We already need to extend the DRM_VIRTGPU_RESOURCE_INFO ioctl, since
>>>> it doesn't return the stride and doesn't work for YUV buffers (see
>>>> crrev.com/c/1208591).  Maybe we can also add a bitmask, which we can
>>>> populate with memory info (i.e., HOST_BIT | COHERENT_BIT)?
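The extended ioctl struct being floated above could look roughly like this. All field names, bit values, and the struct name are illustrative only, not an agreed-upon ABI; the real starting point is struct drm_virtgpu_resource_info plus the stride fix from crrev.com/c/1208591:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits for the proposed bitmask. */
#define VIRTGPU_RES_FLAG_HOST     (1u << 0) /* backed by host memory  */
#define VIRTGPU_RES_FLAG_COHERENT (1u << 1) /* coherently mappable    */

/* Hypothetical extension of struct drm_virtgpu_resource_info. */
struct virtgpu_resource_info_v2 {
    uint32_t bo_handle;
    uint32_t res_handle;
    uint32_t size;
    uint32_t stride;     /* now actually filled in        */
    uint64_t modifier;   /* DRM format modifier           */
    uint32_t flags;      /* HOST / COHERENT bits          */
    uint32_t pad;
};

/* Userspace can mmap() either way -- the kernel routes the mapping
 * based on these bits -- but knowing the placement is still useful. */
static int is_host_coherent(const struct virtgpu_resource_info_v2 *info)
{
    uint32_t want = VIRTGPU_RES_FLAG_HOST | VIRTGPU_RES_FLAG_COHERENT;
    return (info->flags & want) == want;
}
```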
>>>
>>> Well, for userspace it can be transparent.  Userspace will just call
>>> mmap() and the kernel will sort things transparently depending on the
>>> buffer allocation (userspace knowing how buffers are allocated is
>>> probably useful nevertheless).
>>>
>>> I was thinking about the kernel / vmm interface.  The virtio-gpu kms
>>> driver certainly needs to know about the buffer allocation ...
>>
>> The KMS part will be more difficult than the EGL part.
>>
>> For example, on some ARM devices, AFBC can be only used on the (host)
>> primary KMS plane.  If a video running in QEMU is full screen, it's
>> advantageous to allocate an AFBC buffer and then scan it out.  But if
>> the QEMU window becomes smaller, the best option is to use a linear
>> strided buffer and schedule that as an overlay.  But the guest always
>> thinks it's fullscreen ...
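The fullscreen-vs-overlay policy described above can be sketched as a tiny VMM-side decision function. This is a hypothetical illustration of the scenario, not existing code; the modifier constants are placeholders (real AFBC modifiers come from drm_fourcc.h's DRM_FORMAT_MOD_ARM_AFBC() macro):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder modifier values for the sketch. */
#define MOD_LINEAR 0ULL
#define MOD_AFBC   0x0800000000000001ULL

/* AFBC can only be scanned out on the (host) primary plane on the ARM
 * devices mentioned above: a fullscreen guest gets an AFBC buffer; a
 * windowed guest gets a linear strided buffer that can be scheduled as
 * an overlay. */
static uint64_t choose_scanout_modifier(bool fullscreen,
                                        bool primary_plane_supports_afbc)
{
    if (fullscreen && primary_plane_supports_afbc)
        return MOD_AFBC;
    return MOD_LINEAR;
}
```

The open question in the thread is exactly who gets to re-run this decision when the QEMU window is resized, given that the guest still believes it is fullscreen.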
>>
>> How is the guest currently notified about size changes of its drawing
>> target?  Do buffers get re-allocated?
>>
>> Previously (see slide 25 of
>> https://www.x.org/wiki/Events/XDC2017/widawsky_fb_modifiers.pdf),
>> there was discussion about the compositor sending supported modifiers
>> to the client (QEMU) through some sort of protocol.  Does the
>> compositor notify the client of modifier changes if window size
>> changes?  Perhaps wayland experts (Tomeu?) know.
> 
> My understanding is that with wl_dmabuf, the buffers are allocated by the 
> client. So it can decide whether to use a modifier or not, and on surface 
> size changes it can allocate a different one. But TBH, I haven't checked.
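The client-owned allocation model described here (zwp_linux_dmabuf_v1 style) can be sketched as follows -- a hypothetical illustration of the flow, with made-up names and the actual dmabuf allocation elided: on a configure event with a new size, the client simply allocates a fresh buffer, possibly with a different modifier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in for a client-side dmabuf-backed buffer. */
struct client_buffer {
    uint32_t width, height;
    uint64_t modifier;
};

/* Called on a surface configure event: reallocate only if the size or
 * the preferred modifier changed.  Returns true if the old buffer can
 * be released. */
static bool maybe_realloc(struct client_buffer *buf,
                          uint32_t new_w, uint32_t new_h,
                          uint64_t preferred_modifier)
{
    if (buf->width == new_w && buf->height == new_h &&
        buf->modifier == preferred_modifier)
        return false;              /* keep the current buffer */
    /* allocate a new dmabuf-backed buffer here (allocation elided) */
    buf->width = new_w;
    buf->height = new_h;
    buf->modifier = preferred_modifier;
    return true;
}
```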

Daniel pointed me to a plan for the compositor to give additional 
information to clients that they can use to better decide the exact 
format and modifiers of their buffers: 
https://gitlab.freedesktop.org/wayland/wayland/issues/59

Cheers,

Tomeu
