[virglrenderer-devel] coherent memory access for virgl

Wed Oct 3 14:46:55 UTC 2018

On 9/28/18 5:48 AM, Gurchetan Singh wrote:
> On Thu, Sep 27, 2018 at 12:04 AM Gerd Hoffmann <kraxel at redhat.com> wrote:
>>
>>>>> Right now, for most
>>>>> textures, virglrenderer copies iovecs into a temporary buffer (see
>>>>> read_transfer_data), and then calls glTexSubImage2D*.
>>>>
>>>> Is virglrenderer clever enough to skip the temporary buffer copy in case
>>>> it finds niov == 1 ?
>>>
>>> There is a fast-path in read_transfer_data / write_transfer_data
>>> depending on the send_size and various other parameters, but in my gdb
>>> experience it's not used most of the time.
>>
>> Looking at the code vrend_renderer_transfer_write_iov() seems to not
>> call read_transfer_data() in the first place in case num_iovs == 1.
>>
>> So, qemu could just pass in an iov with one element, and things would
>> improve with current virglrenderer versions.
> 
> If that's possible, that'd be great.
> 
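
For illustration only, a minimal sketch of the single-iovec idea. This is
not the actual virglrenderer code; gather_iov() is a made-up helper name.
With one iovec the data can be used in place, with several it has to be
gathered into a bounce buffer first:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Hypothetical helper, not virglrenderer code: return a pointer to
 * `size` contiguous bytes described by iov[0..niov).  A single iovec
 * that is large enough is used in place.  Otherwise the data is
 * gathered into a malloc'ed bounce buffer, which is also returned via
 * *bounce so the caller can free it.  Returns NULL if the iovecs hold
 * fewer than `size` bytes or the allocation fails. */
static const void *gather_iov(const struct iovec *iov, int niov,
                              size_t size, void **bounce)
{
    size_t done = 0;
    char *buf;
    int i;

    assert(iov != NULL && niov > 0 && bounce != NULL);
    *bounce = NULL;

    /* Fast path: no copy at all. */
    if (niov == 1)
        return iov[0].iov_len >= size ? iov[0].iov_base : NULL;

    /* Slow path: one extra copy into a temporary buffer. */
    buf = malloc(size ? size : 1);
    if (!buf)
        return NULL;
    for (i = 0; i < niov && done < size; i++) {
        size_t n = iov[i].iov_len;

        if (n > size - done)
            n = size - done;
        memcpy(buf + done, iov[i].iov_base, n);
        done += n;
    }
    if (done < size) {
        free(buf);
        return NULL;
    }
    *bounce = buf;
    return buf;
}
```

The caller would free(bounce) after the upload; on the fast path bounce
stays NULL and free(NULL) is a no-op.
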
>>   Newer virglrenderer
>> versions could consume dmabuf handle and mapping pointer instead (and
>> import the dmabuf if possible).
>>
>>> However, such cases are prominent in the Android / ChromeOS display
>>> stacks (and often mapped in the guest), so that's why I'm interested
>>> in making them backed by host memory and display/GPU optimized.  We'll
>>> need a way of expressing modifiers to the guest, so this delves into
>>> the earlier discussion of wayland host proxying.  Who should allocate
>>> -- the host compositor, the VMM, virglrenderer?  The host compositor
>>> seems like the most natural choice.
>>
>> Why the host compositor?  Normal wayland clients don't ask the host
>> compositor for buffers either, right?  They do egl rendering using
>> render nodes, export the front buffer as dmabuf and pass them to the
>> compositor for rendering ...
>>
>> I think virglrenderer should allocate the buffers.
> 
> virglrenderer allocating the buffers should work for now.  There was
> some discussion on using modifiers for v4l2, but that didn't go very
> far:
> 
> https://lists.freedesktop.org/archives/dri-devel/2017-August/150850.html
> 
> Modifiers are designed to have multiple consumer apis, but I've only
> seen EGL + KMS implementations.
> 
> If virglrenderer will do the allocation, what about
> virtio_gpu_resource_create_2d -- who calls that in guest userspace?
> Should it ever be host-optimized (since we're essentially talking
> about single-level 2D textures/render targets/scan-out buffers)?
> 
>>
>>> Are there any plans of the guest using host-optimized buffers
>>> (communicated via modifiers) in a purely Linux guest?
>>
>> I think that would imply the virgl mesa driver must be able to handle
>> pretty much any vendor's compressed/tiled buffer format.  Hmm, no idea
>> how difficult that would be.
> 
> It could actually be pretty easy, due to the Gallium abstraction.  We
> need to give Gallium a linear view into the texture -- which we can
> always fall back to GL to do.  Other items include:
> 
> 1) Fix the resource info ioctl to actually return the stride + format modifiers
> 2) Expose the modifiers the host supports (via
> eglQueryDmaBufModifiersEXT) in the guest.  The applicability of this
> depends on the userspace (Android won't use this).
> 
>>>>> But making host memory guest visible will bring the worst-case buffer
>>>>> copies from 3 to 1.  For textures, if we start counting when the GPU
>>>>> buffer gets detiled, there will be 5 copies currently, 3 with udmabuf,
>>>>> and 1 with host exposed memory.
>>>>
>>>> 5 copies?  verbose please.  I can see three:
>>>
>>> It depends on when you start counting.  I started counting from
>>> vrend_renderer_transfer_send_iov, which includes fetching the data
>>> from the host and packing that data into the iovecs.  Probably a worst
>>> case scenario.
>>
>> Ah, you talk about the host -> guest path, not guest -> host (or both?).
>>
>> Is host -> guest transfer used that much?  I'd expect the guest just
>> asks the host to display the rendered result instead of reading it back.
> 
> Both.  Not sure how common it is in Linux, but Android maps/unmaps YUV
> buffers quite a bit, which are later used by GL.
> 
>>
>>>>> ii) virtio_gpu_resource_create_3d  -- may or may not be host backed
>>>>> (depends on the PCI bar size, platform-specific information -- guest
>>>>> doesn't need to know)
>>>>
>>>> Hmm?  How can this work in a way which is transparent for the guest?
>>>
>>> We already need to extend the DRM_VIRTGPU_RESOURCE_INFO ioctl, since
>>> it doesn't return the stride and doesn't work for YUV buffers (see
>>> crrev.com/c/1208591).  Maybe we can also add a bitmask, which we can
>>> populate with memory info (i.e., HOST_BIT | COHERENT_BIT)?
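
A rough sketch of how userspace might consume such a bitmask. Only the
names HOST_BIT and COHERENT_BIT come from the proposal above; the bit
positions, the helper name and the policy (skip explicit transfers only
when both bits are set) are assumptions made for the example:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions; the proposal only names the two bits. */
#define HOST_BIT      (UINT32_C(1) << 0)  /* backed by host memory */
#define COHERENT_BIT  (UINT32_C(1) << 1)  /* CPU access is coherent */

/* Assumed policy, for illustration: a mapping can be used without
 * explicit transfer commands only if the resource is both host backed
 * and coherent.  Everything else keeps using explicit transfers. */
static bool needs_explicit_transfer(uint32_t mem_info)
{
    const uint32_t direct = HOST_BIT | COHERENT_BIT;

    return (mem_info & direct) != direct;
}
```
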
>>
>> Well, for userspace it can be transparent.  Userspace will just call
>> mmap() and the kernel will sort things transparently depending on the
>> buffer allocation (userspace knowing how buffers are allocated is
>> probably useful nevertheless).
>>
>> I was thinking about the kernel / vmm interface.  The virtio-gpu kms
>> driver certainly needs to know about the buffer allocation ...
> 
> The KMS part will be more difficult than the EGL part.
> 
> For example, on some ARM devices, AFBC can only be used on the (host)
> primary KMS plane.  If a video running in QEMU is full screen, it's
> advantageous to allocate an AFBC buffer and then scan it out.  But if
> the QEMU window becomes smaller, the best option is to use a linear
> strided buffer and schedule that as an overlay.  But the guest always
> thinks it's fullscreen ...
> 
> How is the guest currently notified about size changes of its drawing
> target?  Do buffers get re-allocated?
> 
> Previously (see slide 25 of
> https://www.x.org/wiki/Events/XDC2017/widawsky_fb_modifiers.pdf),
> there was discussion about the compositor sending supported modifiers
> to the client (QEMU) through some sort of protocol.  Does the
> compositor notify the client of modifier changes if the window size
> changes?  Perhaps wayland experts (Tomeu?) know.

My understanding is that with wl_dmabuf, the buffers are allocated by the 
client. So it can decide whether to use a modifier or not, and on surface 
size changes it can allocate a different one. But TBH, I haven't checked.
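
To make that concrete, a rough client-side sketch. Nothing here is a real
Wayland or Mesa API: pick_modifier(), needs_realloc() and struct
buf_params are made-up names, and falling back to linear when nothing
matches is my assumption. MOD_LINEAR has the same value as
DRM_FORMAT_MOD_LINEAR from drm_fourcc.h:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as DRM_FORMAT_MOD_LINEAR in drm_fourcc.h, repeated here so
 * the sketch builds without the kernel headers. */
#define MOD_LINEAR UINT64_C(0)

/* Hypothetical description of the currently allocated buffer. */
struct buf_params {
    uint32_t width, height;
    uint64_t modifier;
};

/* Hypothetical policy: take the first modifier from the client's
 * preference list that the compositor advertised.  If none matches,
 * fall back to linear (assumed here to always be importable). */
static uint64_t pick_modifier(const uint64_t *advertised, size_t n_adv,
                              const uint64_t *preferred, size_t n_pref)
{
    for (size_t i = 0; i < n_pref; i++)
        for (size_t j = 0; j < n_adv; j++)
            if (preferred[i] == advertised[j])
                return preferred[i];
    return MOD_LINEAR;
}

/* The client reallocates when the surface size or the chosen modifier
 * no longer matches the buffer it currently holds. */
static bool needs_realloc(const struct buf_params *cur,
                          uint32_t width, uint32_t height, uint64_t modifier)
{
    return cur->width != width || cur->height != height ||
           cur->modifier != modifier;
}
```

On a resize the client would call pick_modifier() again with whatever
the compositor advertises at that point, and allocate a new buffer if
needs_realloc() says so.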

Cheers,

Tomeu