[RFC PATCH 3/3] drm/virtio: drm_gem_plane_helper_prepare_fb for obj synchronization
Kim, Dongwon
dongwon.kim at intel.com
Thu Aug 24 17:58:26 UTC 2023
On 8/23/2023 8:52 PM, Dmitry Osipenko wrote:
> On 8/20/23 23:58, Kim, Dongwon wrote:
>> On 8/17/2023 7:33 PM, Dmitry Osipenko wrote:
>>> On 7/13/23 01:44, Dongwon Kim wrote:
>>>> This helper is needed for framebuffer synchronization. Old
>>>> framebuffer data is often displayed on the guest display without
>>>> this helper.
>>>>
>>>> Cc: Gerd Hoffmann <kraxel at redhat.com>
>>>> Cc: Vivek Kasireddy <vivek.kasireddy at intel.com>
>>>> Signed-off-by: Dongwon Kim <dongwon.kim at intel.com>
>>>> ---
>>>> drivers/gpu/drm/virtio/virtgpu_plane.c | 4 ++++
>>>> 1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c
>>>> index a063f06ab6c5..e197299489ce 100644
>>>> --- a/drivers/gpu/drm/virtio/virtgpu_plane.c
>>>> +++ b/drivers/gpu/drm/virtio/virtgpu_plane.c
>>>> @@ -26,6 +26,7 @@
>>>> #include <drm/drm_atomic_helper.h>
>>>> #include <drm/drm_damage_helper.h>
>>>> #include <drm/drm_fourcc.h>
>>>> +#include <drm/drm_gem_atomic_helper.h>
>>>> #include "virtgpu_drv.h"
>>>> @@ -271,6 +272,9 @@ static int virtio_gpu_plane_prepare_fb(struct drm_plane *plane,
>>>> vgfb = to_virtio_gpu_framebuffer(new_state->fb);
>>>> vgplane_st = to_virtio_gpu_plane_state(new_state);
>>>> bo = gem_to_virtio_gpu_obj(vgfb->base.obj[0]);
>>>> +
>>>> + drm_gem_plane_helper_prepare_fb(plane, new_state);
>>> The implicit display BO sync should happen on the host side, unless
>>> you're rendering with Venus and then displaying with virgl. Doing it on
>>> the guest side should be a major performance hit. Please provide a
>>> complete description of your setup: which VMM you use, config options,
>>> and what tests you're running.
>> We use virtio-gpu as a KMS device and i915 as the render device in our
>> setup, with QEMU as the VMM. The virtio-gpu driver flushes the scanout
>> to QEMU as a blob resource (a reference to the buffer). QEMU then
>> creates a dmabuf for the blob using udmabuf and renders it as a texture
>> with OpenGL. The test I ran is simple: just starting a terminal app and
>> typing to see whether there is any frame regression. I believe this
>> helper is required since the BO on the guest is basically a dmabuf
>> shared between the i915 and virtio-gpu drivers. I didn't think about
>> the performance impact. If the impact is too high to be acceptable, are
>> there any other suggestions or tests I can try?
> You can do a fence-wait in the guest userspace/Mesa after
> blitting/drawing to the udmabuf.
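If I follow, on the guest side that would look roughly like the sketch
below (hypothetical, not from our actual stack; assumes the compositor
blits with GL on an EGL 1.5 implementation):

#include <EGL/egl.h>

/* Hypothetical sketch of the suggested guest-side fence-wait: after
 * blitting/drawing into the shared buffer, insert a GL fence and block
 * until the GPU work completes, before handing the buffer to KMS.
 */
static void wait_for_render(EGLDisplay dpy)
{
        EGLSync sync = eglCreateSync(dpy, EGL_SYNC_FENCE, NULL);

        eglClientWaitSync(dpy, sync, EGL_SYNC_FLUSH_COMMANDS_BIT,
                          EGL_FOREVER);
        eglDestroySync(dpy, sync);
}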
There is already synchronization between QEMU and the virtio-gpu driver on
the guest: upon resource flush, virtio-gpu waits for QEMU's response to the
message, and QEMU sends the response once rendering is done. The problem we
are seeing is not that the rendering side reuses the buffer before QEMU has
displayed it; it is that a frame is often not yet finished when
"resource_flush" is issued. So unless there is a way for QEMU to wait on
this (guest DRM) fence, I think we need a synchronization point on the
guest side.
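For reference, this is roughly what the helper buys us; a simplified
sketch modeled on drm_gem_atomic_helper.c (multi-plane framebuffers and
error paths trimmed, not the exact upstream code):

#include <drm/drm_framebuffer.h>
#include <drm/drm_gem.h>
#include <drm/drm_plane.h>
#include <linux/dma-fence.h>
#include <linux/dma-resv.h>

static int prepare_fb_sketch(struct drm_plane *plane,
                             struct drm_plane_state *state)
{
        struct drm_gem_object *obj;
        struct dma_fence *fence;
        int ret;

        if (!state->fb)
                return 0;

        /* Pick up the implicit write fence the renderer (i915 in our
         * setup) left on the shared dma-buf's reservation object.
         */
        obj = state->fb->obj[0];
        ret = dma_resv_get_singleton(obj->resv, DMA_RESV_USAGE_WRITE,
                                     &fence);
        if (ret)
                return ret;

        /* The atomic helpers (drm_atomic_helper_wait_for_fences()) wait
         * on state->fence before the commit, so the flush only ever sees
         * a finished frame.
         */
        state->fence = fence;
        return 0;
}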
I see that other DRM drivers (omap, tegra, vc4, and so on) do something
similar, so I take this to be the generic solution for such cases. I do
understand your concern, though: since the primary use case of the
virtio-gpu driver is virgl, this extra wait would cost some performance.
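Roughly, the pattern in those drivers' .prepare_fb hooks looks like this
(paraphrased rather than copied from any one driver; "foo" is a
placeholder name):

#include <drm/drm_gem_atomic_helper.h>
#include <drm/drm_plane.h>

static int foo_plane_prepare_fb(struct drm_plane *plane,
                                struct drm_plane_state *state)
{
        /* driver-specific pinning/mapping of the BO would go here */

        /* attach the BO's implicit fences to the plane state */
        return drm_gem_plane_helper_prepare_fb(plane, state);
}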
That said, I have a couple of points:

1. Wouldn't the extra wait caused by drm_gem_plane_helper_prepare_fb be
minimal, given that the actual rendering is done on the host?

2. Could we call this helper only when virgl is not used as the 3D driver?
(A possible gating sketch follows.)
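To illustrate point 2: a possible, untested gating inside
virtio_gpu_plane_prepare_fb, assuming vgdev->has_virgl_3d is an adequate
indicator that virgl is handling 3D:

        struct virtio_gpu_device *vgdev = plane->dev->dev_private;

        /* Untested idea: only take the implicit-fence wait on the
         * KMS-only path; when virgl handles 3D, host-side sync should
         * already cover this case.
         */
        if (!vgdev->has_virgl_3d)
                drm_gem_plane_helper_prepare_fb(plane, new_state);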
>
> You may run popular vk/gl gfx benchmarks using gl/sdl outputs to see the
> fps impact.
ok
>
> Virglrenderer today supports native contexts. The method you're using
> for GPU priming has been shown to be slow in comparison to multi-GPU
> native contexts. There is ongoing work on supporting fence passing from
> guest to host [1], which allows fence-syncing on the host. You'll find
> links to the WIP virtio-intel native context in [1] as well. You won't
> find GPU priming support using native contexts in [1]; those patches
> haven't been published yet.
>
> [1]
> https://gitlab.freedesktop.org/virgl/virglrenderer/-/merge_requests/1138
>
> Note that in general it's not acceptable to upstream patches that serve
> downstream only. Your display sync issue is irrelevant to the upstream
> stack unless you're going to upstream all the VMM and guest userspace
> patches, in which case you should always publish all the patches and
> provide links.
>
> So, you need to check the performance impact and publish all the patches
> to the relevant upstream projects.
QEMU already has all the patches for this (blob scanout support), but the
guest Mesa patch for KMSRO is still outstanding:

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9592

I understand this specific patch needs more discussion and justification,
but what about the other two in the series; are you generally OK with
those?
Thanks!!