[RFC 0/7] drm/virtio: Import scanout buffers from other devices

Thu May 23 21:33:59 UTC 2024

On Thu, Mar 28, 2024 at 2:01 AM Vivek Kasireddy <vivek.kasireddy at intel.com>
wrote:

> Having virtio-gpu import scanout buffers (via prime) from other
> devices means that we'd be adding a head to headless GPUs assigned
> to a Guest VM or additional heads to regular GPU devices that are
> passthrough'd to the Guest. In these cases, the Guest compositor
> can render into the scanout buffer using a primary GPU and has the
> secondary GPU (virtio-gpu) import it for display purposes.
>
> The main advantage with this is that the imported scanout buffer can
> either be displayed locally on the Host (e.g., using Qemu + GTK UI)
> or encoded and streamed to a remote client (e.g., Qemu + Spice UI).
> Note that since Qemu uses udmabuf driver, there would be no copies
> made of the scanout buffer as it is displayed. This should be
> possible even when it might reside in device memory such as VRAM.
>
> The specific use-case that can be supported with this series is when
> running Weston or other guest compositors with "additional-devices"
> feature (./weston --drm-device=card1 --additional-devices=card0).
> More info about this feature can be found at:
> https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/736
>
> In the above scenario, card1 could be a dGPU or an iGPU and card0
> would be virtio-gpu in KMS only mode. However, the case where this
> patch series could be particularly useful is when card1 is a GPU VF
> that needs to share its scanout buffer (in a zero-copy way) with the
> GPU PF on the Host. Or, it can also be useful when the scanout buffer
> needs to be shared between any two GPU devices (assuming one of them
> is assigned to a Guest VM) as long as they are P2P DMA compatible.
>

Is passthrough iGPU-only or passthrough dGPU-only something you intend to
use?

If it's a dGPU + iGPU setup, then the way other people seem to do it is a
"virtualized" iGPU (via virgl/gfxstream/take your pick) and pass-through
the dGPU.

For example, AMD seems to use virgl to allocate and import into the dGPU.

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23896
https://lore.kernel.org/all/20231221100016.4022353-1-julia.zhang@amd.com/

ChromeOS also uses that method (see crrev.com/c/3764931) [cc: dGPU
architect +Dominik Behr <dbehr at google.com>]

So if iGPU + dGPU is the primary use case, you should be able to use these
methods as well.  The model would be "virtualized iGPU" + passthrough dGPU,
not split SoCs.

> As part of the import, the virtio-gpu driver shares the dma
> addresses and lengths with Qemu which then determines whether the
> memory region they belong to is owned by a PCI device or whether it
> is part of the Guest's system ram. If it is the former, it identifies
> the devid (or bdf) and bar and provides this info (along with offsets
> and sizes) to the udmabuf driver. In the latter case, instead of the
> devid and bar it provides the memfd. The udmabuf driver then
> creates a dmabuf using this info that Qemu shares with Spice for
> encode via Gstreamer.
>
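
For illustration, the lookup described above can be sketched roughly as
follows. This is a hypothetical userspace model, not code from the Qemu or
udmabuf patches: the region table, the struct names, the addresses and the
BAR number are all assumptions made up for the example. It only shows the
decision "does this address range fall inside a PCI BAR or inside Guest
system RAM, and at what offset".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the lookup; none of these names come from the
 * actual Qemu or udmabuf patches. */
enum region_kind { REGION_NONE, REGION_GUEST_RAM, REGION_PCI_BAR };

struct region {
    enum region_kind kind;
    uint64_t base;   /* guest-physical start of the region */
    uint64_t size;   /* length of the region in bytes */
    uint16_t bdf;    /* bus/dev/fn, only meaningful for REGION_PCI_BAR */
    uint8_t bar;     /* BAR index, only meaningful for REGION_PCI_BAR */
};

struct backing {
    enum region_kind kind;
    uint64_t offset; /* offset into the memfd or into the BAR */
    uint16_t bdf;
    uint8_t bar;
};

/* Made-up layout: 2 GiB of Guest RAM at 0, and a 256 MiB BAR 2 that
 * belongs to the device at 0000:03:00.0 (bdf 0x0300). */
static const struct region example_map[] = {
    { REGION_GUEST_RAM, 0x0ULL,          0x80000000ULL, 0,      0 },
    { REGION_PCI_BAR,   0x4000000000ULL, 0x10000000ULL, 0x0300, 2 },
};

/* Return the region that fully contains [addr, addr + len), or
 * REGION_NONE if the range is unmapped or straddles a boundary. */
static struct backing classify(const struct region *map, size_t n,
                               uint64_t addr, uint64_t len)
{
    struct backing out = { REGION_NONE, 0, 0, 0 };

    for (size_t i = 0; i < n; i++) {
        const struct region *r = &map[i];

        /* Written to avoid overflow in addr + len. */
        if (addr >= r->base && len <= r->size &&
            addr - r->base <= r->size - len) {
            out.kind = r->kind;
            out.offset = addr - r->base;
            out.bdf = r->bdf;
            out.bar = r->bar;
            return out;
        }
    }
    return out;
}
```

In this model a REGION_PCI_BAR result carries the (bdf, bar, offset) triple
that would be handed to udmabuf, while a REGION_GUEST_RAM result carries
only the offset into the memfd.
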
> Note that the virtio-gpu driver registers a move_notify() callback
> to track location changes associated with the scanout buffer and
> sends attach/detach backing cmds to Qemu when appropriate. And,
> synchronization (that is, ensuring that Guest and Host are not
> using the scanout buffer at the same time) is ensured by pinning/
> unpinning the dmabuf as part of plane update and using a fence
> in resource_flush cmd.

I'm not sure how QEMU's display paths work, but with crosvm if you share
the guest-created dmabuf with the display, and the guest moves the backing
pages, the only recourse is to destroy the surface and show a black screen
to the user, which is not a great experience.

Only amdgpu calls dma_buf_move_notify(), and you're probably testing on
Intel only, so you may not be hitting that code path anyways.  I forgot the
exact reason, but apparently udmabuf may not work with amdgpu displays and
it seems the virtualized iGPU + dGPU is the way to go for amdgpu anyways.
So I recommend just pinning the buffer for the lifetime of the import for
simplicity and correctness.
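
As a rough illustration of why pinning for the lifetime of the import is
simpler, here is a toy userspace model. It is not kernel code and every
name in it is hypothetical; in the kernel the corresponding importer-side
calls would presumably be dma_buf_pin()/dma_buf_unpin(). The point is only
that while the importer holds a pin the exporter cannot relocate the
buffer, so no move notification ever has to be handled.

```c
#include <stdbool.h>

/* Toy model, not kernel code: all names here are hypothetical. */
struct toy_buf {
    int pin_count;   /* > 0 means the exporter must not move the pages */
    int placement;   /* 0 = VRAM, 1 = system memory, purely illustrative */
};

/* Importer side: take the pin once, when the buffer is imported. */
static void toy_import(struct toy_buf *b)
{
    b->pin_count++;
}

/* Importer side: drop the pin only when the import is torn down. */
static void toy_release(struct toy_buf *b)
{
    if (b->pin_count > 0)
        b->pin_count--;
}

/* Exporter side: a move/eviction is refused while any pin is held. */
static bool toy_try_move(struct toy_buf *b, int new_placement)
{
    if (b->pin_count > 0)
        return false;
    b->placement = new_placement;
    return true;
}
```

The trade-off is that a pinned buffer cannot be evicted for as long as it
stays imported, which is the price of never having to re-send backing
information to the Host.
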

> This series is available at:
> https://gitlab.freedesktop.org/Vivek/drm-tip/-/commits/virtgpu_import_rfc
>
> along with additional patches for Qemu and Spice here:
> https://gitlab.freedesktop.org/Vivek/qemu/-/commits/virtgpu_dmabuf_pcidev
> https://gitlab.freedesktop.org/Vivek/spice/-/commits/encode_dmabuf_v4
>
> Patchset overview:
>
> Patch 1:   Implement VIRTIO_GPU_CMD_RESOURCE_DETACH_BACKING cmd
> Patch 2-3: Helpers to initialize, import, free imported object
> Patch 4-5: Import and use buffers from other devices for scanout
> Patch 6-7: Have udmabuf driver create dmabuf from PCI bars for P2P DMA
>
> This series is tested using the following method:
> - Run Qemu with the following relevant options:
>   qemu-system-x86_64 -m 4096m ....
>   -device vfio-pci,host=0000:03:00.0
>   -device virtio-vga,max_outputs=1,blob=true,xres=1920,yres=1080
>   -spice port=3001,gl=on,disable-ticketing=on,preferred-codec=gstreamer:h264
>   -object memory-backend-memfd,id=mem1,size=4096M
>   -machine memory-backend=mem1 ...
> - Run upstream Weston with the following options in the Guest VM:
>   ./weston --drm-device=card1 --additional-devices=card0
>
> where card1 is a DG2 dGPU (passthrough'd and using xe driver in Guest VM),
> card0 is virtio-gpu and the Host is using a RPL iGPU.
>
> Cc: Gerd Hoffmann <kraxel at redhat.com>
> Cc: Dongwon Kim <dongwon.kim at intel.com>
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> Cc: Christian Koenig <christian.koenig at amd.com>
> Cc: Dmitry Osipenko <dmitry.osipenko at collabora.com>
> Cc: Rob Clark <robdclark at chromium.org>
> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> Cc: Oded Gabbay <ogabbay at kernel.org>
> Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
> Cc: Michael Tretter <m.tretter at pengutronix.de>
>
> Vivek Kasireddy (7):
>   drm/virtio: Implement VIRTIO_GPU_CMD_RESOURCE_DETACH_BACKING cmd
>   drm/virtio: Add a helper to map and note the dma addrs and lengths
>   drm/virtio: Add helpers to initialize and free the imported object
>   drm/virtio: Import prime buffers from other devices as guest blobs
>   drm/virtio: Ensure that bo's backing store is valid while updating
>     plane
>   udmabuf/uapi: Add new ioctl to create a dmabuf from PCI bar regions
>   udmabuf: Implement UDMABUF_CREATE_LIST_FOR_PCIDEV ioctl
>
>  drivers/dma-buf/udmabuf.c              | 122 ++++++++++++++++--
>  drivers/gpu/drm/virtio/virtgpu_drv.h   |   8 ++
>  drivers/gpu/drm/virtio/virtgpu_plane.c |  56 ++++++++-
>  drivers/gpu/drm/virtio/virtgpu_prime.c | 167 ++++++++++++++++++++++++-
>  drivers/gpu/drm/virtio/virtgpu_vq.c    |  15 +++
>  include/uapi/linux/udmabuf.h           |  11 +-
>  6 files changed, 368 insertions(+), 11 deletions(-)
>
> --
> 2.43.0
>
>