[RFC 0/7] drm/virtio: Import scanout buffers from other devices

Gurchetan Singh gurchetansingh at chromium.org
Thu May 30 01:49:33 UTC 2024


On Fri, May 24, 2024 at 11:33 AM Kasireddy, Vivek <vivek.kasireddy at intel.com>
wrote:

> Hi,
>
> Sorry, my previous reply got messed up as a result of HTML formatting.
> This is
> a plain text version of the same reply.
>
> >
> >
> >       Having virtio-gpu import scanout buffers (via prime) from other
> >       devices means that we'd be adding a head to headless GPUs assigned
> >       to a Guest VM, or additional heads to regular GPU devices that are
> >       passed through to the Guest. In these cases, the Guest compositor
> >       can render into the scanout buffer using a primary GPU and have the
> >       secondary GPU (virtio-gpu) import it for display purposes.
> >
> >       The main advantage of this is that the imported scanout buffer can
> >       either be displayed locally on the Host (e.g., using Qemu + GTK UI)
> >       or encoded and streamed to a remote client (e.g., Qemu + Spice UI).
> >       Note that since Qemu uses the udmabuf driver, no copies of the
> >       scanout buffer are made as it is displayed. This should be possible
> >       even when it resides in device memory such as VRAM.
> >
> >       The specific use-case that can be supported with this series is
> >       running Weston or other guest compositors with the
> >       "additional-devices" feature (./weston --drm-device=card1
> >       --additional-devices=card0). More info about this feature can be
> >       found at:
> >       https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/736
> >
> >       In the above scenario, card1 could be a dGPU or an iGPU and card0
> >       would be virtio-gpu in KMS-only mode. However, the case where this
> >       patch series could be particularly useful is when card1 is a GPU VF
> >       that needs to share its scanout buffer (in a zero-copy way) with
> >       the GPU PF on the Host. It can also be useful when the scanout
> >       buffer needs to be shared between any two GPU devices (assuming one
> >       of them is assigned to a Guest VM) as long as they are P2P DMA
> >       compatible.
> >
> >
> >
> > Is passthrough iGPU-only or passthrough dGPU-only something you intend to
> > use?
> Our main use-case involves passing through a headless dGPU VF device and
> sharing the Guest compositor's scanout buffer with the dGPU PF device on
> the Host. The same goal applies to a headless iGPU VF sharing with the
> iGPU PF device as well.
>

Just to check my understanding: the same physical {i, d}GPU is partitioned
into the VF and PF, but the PF handles host-side display integration and
rendering?


> However, a combination of an iGPU and a dGPU where either of them can be
> passed through to the Guest is something I think can be supported with
> this patch series as well.
>
> >
> > If it's a dGPU + iGPU setup, then the way other people seem to do it is a
> > "virtualized" iGPU (via virgl/gfxstream/take your pick) and pass through
> > the dGPU.
> >
> > For example, AMD seems to use virgl to allocate and import into the dGPU.
> >
> > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23896
> >
> > https://lore.kernel.org/all/20231221100016.4022353-1-julia.zhang at amd.com/
> >
> > ChromeOS also uses that method (see crrev.com/c/3764931) [cc: dGPU
> > architect +Dominik Behr]
> >
> > So if iGPU + dGPU is the primary use case, you should be able to use
> > these methods as well.  The model would be "virtualized iGPU" +
> > passthrough dGPU, not split SoCs.
> In our use-case, the goal is to have only one primary GPU (the
> passed-through iGPU/dGPU) do all the rendering (using native DRI drivers)
> for the clients/compositor and all the outputs, and share the scanout
> buffers with the secondary GPU (virtio-gpu). Since this is mostly how
> Mutter (and also Weston) works in a multi-GPU setup, I am not sure if
> virgl is needed.
>

I think you can probably use virgl with the PF and others probably will,
but supporting multiple methods in Linux is not unheard of.

Does your patchset need the Mesa kmsro patchset to function correctly?

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9592

If so, I would try to get that reviewed first to meet DRM requirements (
https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements).
You might explicitly call out the design decision you're making: ("We can
probably use virgl as the virtualized iGPU via PF, but that adds
unnecessary complexity b/c ______").

> And, doing it this way means that no other userspace components need to be
> modified on either the Guest or the Host.
>
> >
> >
> >
> >       As part of the import, the virtio-gpu driver shares the dma
> >       addresses and lengths with Qemu, which then determines whether the
> >       memory region they belong to is owned by a PCI device or whether it
> >       is part of the Guest's system ram. If it is the former, it
> >       identifies the devid (or bdf) and bar and provides this info (along
> >       with offsets and sizes) to the udmabuf driver. In the latter case,
> >       instead of the devid and bar it provides the memfd. The udmabuf
> >       driver then creates a dmabuf using this info, which Qemu shares
> >       with Spice for encoding via Gstreamer.
> >
> >       Note that the virtio-gpu driver registers a move_notify() callback
> >       to track location changes associated with the scanout buffer and
> >       sends attach/detach backing cmds to Qemu when appropriate. And
> >       synchronization (that is, ensuring that the Guest and Host are not
> >       using the scanout buffer at the same time) is handled by pinning/
> >       unpinning the dmabuf as part of the plane update and by using a
> >       fence in the resource_flush cmd.
> >
> >
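For context: the guest-RAM case maps onto the existing udmabuf uAPI, so from
userspace it is roughly the sketch below (not code from this series; the
helper name is made up). The PCI-bar case relies on the new ioctl added by
patches 6-7, whose exact uAPI is defined in the series.

/* Minimal sketch: wrap guest RAM (a memfd) in a dma-buf via the existing
 * udmabuf uAPI.  The memfd needs the F_SEAL_SHRINK seal and offset/size
 * must be page-aligned; error handling is trimmed. */
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/udmabuf.h>

int dmabuf_from_memfd(int memfd, __u64 offset, __u64 size)
{
        struct udmabuf_create create = {
                .memfd  = memfd,
                .flags  = UDMABUF_FLAGS_CLOEXEC,
                .offset = offset,
                .size   = size,
        };
        int devfd, buffd;

        devfd = open("/dev/udmabuf", O_RDWR);
        if (devfd < 0)
                return -1;
        buffd = ioctl(devfd, UDMABUF_CREATE, &create);  /* returns a dma-buf fd */
        close(devfd);
        return buffd;
}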
> > I'm not sure how QEMU's display paths work, but with crosvm if you share
> > the guest-created dmabuf with the display, and the guest moves the
> > backing pages, the only recourse is to destroy the surface and show a
> > black screen to the user: not the best experience.
> Since the Qemu GTK UI uses EGL, there is a blit done from the guest's
> scanout buffer onto an EGL-backed buffer on the Host. So, this problem
> would not happen as of now.
>

The guest kernel doesn't know that the host side is using the QEMU GTK UI +
EGL.

If somebody wants to use the virtio-gpu import mechanism with lower-level
Wayland-based display integration, then the problem would occur.

Perhaps do that (pin for the lifetime of the import) just to be safe,
unless you have performance concerns.
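If you do keep the move_notify() path, the dynamic-importer side follows
roughly the pattern below (a minimal sketch, not code from the series; the
example_ names are made up).

/* Sketch of a dynamic dma-buf importer that is told when the exporter moves
 * the backing storage.  move_notify() is called with the dma-buf's
 * reservation lock held. */
#include <linux/device.h>
#include <linux/dma-buf.h>

static void example_move_notify(struct dma_buf_attachment *attach)
{
        /* Backing storage moved: drop any cached mappings here, e.g. send a
         * detach-backing cmd to the host until the buffer is pinned again. */
}

static const struct dma_buf_attach_ops example_attach_ops = {
        .allow_peer2peer = true,
        .move_notify     = example_move_notify,
};

static struct dma_buf_attachment *
example_import(struct dma_buf *dmabuf, struct device *dev, void *priv)
{
        /* Passing importer ops makes this a dynamic attachment, so the
         * exporter may move the buffer whenever it is not pinned. */
        return dma_buf_dynamic_attach(dmabuf, dev, &example_attach_ops, priv);
}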

>
> > Only amdgpu calls dma_buf_move_notify(..), and you're probably testing on
> > Intel only, so you may not be hitting that code path anyways.
> I have tested with the Xe driver in the Guest, which also calls
> dma_buf_move_notify(). But note that for dGPUs, both Xe and amdgpu migrate
> the scanout buffer from vram to system memory as part of the export,
> because virtio-gpu is not P2P compatible. However, I am hoping to relax
> this (the p2p check against virtio-gpu) in the Xe driver if it detects
> that it is running in VF mode, once the following patch series is merged:
>
> https://lore.kernel.org/dri-devel/20240422063602.3690124-1-vivek.kasireddy@intel.com/
>
> > I forgot the exact reason, but apparently udmabuf may not work with
> > amdgpu displays, and it seems the virtualized iGPU + dGPU is the way to
> > go for amdgpu anyways.
> I would really like to know why udmabuf would not work with amdgpu.
>

It's just a rumor I heard, but the idea is that udmabuf would be imported
into AMDGPU_GEM_DOMAIN_CPU only.

https://cgit.freedesktop.org/drm/drm-misc/tree/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c#n333

"AMDGPU_GEM_DOMAIN_CPU: System memory that is not GPU accessible. Memory in
this pool could be swapped out to disk if there is pressure."

https://dri.freedesktop.org/docs/drm/gpu/amdgpu.html

Perhaps that limitation is artificial and unnecessary, and it may indeed
work.  I don't think anybody has tried...


>
> > So I recommend just pinning the buffer for the lifetime of the
> > import for simplicity and correctness.
> Yeah, in this patch series, the dmabuf is indeed pinned, but only for a
> short duration in the Guest, just until the Host is done using it (blit or
> encode).
>
> Thanks,
> Vivek
>
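For reference, the pin-only-around-the-plane-update pattern on the importer
side looks roughly like the sketch below (not code from the series; names
are made up).  dma_buf_pin()/dma_buf_unpin() need the reservation lock held.

/* Sketch: pin the imported dma-buf only while the plane scans it out, and
 * unpin once the host is done with it (blit/encode finished). */
#include <linux/dma-buf.h>
#include <linux/dma-resv.h>

static int example_pin_for_scanout(struct dma_buf_attachment *attach)
{
        int ret;

        dma_resv_lock(attach->dmabuf->resv, NULL);
        ret = dma_buf_pin(attach);
        dma_resv_unlock(attach->dmabuf->resv);
        return ret;
}

static void example_unpin_after_flush(struct dma_buf_attachment *attach)
{
        dma_resv_lock(attach->dmabuf->resv, NULL);
        dma_buf_unpin(attach);
        dma_resv_unlock(attach->dmabuf->resv);
}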
> >
> >
> >       This series is available at:
> >       https://gitlab.freedesktop.org/Vivek/drm-tip/-/commits/virtgpu_import_rfc
> >
> >       along with additional patches for Qemu and Spice here:
> >       https://gitlab.freedesktop.org/Vivek/qemu/-/commits/virtgpu_dmabuf_pcidev
> >       https://gitlab.freedesktop.org/Vivek/spice/-/commits/encode_dmabuf_v4
> >       Patchset overview:
> >
> >       Patch 1:   Implement VIRTIO_GPU_CMD_RESOURCE_DETACH_BACKING cmd
> >       Patch 2-3: Helpers to initialize, import, free imported object
> >       Patch 4-5: Import and use buffers from other devices for scanout
> >       Patch 6-7: Have udmabuf driver create dmabuf from PCI bars for
> >                  P2P DMA
> >
> >       This series is tested using the following method:
> >       - Run Qemu with the following relevant options:
> >         qemu-system-x86_64 -m 4096m ....
> >         -device vfio-pci,host=0000:03:00.0
> >         -device virtio-vga,max_outputs=1,blob=true,xres=1920,yres=1080
> >         -spice port=3001,gl=on,disable-ticketing=on,preferred-codec=gstreamer:h264
> >         -object memory-backend-memfd,id=mem1,size=4096M
> >         -machine memory-backend=mem1 ...
> >       - Run upstream Weston with the following options in the Guest VM:
> >         ./weston --drm-device=card1 --additional-devices=card0
> >
> >       where card1 is a DG2 dGPU (passed through and using the xe driver
> >       in the Guest VM), card0 is virtio-gpu, and the Host is using an
> >       RPL iGPU.
> >
> >       Cc: Gerd Hoffmann <kraxel at redhat.com>
> >       Cc: Dongwon Kim <dongwon.kim at intel.com>
> >       Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> >       Cc: Christian Koenig <christian.koenig at amd.com>
> >       Cc: Dmitry Osipenko <dmitry.osipenko at collabora.com>
> >       Cc: Rob Clark <robdclark at chromium.org>
> >       Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> >       Cc: Oded Gabbay <ogabbay at kernel.org>
> >       Cc: Michal Wajdeczko <michal.wajdeczko at intel.com>
> >       Cc: Michael Tretter <m.tretter at pengutronix.de>
> >
> >       Vivek Kasireddy (7):
> >         drm/virtio: Implement VIRTIO_GPU_CMD_RESOURCE_DETACH_BACKING cmd
> >         drm/virtio: Add a helper to map and note the dma addrs and lengths
> >         drm/virtio: Add helpers to initialize and free the imported object
> >         drm/virtio: Import prime buffers from other devices as guest blobs
> >         drm/virtio: Ensure that bo's backing store is valid while updating
> >           plane
> >         udmabuf/uapi: Add new ioctl to create a dmabuf from PCI bar regions
> >         udmabuf: Implement UDMABUF_CREATE_LIST_FOR_PCIDEV ioctl
> >
> >        drivers/dma-buf/udmabuf.c              | 122 ++++++++++++++++--
> >        drivers/gpu/drm/virtio/virtgpu_drv.h   |   8 ++
> >        drivers/gpu/drm/virtio/virtgpu_plane.c |  56 ++++++++-
> >        drivers/gpu/drm/virtio/virtgpu_prime.c | 167 ++++++++++++++++++++++++-
> >        drivers/gpu/drm/virtio/virtgpu_vq.c    |  15 +++
> >        include/uapi/linux/udmabuf.h           |  11 +-
> >        6 files changed, 368 insertions(+), 11 deletions(-)
> >
> >       --
> >       2.43.0
> >
> >
>
>