<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Mar 28, 2024 at 2:01 AM Vivek Kasireddy <<a href="mailto:vivek.kasireddy@intel.com" target="_blank">vivek.kasireddy@intel.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Having virtio-gpu import scanout buffers (via prime) from other<br> devices means that we'd be adding a head to headless GPUs assigned<br> to a Guest VM or additional heads to regular GPU devices that are<br> passed through to the Guest. In these cases, the Guest compositor<br> can render into the scanout buffer using a primary GPU and have the<br> secondary GPU (virtio-gpu) import it for display purposes.<br> <br> The main advantage of this is that the imported scanout buffer can<br> either be displayed locally on the Host (e.g., using Qemu + GTK UI)<br> or encoded and streamed to a remote client (e.g., Qemu + Spice UI).<br> Note that since Qemu uses the udmabuf driver, no copies of the<br> scanout buffer would be made as it is displayed. This should be<br> possible even when it resides in device memory such as VRAM.<br> <br> The specific use-case that can be supported with this series is<br> running Weston or other guest compositors with the "additional-devices"<br> feature (./weston --drm-device=card1 --additional-devices=card0).<br> More info about this feature can be found at:<br> <a href="https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/736" rel="noreferrer" target="_blank">https://gitlab.freedesktop.org/wayland/weston/-/merge_requests/736</a><br> <br> In the above scenario, card1 could be a dGPU or an iGPU and card0<br> would be virtio-gpu in KMS-only mode. However, the case where this<br> patch series could be particularly useful is when card1 is a GPU VF<br> that needs to share its scanout buffer (in a zero-copy way) with the<br> GPU PF on the Host. 
Or, it can also be useful when the scanout buffer<br> needs to be shared between any two GPU devices (assuming one of them<br> is assigned to a Guest VM) as long as they are P2P DMA compatible.<br></blockquote><div><br></div><div>Is passthrough iGPU-only or passthrough dGPU-only something you intend to use? </div><div><br></div><div>If it's a dGPU + iGPU setup, then the way other people seem to do it is a "virtualized" iGPU (via virgl/gfxstream/take your pick) and pass through the dGPU.</div><div><br></div><div>For example, AMD seems to use virgl to allocate and import into the dGPU.</div><div><br></div><div><a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23896" target="_blank">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23896</a><br></div><div><a href="https://lore.kernel.org/all/20231221100016.4022353-1-julia.zhang@amd.com/" target="_blank">https://lore.kernel.org/all/20231221100016.4022353-1-julia.zhang@amd.com/</a><br></div><div><br></div><div>ChromeOS also uses that method (see <a href="http://crrev.com/c/3764931">crrev.com/c/3764931</a>) [cc: dGPU architect <a class="gmail_plusreply" id="plusReplyChip-5" href="mailto:dbehr@google.com" tabindex="-1">+Dominik Behr</a>]</div><div><br></div><div>So if iGPU + dGPU is the primary use case, you should be able to use these methods as well. The model would be "virtualized iGPU" + passthrough dGPU, not split SoCs. </div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> As part of the import, the virtio-gpu driver shares the dma<br> addresses and lengths with Qemu, which then determines whether the<br> memory region they belong to is owned by a PCI device or whether it<br> is part of the Guest's system RAM. If it is the former, it identifies<br> the devid (or bdf) and bar and provides this info (along with offsets<br> and sizes) to the udmabuf driver. 
In the latter case, instead of the<br> devid and bar, it provides the memfd. The udmabuf driver then<br> creates a dmabuf using this info, which Qemu shares with Spice for<br> encoding via GStreamer.<br> <br> Note that the virtio-gpu driver registers a move_notify() callback<br> to track location changes associated with the scanout buffer and<br> sends attach/detach backing cmds to Qemu when appropriate.<br> Synchronization (that is, ensuring that Guest and Host are not<br> using the scanout buffer at the same time) is achieved by pinning/<br> unpinning the dmabuf as part of the plane update and using a fence<br> in the resource_flush cmd.</blockquote><div><br></div><div>I'm not sure how QEMU's display paths work, but with crosvm, if you share the guest-created dmabuf with the display and the guest moves the backing pages, the only recourse is to destroy the surface and show a black screen to the user: not the best experience.</div><div><br></div><div>Only amdgpu calls dma_buf_move_notify(), and you're probably testing on Intel only, so you may not be hitting that code path anyway. I forget the exact reason, but apparently udmabuf may not work with amdgpu displays, and it seems the virtualized iGPU + dGPU approach is the way to go for amdgpu anyway. So I recommend just pinning the buffer for the lifetime of the import, for simplicity and correctness. 
</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> This series is available at:<br> <a href="https://gitlab.freedesktop.org/Vivek/drm-tip/-/commits/virtgpu_import_rfc" rel="noreferrer" target="_blank">https://gitlab.freedesktop.org/Vivek/drm-tip/-/commits/virtgpu_import_rfc</a><br> <br> along with additional patches for Qemu and Spice here:<br> <a href="https://gitlab.freedesktop.org/Vivek/qemu/-/commits/virtgpu_dmabuf_pcidev" rel="noreferrer" target="_blank">https://gitlab.freedesktop.org/Vivek/qemu/-/commits/virtgpu_dmabuf_pcidev</a><br> <a href="https://gitlab.freedesktop.org/Vivek/spice/-/commits/encode_dmabuf_v4" rel="noreferrer" target="_blank">https://gitlab.freedesktop.org/Vivek/spice/-/commits/encode_dmabuf_v4</a> <br> <br> Patchset overview:<br> <br> Patch 1: Implement VIRTIO_GPU_CMD_RESOURCE_DETACH_BACKING cmd<br> Patch 2-3: Helpers to initialize, import, and free the imported object<br> Patch 4-5: Import and use buffers from other devices for scanout<br> Patch 6-7: Have the udmabuf driver create a dmabuf from PCI BARs for P2P DMA<br> <br> This series was tested using the following method:<br> - Run Qemu with the following relevant options:<br> qemu-system-x86_64 -m 4096m ....<br> -device vfio-pci,host=0000:03:00.0<br> -device virtio-vga,max_outputs=1,blob=true,xres=1920,yres=1080<br> -spice port=3001,gl=on,disable-ticketing=on,preferred-codec=gstreamer:h264<br> -object memory-backend-memfd,id=mem1,size=4096M<br> -machine memory-backend=mem1 ...<br> - Run upstream Weston with the following options in the Guest VM:<br> ./weston --drm-device=card1 --additional-devices=card0<br> <br> where card1 is a DG2 dGPU (passed through and using the xe driver in the Guest VM),<br> card0 is virtio-gpu, and the Host is using an RPL iGPU.<br> <br> Cc: Gerd Hoffmann <<a href="mailto:kraxel@redhat.com" target="_blank">kraxel@redhat.com</a>><br> Cc: Dongwon Kim <<a href="mailto:dongwon.kim@intel.com" 
target="_blank">dongwon.kim@intel.com</a>><br> Cc: Daniel Vetter <<a href="mailto:daniel.vetter@ffwll.ch" target="_blank">daniel.vetter@ffwll.ch</a>><br> Cc: Christian Koenig <<a href="mailto:christian.koenig@amd.com" target="_blank">christian.koenig@amd.com</a>><br> Cc: Dmitry Osipenko <<a href="mailto:dmitry.osipenko@collabora.com" target="_blank">dmitry.osipenko@collabora.com</a>><br> Cc: Rob Clark <<a href="mailto:robdclark@chromium.org" target="_blank">robdclark@chromium.org</a>><br> Cc: Thomas Hellström <<a href="mailto:thomas.hellstrom@linux.intel.com" target="_blank">thomas.hellstrom@linux.intel.com</a>><br> Cc: Oded Gabbay <<a href="mailto:ogabbay@kernel.org" target="_blank">ogabbay@kernel.org</a>><br> Cc: Michal Wajdeczko <<a href="mailto:michal.wajdeczko@intel.com" target="_blank">michal.wajdeczko@intel.com</a>><br> Cc: Michael Tretter <<a href="mailto:m.tretter@pengutronix.de" target="_blank">m.tretter@pengutronix.de</a>><br> <br> Vivek Kasireddy (7):<br> drm/virtio: Implement VIRTIO_GPU_CMD_RESOURCE_DETACH_BACKING cmd<br> drm/virtio: Add a helper to map and note the dma addrs and lengths<br> drm/virtio: Add helpers to initialize and free the imported object<br> drm/virtio: Import prime buffers from other devices as guest blobs<br> drm/virtio: Ensure that bo's backing store is valid while updating<br> plane<br> udmabuf/uapi: Add new ioctl to create a dmabuf from PCI bar regions<br> udmabuf: Implement UDMABUF_CREATE_LIST_FOR_PCIDEV ioctl<br> <br> drivers/dma-buf/udmabuf.c | 122 ++++++++++++++++--<br> drivers/gpu/drm/virtio/virtgpu_drv.h | 8 ++<br> drivers/gpu/drm/virtio/virtgpu_plane.c | 56 ++++++++-<br> drivers/gpu/drm/virtio/virtgpu_prime.c | 167 ++++++++++++++++++++++++-<br> drivers/gpu/drm/virtio/virtgpu_vq.c | 15 +++<br> include/uapi/linux/udmabuf.h | 11 +-<br> 6 files changed, 368 insertions(+), 11 deletions(-)<br> <br> -- <br> 2.43.0<br> <br> </blockquote></div></div>