Xen memory management primitives for GPU virtualization
Demi Marie Obenour
demi at invisiblethingslab.com
Sun Feb 2 05:08:46 UTC 2025
Subject: Xen requirements for GPU virtualization via virtio-GPU
Recently, AMD submitted patches to the dri-devel mailing list to support
using application-provided buffers in virtio-GPU. This feature is
called Shared Virtual Memory (SVM) and it is implemented via an API
called User Pointer (userptr). This led to some discussion on
dri-devel at lists.freedesktop.org and dri-devel IRC, from which I
concluded that Xen is missing critical primitives for GPU-accelerated
graphics and compute. The missing primitives for graphics are the ones
discussed at Xen Project Summit 2024, but it turns out that additional
primitives are needed for compute workloads.
As discussed at Xen Project Summit 2024, GPU acceleration via virtio-GPU
requires that an IOREQ server have access to the following primitives:
1. Map: Map a backend-provided buffer into the frontend. The buffer
might point to system memory or to a PCIe BAR. The frontend is _not_
allowed to use these buffers in hypercalls or grant them to other
domains. Accessing the pages using hypercalls directed at the
frontend fails as if the frontend did not have the pages. The only
exception is that the frontend _may_ be allowed to use the buffer in
a Map operation, provided that Revoke (below) is transitive.
2. Revoke: Revoke access to a buffer provided by the backend. Once
access is revoked, no operation on or in the frontend domain can
access or modify the pages, and the backend can safely reuse the
backing memory for other purposes. Furthermore, revocation is not
allowed to fail unless the backend or hypervisor is buggy, and if it
does fail for any reason, the backend will panic. Once access is
revoked, further accesses by the frontend will cause a fault that the
backend can intercept.
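
As a rough illustration, the intended semantics of Map and Revoke can
be modeled in a few lines of Python. This is a toy model under stated
assumptions, not Xen or Linux code: every name in it (Frontend,
map_buffer, revoke, usable_in_hypercall) is made up for the example,
and treating Revoke of an absent mapping as a no-op is an assumption
the text above does not state.

```python
class Fault(Exception):
    """Frontend touched a guest frame that has no live mapping."""


class Frontend:
    """Toy frontend domain; gfn means guest frame number."""

    def __init__(self, owned_gfns=()):
        # Frames the frontend owns outright (ordinary guest RAM).
        self.owned = {gfn: bytearray(4096) for gfn in owned_gfns}
        # Frames provided by the backend via Map: gfn -> buffer.
        self.mapped = {}
        # Faults the backend was able to intercept.
        self.intercepted = []

    def access(self, gfn):
        """CPU access from inside the frontend."""
        if gfn in self.mapped:
            return self.mapped[gfn]
        if gfn in self.owned:
            return self.owned[gfn]
        # Revoked or never populated: fault, and let the backend see it.
        self.intercepted.append(gfn)
        raise Fault(gfn)

    def usable_in_hypercall(self, gfn):
        """Hypercalls and grants act as if mapped frames were absent."""
        return gfn in self.owned


def map_buffer(frontend, gfn, buf):
    """Map: expose a backend-provided buffer at a frontend frame."""
    if gfn in frontend.owned or gfn in frontend.mapped:
        raise ValueError("frame already populated")
    frontend.mapped[gfn] = buf


def revoke(frontend, gfn):
    """Revoke: has no failure path (assumed idempotent in this model)."""
    frontend.mapped.pop(gfn, None)
```

After map_buffer(fe, 7, buf), fe.access(7) returns buf while
fe.usable_in_hypercall(7) stays false. After revoke(fe, 7), the same
access raises Fault and the frame number is recorded in fe.intercepted.
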
Map can be handled by userspace, but Revoke must be handled entirely
in-kernel. This is because Revoke happens from a Linux MMU notifier
callback, and those are not allowed to block, fail, or involve userspace
in any way. Since MMU notifier callbacks are called before freeing
memory, failure means that some other part of the system still has
access to freed memory that might be reused for other purposes, which
is a security vulnerability.
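
The ordering argument can be sketched with another toy Python model.
It is not kernel code and makes no claim about the real MMU notifier
API: Page, invalidate_callback and free_page are hypothetical names,
and raising RuntimeError is only a stand-in for a backend panic.

```python
class Page:
    def __init__(self):
        self.frontend_mapped = True  # frontend currently has access
        self.freed = False           # backend has released the memory


def revoke(page):
    """Stand-in for the in-kernel Revoke path."""
    page.frontend_mapped = False


def invalidate_callback(page):
    """Stand-in for a notifier callback: it has no way to report failure."""
    revoke(page)
    if page.frontend_mapped:
        # There is no error return, so the only option left is to stop.
        raise RuntimeError("panic: revoke failed before free")


def free_page(page, callbacks):
    """Callbacks run before the memory is released, never after."""
    for callback in callbacks:
        callback(page)
    page.freed = True


def dangling_access(page):
    """True if the frontend can still reach memory that was freed."""
    return page.freed and page.frontend_mapped
```

With the callback registered, free_page leaves dangling_access false.
With no callback (or one that silently gave up), the frontend keeps
access to freed memory, which is the vulnerability described above.
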
It turns out that compute has additional requirements. Graphics APIs
use DMA buffers (dmabufs), which only support a subset of operations.
In particular, direct I/O doesn't work. Compute APIs allow users to
make malloc'd memory accessible to the GPU. This memory can be used
in Linux kernel direct I/O and in other operations that do not work
with dmabufs. However, such memory starts out as frontend-owned pages,
so it must be converted to backend pages before it can be used by the
GPU. Linux supports migration of userspace pages, but this is too
unreliable to be used for this purpose. Instead, it will need to be
done by Xen and the backend.
This requires two additional primitives:
3. Steal: Convert frontend-owned pages to backend-owned pages and
provide the backend with a mapping of the pages. After a successful
Steal operation, the pages are in the same state as if they had been
provided via Map. Steal fails if the pages are currently being used
in a hypercall, are MMIO (as opposed to system memory), were provided
by another domain via Map or grant tables, are currently foreign
mapped, are currently granted to another domain, or more generally
are accessible to any domain other than the target domain. The
frontend's quota is decreased by the number of pages stolen, and the
backend's quota is increased by the same amount. A successful Steal
operation means that Revoke and Map can be used to operate on the
pages.
4. Return: Convert a backend-owned page to a frontend-owned page. After
a successful call to Return, the backend is no longer able to use
Revoke or Map on that page. The returned page ceases to count against
backend quota and now counts against frontend quota.
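
The ownership and accounting rules for Steal and Return can be
sketched as a small Python model. It is illustrative only: the Page
fields, the charged dictionary, and the function names are assumptions
made for the example (give_back is used because return is a Python
keyword). The model reads "quota" as the number of pages charged to
each domain and handles one page per call; neither detail is specified
above.

```python
from dataclasses import dataclass


@dataclass
class Page:
    owner: str = "frontend"          # "frontend" or "backend"
    is_mmio: bool = False            # MMIO rather than system memory
    in_hypercall: bool = False       # currently used by a hypercall
    from_other_domain: bool = False  # arrived via Map or grant tables
    foreign_mapped: bool = False     # mapped by some other domain
    granted: bool = False            # granted to some other domain

    @property
    def backend_managed(self):
        """True when Map and Revoke may be used on this page."""
        return self.owner == "backend"


def steal(page, charged):
    """Steal: move a frontend-owned page to the backend, or refuse."""
    blocked = (
        page.owner != "frontend"
        or page.is_mmio
        or page.in_hypercall
        or page.from_other_domain
        or page.foreign_mapped
        or page.granted
    )
    if blocked:
        return False
    page.owner = "backend"
    charged["frontend"] -= 1
    charged["backend"] += 1
    return True


def give_back(page, charged):
    """Return: hand a backend-owned page back to the frontend."""
    if page.owner != "backend":
        return False
    page.owner = "frontend"
    charged["backend"] -= 1
    charged["frontend"] += 1
    return True
```

Starting from charged = {"frontend": 10, "backend": 0}, a successful
steal followed by give_back leaves the totals unchanged, and a page
that is granted, foreign mapped, MMIO, or in use by a hypercall is
refused without touching the accounting.
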
Are these operations ones that Xen is interested in providing? There
may be other primitives that are sufficient to implement the above four,
but I believe that any solution that allows virtio-GPU to work must
allow the above four operations to be implemented. Without the first
two, virtio-GPU will not be able to support Vulkan or native contexts,
and without the last two as well, shared virtual memory and the
compute APIs that require it will not work.
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab