[Intel-xe] [PATCH v5 0/8] Port Xe to use GPUVA and implement NULL VM binds
Thomas Hellström
thomas.hellstrom at linux.intel.com
Tue Apr 4 13:20:47 UTC 2023
Hi, Matthew,
On 4/4/23 03:42, Matthew Brost wrote:
> GPUVA is common code written primarily by Danilo with the idea being a
> common place to track GPUVAs (VMAs in Xe) within an address space (VMs
> in Xe), track all the GPUVAs attached to GEMs, and a common way to
> implement VM binds / unbinds with MMAP / MUNMAP semantics via creating
> operation lists. All of this adds up to a common way to implement VK
> sparse bindings.
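Context for readers following along: the central idea is that the
manager turns one requested bind/unbind into a list of MAP / REMAP /
UNMAP operations, and the driver walks that list to update its own VMA
bookkeeping and page tables. A minimal sketch of that driver-side walk,
loosely based on the op types in drm_gpuva_mgr.h (treat the helper and
the exact details below as illustrative, not verbatim from the patch):

    static void driver_apply_ops(struct drm_gpuva_ops *ops)
    {
            struct drm_gpuva_op *op;

            drm_gpuva_for_each_op(op, ops) {
                    switch (op->op) {
                    case DRM_GPUVA_OP_MAP:
                            /* new VMA: allocate driver state, bind PTEs */
                            break;
                    case DRM_GPUVA_OP_REMAP:
                            /* existing VMA partially covered: replace it
                             * with the surviving prev and/or next piece */
                            break;
                    case DRM_GPUVA_OP_UNMAP:
                            /* VMA fully covered: zap PTEs, free state */
                            break;
                    default:
                            break;
                    }
            }
    }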
>
> This series pulls in the GPUVA code written by Danilo plus some small
> fixes by myself into 1 large patch. Once the GPUVA makes it upstream, we
> can rebase and drop this patch. I believe what lands upstream should be
> nearly identical to this patch at least from an API perspective.
>
> The last three patches port Xe to GPUVA and add support for NULL VM binds
> (writes dropped, read zero, VK sparse support). An example of the
> semantics of this is below.
Going through the new code in xe_vm.c I'm still concerned about our
error handling. In the cases we attempt to handle, for example an
-ENOMEM, the sync ioctl appears to be doing a fragile unwind, whereas the
async worker puts the VM in an error state that can only be exited by a
RESTART ioctl (please correct me if I'm wrong here). We could of course do
the same for sync ioctls, but I'm still wondering what happens if, for
example, a MAP operation on a dual-gt VM fails after binding on the first gt?
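To spell out the window I'm worried about, the dual-gt MAP sequence
looks roughly like this (purely illustrative pseudo-code, not the
actual xe_vm.c flow; bind_op_on_gt() and unwind_gt() are made-up names):

    err = bind_op_on_gt(vm, gt0, vma);
    if (err)
            return err;             /* fine: nothing committed yet */

    err = bind_op_on_gt(vm, gt1, vma);
    if (err) {
            /*
             * gt0 already has live PTEs for this VMA. Unwinding means
             * an unbind that may itself need memory and can fail,
             * which is exactly the fragile part.
             */
            unwind_gt(vm, gt0, vma);
            return err;
    }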
I think we need to clearly define the desired behaviour in the uAPI and
then make sure we make the necessary changes with that in mind, and even
if I prefer the !async_worker solution this can be done orthogonally to
that discussion. Ideally I'd like to see us find a way to *not* put the VM
in an error state like this, and I figure there are various ways to aid in
this, for example keeping a progress cookie in the user-space data or
deferring any failed bind operations to the next rebind worker or exec,
but in the end I think this boils down to allocating all needed
resources up-front for operations or groups of operations that are not
allowed to fail once we start to modify the page-tables.
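In pattern form, what I'm suggesting is the usual two-phase split (a
sketch only; xe_pt_prealloc() and xe_pt_commit() are made-up names, but
the series already applies the same idea to the maple tree by exporting
mas_preallocate()):

    /* Phase 1: may fail, touches nothing user-visible. */
    err = xe_pt_prealloc(vm, ops);  /* pt pages, fences, tree nodes */
    if (err)
            return err;             /* VM untouched, no error state */

    /* Phase 2: must not fail, only consumes phase-1 resources. */
    xe_pt_commit(vm, ops);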
Thoughts?
/Thomas
>
> MAP 0x0000-0x8000 to NULL - 0x0000-0x8000 writes dropped + read zero
> MAP 0x4000-0x5000 to a GEM - 0x0000-0x4000, 0x5000-0x8000 writes dropped + read zero; 0x4000-0x5000 mapped to a GEM
> UNMAP 0x3000-0x6000 - 0x0000-0x3000, 0x6000-0x8000 writes dropped + read zero
> UNMAP 0x0000-0x8000 - Nothing mapped
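(Aside for readers following along: the UNMAP at 0x3000-0x6000 is the
interesting case since it straddles three mappings. The surviving
pieces fall out of plain interval arithmetic; a minimal userspace
sketch of the split, not code from the series:

    #include <stdint.h>

    struct range { uint64_t start, end; };  /* [start, end) */

    /* Punch 'hole' out of 'va'; returns how many pieces survive. */
    static int punch_hole(struct range va, struct range hole,
                          struct range out[2])
    {
            int n = 0;

            if (hole.end <= va.start || hole.start >= va.end) {
                    out[n++] = va;          /* no overlap, keep VA whole */
                    return n;
            }
            if (hole.start > va.start)      /* left remainder */
                    out[n++] = (struct range){ va.start, hole.start };
            if (hole.end < va.end)          /* right remainder */
                    out[n++] = (struct range){ hole.end, va.end };
            return n;
    }

Punching 0x3000-0x6000 out of the NULL mapping at 0x0000-0x4000 leaves
0x0000-0x3000, the GEM mapping at 0x4000-0x5000 is removed entirely,
and the NULL mapping at 0x5000-0x8000 keeps 0x6000-0x8000, matching
the third line above.)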
>
> No changes to existing behavior, just new functionality.
>
> v2: Fix CI build failure
> v3: Export mas_preallocate, add patch to avoid rebinds
> v5: Bug fixes, rebase, xe_vma size optimizations
>
> Signed-off-by: Matthew Brost <matthew.brost at intel.com>
>
> Danilo Krummrich (2):
> maple_tree: split up MA_STATE() macro
> drm: manager to keep track of GPUs VA mappings
>
> Matthew Brost (6):
> maple_tree: Export mas_preallocate
> drm/xe: Port Xe to GPUVA
> drm/xe: NULL binding implementation
> drm/xe: Avoid doing rebinds
> drm/xe: Reduce the number of list links in xe_vma
> drm/xe: Optimize size of xe_vma allocation
>
> Documentation/gpu/drm-mm.rst | 31 +
> drivers/gpu/drm/Makefile | 1 +
> drivers/gpu/drm/drm_debugfs.c | 56 +
> drivers/gpu/drm/drm_gem.c | 3 +
> drivers/gpu/drm/drm_gpuva_mgr.c | 1890 ++++++++++++++++++
> drivers/gpu/drm/xe/xe_bo.c | 10 +-
> drivers/gpu/drm/xe/xe_bo.h | 1 +
> drivers/gpu/drm/xe/xe_device.c | 2 +-
> drivers/gpu/drm/xe/xe_exec.c | 4 +-
> drivers/gpu/drm/xe/xe_gt_pagefault.c | 29 +-
> drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 14 +-
> drivers/gpu/drm/xe/xe_guc_ct.c | 6 +-
> drivers/gpu/drm/xe/xe_migrate.c | 8 +-
> drivers/gpu/drm/xe/xe_pt.c | 176 +-
> drivers/gpu/drm/xe/xe_trace.h | 10 +-
> drivers/gpu/drm/xe/xe_vm.c | 1947 +++++++++----------
> drivers/gpu/drm/xe/xe_vm.h | 76 +-
> drivers/gpu/drm/xe/xe_vm_madvise.c | 87 +-
> drivers/gpu/drm/xe/xe_vm_types.h | 276 ++-
> include/drm/drm_debugfs.h | 24 +
> include/drm/drm_drv.h | 7 +
> include/drm/drm_gem.h | 75 +
> include/drm/drm_gpuva_mgr.h | 734 +++++++
> include/linux/maple_tree.h | 7 +-
> include/uapi/drm/xe_drm.h | 8 +
> lib/maple_tree.c | 1 +
> 26 files changed, 4203 insertions(+), 1280 deletions(-)
> create mode 100644 drivers/gpu/drm/drm_gpuva_mgr.c
> create mode 100644 include/drm/drm_gpuva_mgr.h
>