[PATCH v3 0/4] drm/xe: Rework rebinding in preparation for same-vm eviction

Thomas Hellström thomas.hellstrom at linux.intel.com
Wed Mar 27 09:11:32 UTC 2024


We currently do not allow eviction / shrinking of completely unbound
local objects during exec and in the rebind worker.
Since unbinding is the UMD's primary means of
freeing up memory in local VM overcommit situations,
this needs to be addressed.

Such functionality will also open up the possibility to evict
purgeable local objects with upcoming changes.

To make this work properly, rebinding needs to be moved to the
while-not-all-locked drm-exec loop, since rebinding may allocate
gpu page table bos and thus cause evictions, which in turn force us
to re-run validation.
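
As a rough illustration of why rebinding has to sit inside the retry loop,
here is a minimal user-space sketch. It is a toy model only: every name in
it (toy_bo, toy_vm, toy_rebind(), toy_validate_and_rebind()) is
hypothetical and does not exist in the xe driver or in drm_exec. The
assumption is that allocating a page table for one object may evict another
object of the same vm, after which validation has to run again.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in the xe driver. */
enum toy_placement { TOY_VRAM, TOY_SYSTEM };

struct toy_bo {
	enum toy_placement placement;
	bool validated;	/* placement checked since the last move */
	bool has_pt;	/* page-table page already allocated */
	bool bound;	/* mapping is up to date */
};

struct toy_vm {
	struct toy_bo *bos;
	size_t num_bos;
	int free_vram_pages;
	int evictions;
};

/* Evict the first VRAM-resident object other than @skip. */
static bool toy_evict_one(struct toy_vm *vm, const struct toy_bo *skip)
{
	for (size_t i = 0; i < vm->num_bos; i++) {
		struct toy_bo *bo = &vm->bos[i];

		if (bo == skip || bo->placement != TOY_VRAM)
			continue;
		/* Moving the object invalidates its validation and binding. */
		bo->placement = TOY_SYSTEM;
		bo->validated = false;
		bo->bound = false;
		vm->free_vram_pages++;
		vm->evictions++;
		return true;
	}
	return false;
}

/* Bind @bo; allocating its page table may evict another object. */
static int toy_rebind(struct toy_vm *vm, struct toy_bo *bo, bool *evicted)
{
	if (!bo->has_pt) {
		if (vm->free_vram_pages == 0) {
			if (!toy_evict_one(vm, bo))
				return -ENOMEM;
			*evicted = true;
		}
		vm->free_vram_pages--;
		bo->has_pt = true;
	}
	bo->bound = true;
	return 0;
}

/* Returns the number of passes needed, or a negative errno. */
int toy_validate_and_rebind(struct toy_vm *vm)
{
	int passes = 0;

	assert(vm && vm->bos);
	for (;;) {
		bool evicted = false;

		passes++;
		/* Validation step: accept the current placement of every object. */
		for (size_t i = 0; i < vm->num_bos; i++)
			vm->bos[i].validated = true;

		/* Rebind step: stop at the first eviction and start over. */
		for (size_t i = 0; i < vm->num_bos && !evicted; i++) {
			int err;

			if (vm->bos[i].bound)
				continue;
			err = toy_rebind(vm, &vm->bos[i], &evicted);
			if (err)
				return err;
		}
		if (!evicted)
			return passes;
	}
}
```

In this toy, three VRAM objects sharing a single free page need three
passes and two evictions before everything is both validated and bound.
Running the rebind step once after the loop instead would leave the evicted
objects unvalidated.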

This is done in patch 4, but when crafting that patch, a
couple of rebinding flaws were discovered.

1) When saving the rebinding fence we always presumed the
   rebinding fences were ordered. That is not true, and is
   fixed in patch 2, where we attach the rebind fences as
   kernel fences to the vm's resv.
2) In fact, TLB invalidation fences may currently not be assumed to be
   ordered at all. This is fixed in patch 3.
3) The combination of fixes for 1) and 2) makes the rebind
   of each vma wait for the TLB invalidation of the previous
   rebind, which is unnecessary and would incur unneeded latency.
   This is fixed in patch 1 where we move rebind TLB invalidation
   to the ring ops.
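
The ordering problem in 1) can be sketched in plain C. This is a toy model
under stated assumptions, not the dma_resv or dma_fence API: toy_fence,
toy_resv and the helpers are hypothetical names. It only shows that when
fences may signal out of order, remembering just the most recent fence is
not enough, whereas a container that tracks every fence is.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_RESV_MAX 8

/* Hypothetical stand-ins; not the kernel's dma_fence / dma_resv. */
struct toy_fence {
	bool signaled;
};

struct toy_resv {
	struct toy_fence *fences[TOY_RESV_MAX];
	size_t count;
};

/* Track one more fence; returns false if the toy container is full. */
bool toy_resv_add(struct toy_resv *resv, struct toy_fence *fence)
{
	if (resv->count == TOY_RESV_MAX)
		return false;
	resv->fences[resv->count++] = fence;
	return true;
}

/* Flawed check: assumes the most recently added fence signals last. */
bool toy_last_fence_signaled(const struct toy_resv *resv)
{
	return resv->count == 0 || resv->fences[resv->count - 1]->signaled;
}

/* Order-independent check: every tracked fence must have signaled. */
bool toy_all_fences_signaled(const struct toy_resv *resv)
{
	for (size_t i = 0; i < resv->count; i++) {
		if (!resv->fences[i]->signaled)
			return false;
	}
	return true;
}
```

If three rebind fences are added and only the last one has signaled, the
"last fence" check reports completion while two rebinds are still pending;
the order-independent check does not.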

v2:
- Simplify if-statements around the tlb_flush_seqno.
      (Matthew Brost)
- Add some comments and asserts.
- Remove a leftover call to xe_vm_rebind() (Matt Brost)
- Add a helper function xe_vm_validate_rebind() (Matt Brost)
- Rebasing
v3:
- Only include the patches that rework rebinding and
  include it in the drm_exec locking loop, since
  the patches that dealt with same-vm eviction allowed
  page-table allocation to evict the object being bound,
  and while this was addressed properly during exec and
  rebind worker, it was not during the VM_BIND ioctl.
- Squash the patches moving fence reservation and
  moving rebinding into the locking loop since the
  code was not properly working in between those
  patches. (Matt Brost)
- Add code comments (Matt Brost)

Thomas Hellström (4):
  drm/xe: Use ring ops TLB invalidation for rebinds
  drm/xe: Rework rebinding
  drm/xe: Make TLB invalidation fences unordered
  drm/xe: Move vma rebinding to the drm_exec locking loop

 drivers/gpu/drm/xe/xe_exec.c                |  79 ++------------
 drivers/gpu/drm/xe/xe_exec_queue_types.h    |   5 +
 drivers/gpu/drm/xe/xe_gt_pagefault.c        |   3 +-
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c |   1 -
 drivers/gpu/drm/xe/xe_gt_types.h            |   7 --
 drivers/gpu/drm/xe/xe_pt.c                  |  25 ++++-
 drivers/gpu/drm/xe/xe_ring_ops.c            |  11 +-
 drivers/gpu/drm/xe/xe_sched_job.c           |  10 ++
 drivers/gpu/drm/xe/xe_sched_job_types.h     |   2 +
 drivers/gpu/drm/xe/xe_vm.c                  | 110 ++++++++++++--------
 drivers/gpu/drm/xe/xe_vm.h                  |   8 +-
 drivers/gpu/drm/xe/xe_vm_types.h            |   8 +-
 12 files changed, 126 insertions(+), 143 deletions(-)

-- 
2.44.0


