[PATCH v2 00/15] CPU binds and ULLS on migration queue
Matthew Brost
matthew.brost at intel.com
Tue Aug 5 23:41:45 UTC 2025
We now have data to back up the need for CPU binds and ULLS on the
migration queue, as generated from [1].
On BMG, it is shown that when the GPU is consistently processing faults,
copy jobs run approximately 40–65µs faster (depending on the test case)
with ULLS compared to traditional GuC submission with SLPC enabled on
the migration queue (not upstream, but last patch in series can enable
this). Without SLPC enabled (upstream), ULLS is approximately 100–200µs
faster. Startup from a cold GPU shows an even larger speedup. Given the
critical nature of fault performance, ULLS appears to be a worthwhile
feature.
ULLS will consume more power (not yet measured) due to a continuously
running batch on the paging engine. However, compute UMDs already do
this on engines exposed to users. Again, this seems like a worthwhile
tradeoff.
CPU binds are required for ULLS to function, as the migration queue
needs exclusive access to the paging hardware engine. Thus, CPU binds
are included here. Beyond being a requirement for ULLS, CPU binds
should also reduce VM bind latency and decouple kernel binds from
unrelated copy/clear jobs—this is especially beneficial when faults are
serviced in parallel. Average bind time in a parallel faulting test case
was reduced by approximately 15µs. In the worst case without CPU binds,
2M copy time (~140µs) * (number of page fault threads - 1) of latency
would be added to a single fault.
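To make the worst-case arithmetic above concrete, here is a minimal
sketch. The helper name and signature are hypothetical and not part of
this series; it only assumes the model stated above, where a bind can end
up queued behind one 2M copy from every other faulting thread.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only (not in the series).
 *
 * Worst-case extra latency on a single fault when its bind is
 * serialized behind one copy job from each of the other page fault
 * threads: copy_time_us * (nr_fault_threads - 1).
 */
uint64_t worst_case_added_latency_us(uint64_t copy_time_us,
				     unsigned int nr_fault_threads)
{
	/* Zero or one thread: nothing else to queue behind. */
	if (nr_fault_threads <= 1)
		return 0;

	return copy_time_us * (uint64_t)(nr_fault_threads - 1);
}
```

With the ~140µs 2M copy time quoted above and, as an example, 4 page
fault threads, this gives 140 * 3 = 420µs of added latency on one fault.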
This series could be merged in phases: first CPU binds, then ULLS on the
migration execution queue.
The last two patches in the series add modparams for quick performance /
power experiments.
v2:
- Use delayed worker to exit ULLS mode in an effort to save on power
- Various other cleanups
Matt
[1] https://patchwork.freedesktop.org/series/149811/
Matthew Brost (15):
  drm/xe: Drop struct xe_migrate_pt_update argument from populate /
    clear vfuns
  drm/xe: Add __xe_migrate_update_pgtables_cpu helper
  drm/xe: CPU binds for jobs
  drm/xe: Remove unused arguments from xe_migrate_pt_update_ops
  drm/xe: Don't use migrate exec queue for page fault binds
  drm/xe: Do not create a VM bind queue per tile
  drm/xe: Add xe_hw_engine_write_ring_tail
  drm/xe: Add ULLS support to LRC
  drm/xe: Add ULLS migration job support to migration layer
  drm/xe: Add MI_SEMAPHORE_WAIT instruction defs
  drm/xe: Add ULLS migration job support to ring ops
  drm/xe: Add ULLS migration job support to GuC submission
  drm/xe: Enter ULLS for migration jobs upon page fault or SVM prefetch
  drm/xe: Add modparam to enable / disable ULLS on migrate queue
  drm/xe: Add modparam to enable / disable high SLPC on migrate queue
.../gpu/drm/xe/instructions/xe_mi_commands.h | 6 +
drivers/gpu/drm/xe/tests/xe_migrate.c | 2 +
drivers/gpu/drm/xe/xe_bo.c | 7 +-
drivers/gpu/drm/xe/xe_bo.h | 9 +-
drivers/gpu/drm/xe/xe_bo_types.h | 2 -
drivers/gpu/drm/xe/xe_debugfs.c | 3 +
drivers/gpu/drm/xe/xe_device.c | 13 +-
drivers/gpu/drm/xe/xe_device_types.h | 10 +
drivers/gpu/drm/xe/xe_drm_client.c | 3 +-
drivers/gpu/drm/xe/xe_exec_queue.c | 43 +-
drivers/gpu/drm/xe/xe_exec_queue_types.h | 15 +-
drivers/gpu/drm/xe/xe_gt_pagefault.c | 2 +
drivers/gpu/drm/xe/xe_guc_submit.c | 58 +-
drivers/gpu/drm/xe/xe_hw_engine.c | 10 +
drivers/gpu/drm/xe/xe_hw_engine.h | 1 +
drivers/gpu/drm/xe/xe_lrc.c | 51 ++
drivers/gpu/drm/xe/xe_lrc.h | 3 +
drivers/gpu/drm/xe/xe_lrc_types.h | 4 +
drivers/gpu/drm/xe/xe_migrate.c | 544 +++++++++---------
drivers/gpu/drm/xe/xe_migrate.h | 24 +-
drivers/gpu/drm/xe/xe_module.c | 10 +
drivers/gpu/drm/xe/xe_module.h | 2 +
drivers/gpu/drm/xe/xe_pt.c | 221 +++++--
drivers/gpu/drm/xe/xe_pt.h | 5 +-
drivers/gpu/drm/xe/xe_pt_types.h | 29 +-
drivers/gpu/drm/xe/xe_ring_ops.c | 31 +
drivers/gpu/drm/xe/xe_sched_job.c | 78 ++-
drivers/gpu/drm/xe/xe_sched_job_types.h | 37 +-
drivers/gpu/drm/xe/xe_svm.c | 11 +
drivers/gpu/drm/xe/xe_vm.c | 99 ++--
drivers/gpu/drm/xe/xe_vm_types.h | 2 +-
31 files changed, 846 insertions(+), 489 deletions(-)
--
2.34.1