[RFC v1 0/9] Parallel submission of dma fence jobs and LR jobs with shared hardware resources

Francois Dugast francois.dugast at intel.com
Wed Jul 17 13:07:21 UTC 2024


Currently the Xe KMD requires that either all VMs on the device are
page-faulting VMs or none of them are. This restriction keeps page-faulting
workloads from waiting for a dma-fence in the fault handler: the pending page
fault would hold the execution resources, so the dma-fence would never signal,
resulting in a deadlock.

This limitation in the driver prevents mixing dma-fence jobs and long-running
faulting jobs, for example when an application submits 3D jobs for the
compositor as well as SVM compute jobs on the same device. To lift this
restriction safely, this series introduces a finer-grained approach.

Hardware engines which share resources and would block each other are assigned
to the same hardware engine group. The group ensures that dma-fence jobs and
long-running jobs never execute at the same time on the shared hardware
resources.
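
As a rough illustration of the grouping idea, the sketch below maps engine
classes to groups so that engines assumed to share resources resolve to the
same group object. The class names, the group names and the partition itself
are hypothetical and are not taken from the patches in this series.

```c
#include <stddef.h>

/* Hypothetical engine classes; a real driver has its own enumeration. */
enum engine_class {
	CLASS_RENDER,
	CLASS_COMPUTE,
	CLASS_COPY,
	CLASS_VIDEO,
	CLASS_COUNT
};

/* One group per set of engines that would block each other. */
struct engine_group {
	const char *name;
};

static struct engine_group group_render_compute = { "render+compute" };
static struct engine_group group_copy = { "copy" };
static struct engine_group group_video = { "video" };

/*
 * ASSUMPTION: render and compute share execution resources, while copy and
 * video do not. This partition is only an example.
 */
static struct engine_group *group_for_class(enum engine_class c)
{
	switch (c) {
	case CLASS_RENDER:
	case CLASS_COMPUTE:
		return &group_render_compute;
	case CLASS_COPY:
		return &group_copy;
	case CLASS_VIDEO:
		return &group_video;
	default:
		return NULL;	/* unknown class: no group */
	}
}
```

With such a lookup, two engines conflict exactly when they resolve to the same
group pointer, so the exclusion rule only needs to be enforced per group.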

If a long-running job is executing when a dma-fence job is submitted, the
long-running job is preempted, the dma-fence job executes, then the
long-running job is resumed. If a dma-fence job is executing when a
long-running job is submitted, the long-running job only starts once the
dma-fence job has completed.
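
The two transitions above can be modelled in a few lines of userspace C. This
is a minimal sketch under stated assumptions, not the code from the series:
all names (engine_group, submit_dma_fence_job, submit_lr_job,
complete_dma_fence_job) are hypothetical, and plain counters stand in for the
real exec queue lists, locking and fences.

```c
#include <assert.h>

/* Hypothetical execution modes of one hardware engine group. */
enum exec_mode { MODE_IDLE, MODE_DMA_FENCE, MODE_LR };

/* Hypothetical per-group state; counters stand in for real job lists. */
struct engine_group {
	enum exec_mode mode;
	int dma_fence_running;	/* dma-fence jobs currently executing */
	int lr_running;		/* long-running jobs currently executing */
	int lr_suspended;	/* long-running jobs preempted by dma-fence work */
	int lr_waiting;		/* long-running jobs held back at submission */
};

/* The invariant the group exists for: never both kinds at once. */
static void check_exclusive(const struct engine_group *g)
{
	assert(!(g->dma_fence_running > 0 && g->lr_running > 0));
}

/* A dma-fence job arrives: preempt any running long-running jobs first. */
static void submit_dma_fence_job(struct engine_group *g)
{
	g->lr_suspended += g->lr_running;
	g->lr_running = 0;
	g->mode = MODE_DMA_FENCE;
	g->dma_fence_running++;
	check_exclusive(g);
}

/* A long-running job arrives: run now unless dma-fence jobs are executing. */
static void submit_lr_job(struct engine_group *g)
{
	if (g->dma_fence_running > 0) {
		g->lr_waiting++;
	} else {
		g->mode = MODE_LR;
		g->lr_running++;
	}
	check_exclusive(g);
}

/* A dma-fence job completes: once none remain, resume or start LR jobs. */
static void complete_dma_fence_job(struct engine_group *g)
{
	assert(g->dma_fence_running > 0);
	g->dma_fence_running--;
	if (g->dma_fence_running == 0) {
		g->lr_running += g->lr_suspended + g->lr_waiting;
		g->lr_suspended = 0;
		g->lr_waiting = 0;
		g->mode = g->lr_running > 0 ? MODE_LR : MODE_IDLE;
	}
	check_exclusive(g);
}
```

Starting from a zero-initialized group, submitting a long-running job, then a
dma-fence job, then a second long-running job leaves one job suspended and one
waiting; both run again once complete_dma_fence_job() drops the dma-fence
count to zero. The asymmetry is deliberate in this model: long-running jobs
can be preempted, whereas dma-fence jobs must be allowed to finish so that
their fences signal.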

This has been tested on PVC with new IGT tests [1].

[1] https://patchwork.freedesktop.org/series/136191/

Francois Dugast (9):
  drm/xe/hw_engine_group: Introduce xe_hw_engine_group
  drm/xe/exec_queue: Add list link for the hw engine group
  drm/xe/hw_engine_group: Register hw engine group's exec queues
  drm/xe/hw_engine_group: Add helper to suspend LR jobs
  drm/xe/hw_engine_group: Add helper to wait for dma fence jobs
  drm/xe/hw_engine_group: Ensure safe transition between execution modes
  drm/xe/exec: Switch hw engine group execution mode upon job submission
  drm/xe/hw_engine_group: Resume LR exec queues suspended by dma fence
    jobs
  drm/xe/vm: Remove restriction that all VMs must be faulting if one is

 drivers/gpu/drm/xe/xe_device.h           |  10 -
 drivers/gpu/drm/xe/xe_exec.c             |  14 +-
 drivers/gpu/drm/xe/xe_exec_queue.c       |   7 +
 drivers/gpu/drm/xe/xe_exec_queue_types.h |   2 +
 drivers/gpu/drm/xe/xe_hw_engine.c        | 256 +++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_hw_engine.h        |   9 +
 drivers/gpu/drm/xe/xe_hw_engine_types.h  |  31 +++
 drivers/gpu/drm/xe/xe_vm.c               |   8 -
 8 files changed, 318 insertions(+), 19 deletions(-)

-- 
2.43.0


