[PATCH v9 0/8] drm/xe/vf: Post-migration recovery of queues and jobs

Tomasz Lis tomasz.lis at intel.com
Sat Aug 2 03:10:37 UTC 2025


To support VF migration, fixups must be applied to any
non-virtualized resources. These fixups have to be applied within
the VM, by the KMD running on the VF.

This series adds two fixup functions to the recovery worker:
* for fixing xe_lrc structs within queues
* for fixing xe_job structs and the commands they emit
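A minimal sketch of what such a fixup amounts to, assuming the only
change after migration is that the VF's GGTT range moved by a fixed
shift (new base minus old base), so every stored GGTT address must be
moved by the same amount. The struct ctx_image and the functions
ggtt_rebase() and ctx_image_fixup() are hypothetical illustrations,
not the real xe_lrc layout or the functions added by this series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a context image: a few slots holding GGTT
 * addresses. The real xe_lrc layout differs; this only shows the idea.
 */
struct ctx_image {
	uint32_t hwsp_ggtt;	/* GGTT address of the hardware status page */
	uint32_t ring_ggtt;	/* GGTT address of the ring buffer */
};

/* Move one GGTT address by a signed shift (new_base - old_base). */
static uint32_t ggtt_rebase(uint32_t addr, int64_t shift)
{
	int64_t moved = (int64_t)addr + shift;

	/* The rebased address must still fit in 32-bit GGTT space. */
	assert(moved >= 0 && moved <= UINT32_MAX);
	return (uint32_t)moved;
}

/* Apply the same shift to every GGTT reference in the image. */
static void ctx_image_fixup(struct ctx_image *img, int64_t shift)
{
	img->hwsp_ggtt = ggtt_rebase(img->hwsp_ggtt, shift);
	img->ring_ggtt = ggtt_rebase(img->ring_ggtt, shift);
}
```

The same shift would apply to GGTT addresses embedded in commands a
job has already emitted into the ring, which is why those need fixing
as well.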
The series also provides some performance and stability fixes:
submissions and resets are blocked while the fixups are being applied.
For the sub-allocator, it removes the cached GGTT address instead of
implementing fixups for it.
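A simplified sketch of the blocking idea, using a C11 atomic counter
as a gate that reset and submission paths check before proceeding.
The names struct vf_recovery_state, recovery_begin(), recovery_end()
and may_reset_or_submit() are hypothetical; the series uses its own
atomic and scheduler helpers. The sketch also ignores the race where a
reset or submission has already passed the check when recovery
begins, which real code has to handle.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-GT recovery state; names are illustrative only. */
struct vf_recovery_state {
	atomic_int fixups_in_progress;	/* > 0 while fixups are running */
};

/* Close the gate before applying fixups. */
static void recovery_begin(struct vf_recovery_state *s)
{
	atomic_fetch_add(&s->fixups_in_progress, 1);
}

/* Reopen the gate once fixups are done. */
static void recovery_end(struct vf_recovery_state *s)
{
	int prev = atomic_fetch_sub(&s->fixups_in_progress, 1);

	/* Catch an end without a matching begin. */
	assert(prev > 0);
	(void)prev;
}

/* Reset and submission paths back off while this returns false. */
static bool may_reset_or_submit(struct vf_recovery_state *s)
{
	return atomic_load(&s->fixups_in_progress) == 0;
}
```

A counter rather than a plain bool keeps the gate closed until the
last of several overlapping recovery passes has finished.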

v2: Switched to updating addresses via xe_lrc_write_ctx_reg()
  to avoid kzalloc(), renamed or moved a few functions
v3: Renamed and reordered parameters, added kerneldocs
v4: Take job_list_lock, introduce a new atomic for reset
  blocking, add "refresh utilization buffer" patch
v5: Replaced "Finish RESFIX by reset" patch with "Skip fixups
  before getting GGTT info", rebased "Refresh utilization buffer"
  patch
v6: Rebased to changes in "Make multi-GT migration less error prone",
  used a scratch buffer, and added one more ring recovery patch
v7: Used better matching atomic functions, fixed the noop item at the end of
  the WQ ring, added more exit conditions in the WQ ring
v8: Improved error/warn logging, added propagation of errors,
  made an enum for offsets
v9: Rephrased comments in one patch

Tomasz Lis (8):
  drm/xe/sa: Avoid caching GGTT address within the manager
  drm/xe/vf: Pause submissions during RESFIX fixups
  drm/xe: Block reset while recovering from VF migration
  drm/xe/vf: Rebase HWSP of all contexts after migration
  drm/xe/vf: Rebase MEMIRQ structures for all contexts after migration
  drm/xe/vf: Post migration, repopulate ring area for pending request
  drm/xe/vf: Refresh utilization buffer during migration recovery
  drm/xe/vf: Rebase exec queue parallel commands during migration
    recovery

 drivers/gpu/drm/xe/abi/guc_actions_abi.h |   8 ++
 drivers/gpu/drm/xe/xe_exec_queue.c       |  48 +++++++
 drivers/gpu/drm/xe/xe_exec_queue.h       |   4 +
 drivers/gpu/drm/xe/xe_gpu_scheduler.c    |  13 ++
 drivers/gpu/drm/xe/xe_gpu_scheduler.h    |   1 +
 drivers/gpu/drm/xe/xe_gt.c               |  10 ++
 drivers/gpu/drm/xe/xe_gt_debugfs.c       |   5 +-
 drivers/gpu/drm/xe/xe_gt_sriov_vf.c      |  14 ++
 drivers/gpu/drm/xe/xe_gt_sriov_vf.h      |   1 +
 drivers/gpu/drm/xe/xe_guc_buf.c          |   2 +-
 drivers/gpu/drm/xe/xe_guc_submit.c       | 175 +++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_guc_submit.h       |   9 ++
 drivers/gpu/drm/xe/xe_guc_types.h        |   6 +
 drivers/gpu/drm/xe/xe_lrc.c              | 107 ++++++++++++--
 drivers/gpu/drm/xe/xe_lrc.h              |   9 ++
 drivers/gpu/drm/xe/xe_sa.c               |   1 -
 drivers/gpu/drm/xe/xe_sa.h               |  15 +-
 drivers/gpu/drm/xe/xe_sa_types.h         |   1 -
 drivers/gpu/drm/xe/xe_sriov_vf.c         |  78 +++++++++-
 drivers/gpu/drm/xe/xe_sriov_vf_ccs.c     |   2 +-
 20 files changed, 490 insertions(+), 19 deletions(-)

