[RFC v2 0/6] Add support for Mesa GPU hang replay tool

Carlos Santa carlos.santa at intel.corp-partner.google.com
Tue Feb 11 02:20:00 UTC 2025


Add support in Xe for the Mesa GPU hang replay tool, which already exists
for i915.

The main changes are as follows:

- Update devcoredump to include additional information, allowing the
  Mesa tool to extract everything it needs to replay a GPU hang. These
  updates are designed to remain compatible with the existing Mesa
  devcoredump parser.
- Introduce the DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE extension, which
  enables setting the execution queue state to the hung execution queue
  state.
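
As a rough illustration of the first item, a userspace consumer could pick
the new replay_offset and replay_length values out of the devcoredump text
along the lines of the sketch below. This is not taken from the patches or
from Mesa: the line format ("[HWCTX].replay_offset: 0x...") is an
assumption, and struct replay_window, parse_field() and
replay_window_feed() are hypothetical names used only for this example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the two values named in the cover letter. */
struct replay_window {
	uint64_t offset;
	uint64_t length;
	bool have_offset;
	bool have_length;
};

/*
 * Find "<key>:" anywhere in a single line and parse the number after it.
 * Base 0 lets strtoull() accept both "0x..." and decimal values.
 * Returns true only if a number was actually parsed.
 */
static bool parse_field(const char *line, const char *key, uint64_t *out)
{
	const char *p = strstr(line, key);
	unsigned long long v;
	char *end;

	if (!p)
		return false;

	p += strlen(key);
	if (*p != ':')
		return false;
	p++;

	errno = 0;
	v = strtoull(p, &end, 0);
	if (end == p || errno == ERANGE)
		return false;

	*out = v;
	return true;
}

/*
 * Feed one devcoredump line; unrelated lines are ignored.
 * ASSUMPTION: the keys appear literally as "replay_offset" and
 * "replay_length" in the LRC HWCTX part of the dump.
 */
static void replay_window_feed(struct replay_window *w, const char *line)
{
	uint64_t v;

	if (parse_field(line, "replay_offset", &v)) {
		w->offset = v;
		w->have_offset = true;
	} else if (parse_field(line, "replay_length", &v)) {
		w->length = v;
		w->have_length = true;
	}
}
```

A caller would zero-initialize a struct replay_window, pass each line of
the dump to replay_window_feed(), and only trust offset/length once both
have_* flags are set.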

v2:
- Enable the flag DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE
- Fix the page math to avoid a crash

This is being sent as an RFC, as the Mesa uAPI tool has yet to be
developed. The tool is a prerequisite for merging this change.

Matt

Matthew Brost (6):
  drm/xe: Add properties line to VM snapshot capture
  drm/xe: Add "null_sparse" type to VM snap properties
  drm/xe: Add mem_region to properties line in VM snapshot capture
  drm/xe/uapi: Add DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE
  drm/xe: Add replay_offset and replay_length lines to LRC HWCTX
    snapshot
  drm/xe: Implement DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE

 drivers/gpu/drm/xe/xe_exec_queue.c       | 32 +++++++++++++++--
 drivers/gpu/drm/xe/xe_exec_queue_types.h |  3 ++
 drivers/gpu/drm/xe/xe_execlist.c         |  2 +-
 drivers/gpu/drm/xe/xe_lrc.c              | 44 +++++++++++++++++++-----
 drivers/gpu/drm/xe/xe_lrc.h              |  4 ++-
 drivers/gpu/drm/xe/xe_lrc_types.h        |  3 ++
 drivers/gpu/drm/xe/xe_vm.c               | 42 +++++++++++++++++++++-
 include/uapi/drm/xe_drm.h                |  9 +++--
 8 files changed, 124 insertions(+), 15 deletions(-)

-- 
2.43.0


