[PATCH v7 0/6] Maintenance of devcoredump <-> GuC-Err-Capture plumbing
Alan Previn
alan.previn.teres.alexis at intel.com
Mon Feb 10 23:32:48 UTC 2025
The GuC-Error-Capture is currently reaching into the xe_devcoredump
structure to store its own placeholder snapshot pointer to work around
the race between the G2H-Error-Capture-Notification and the Drm-Scheduler
triggering GuC-Submission-exec-queue-timeout/kill.
From a subsystem layering perspective, this isn't scalable as
GuC should not be manipulating contents of a global structure it
does not own when responding to an unrelated thread / callstack.
Also, part of the earlier mentioned workaround includes the
GuC-Error-Capture taking on one of the front-end functions
for xe_hw_engine_snapshot generation because of an orthogonal
debugfs-caller requesting raw dumps of engine registers without
a job. This request is better handled by GuC-Error-Capture since
there is a lot to manage for reading and printing engine
register lists and we want to avoid duplicate code or tables.
However, logically speaking, the GuC-Error-Capture output node
is really a subset of xe_hw_engine_snapshot. This is regardless
of the fact that the majority of an engine-snapshot is the
register dumps that only the GuC-Error-Capture can do.
That said, this series intends to refactor the plumbing between
GuC-Error-Capture and xe_devcoredump (including
xe_hw_engine_snapshot) to fix the layering for future
maintenance and scalability. This is done without changing
any functionality or IP-locality (i.e. GuC-Error-Capture still owns
the single point of engine register list definition and printing).
This series ensures 'xe_devcoredump_snapshot' owns
'xe_hw_engine_snapshot generation' and the latter owns
'xe_guc_capture_snapshot' retrieval (with GuC-Error-Capture
as its helper).
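
As a rough illustration of the ownership chain described above, here is a
minimal standalone sketch. It is not code from this series: every struct
and function name with a sketch_ prefix is a hypothetical stand-in, and the
real xe structures carry far more state. It only shows the direction of
ownership: devcoredump owns the engine snapshot, the engine snapshot owns
the capture node, and GuC-Error-Capture acts purely as a helper.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins only; NOT the real xe driver structures. */
struct sketch_guc_capture_snapshot {
	int num_regs;	/* placeholder for the GuC-owned register dump */
};

struct sketch_hw_engine_snapshot {
	const char *name;
	struct sketch_guc_capture_snapshot *capture;	/* owned by the engine snapshot */
};

struct sketch_devcoredump_snapshot {
	struct sketch_hw_engine_snapshot *engine;	/* owned by devcoredump */
};

/* Helper role of GuC-Error-Capture: hand out a node when asked. */
static struct sketch_guc_capture_snapshot *sketch_guc_capture_get(int num_regs)
{
	struct sketch_guc_capture_snapshot *c = malloc(sizeof(*c));

	if (c)
		c->num_regs = num_regs;
	return c;
}

/* The engine snapshot retrieves (and then owns) the capture node. */
static struct sketch_hw_engine_snapshot *
sketch_hw_engine_snapshot_create(const char *name, int num_regs)
{
	struct sketch_hw_engine_snapshot *s = malloc(sizeof(*s));

	if (!s)
		return NULL;
	s->name = name;
	s->capture = sketch_guc_capture_get(num_regs);
	if (!s->capture) {
		free(s);
		return NULL;
	}
	return s;
}

/* devcoredump drives engine snapshot generation; returns 0 on success. */
int sketch_devcoredump_capture(struct sketch_devcoredump_snapshot *ss,
			       const char *name, int num_regs)
{
	assert(ss);
	ss->engine = sketch_hw_engine_snapshot_create(name, num_regs);
	return ss->engine ? 0 : -1;
}

/* Teardown follows the same ownership chain, innermost object first. */
void sketch_devcoredump_free(struct sketch_devcoredump_snapshot *ss)
{
	assert(ss);
	if (ss->engine) {
		free(ss->engine->capture);
		free(ss->engine);
		ss->engine = NULL;
	}
}
```

In this sketch nothing in the capture helper ever touches the devcoredump
structure, which is the layering property the series is after.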
Alan Previn (6):
drm/xe/guc: Rename __guc_capture_parsed_output
drm/xe/guc: Don't store capture nodes in xe_devcoredump_snapshot
drm/xe/guc: Split engine state print between xe_hw_engine vs
xe_guc_capture
drm/xe/guc: Move xe_hw_engine_snapshot creation back to xe_hw_engine.c
drm/xe/xe_hw_engine: Update xe_hw_engine capture for debugfs/gt_reset
drm/xe/guc: Update comments on GuC-Err-Capture flows
drivers/gpu/drm/xe/xe_devcoredump.c | 7 +-
drivers/gpu/drm/xe/xe_devcoredump_types.h | 6 -
drivers/gpu/drm/xe/xe_guc_capture.c | 381 ++++++++----------
drivers/gpu/drm/xe/xe_guc_capture.h | 16 +-
.../drm/xe/xe_guc_capture_snapshot_types.h | 57 +++
drivers/gpu/drm/xe/xe_guc_submit.c | 12 +-
drivers/gpu/drm/xe/xe_hw_engine.c | 114 ++++--
drivers/gpu/drm/xe/xe_hw_engine.h | 4 +-
drivers/gpu/drm/xe/xe_hw_engine_types.h | 13 +-
9 files changed, 337 insertions(+), 273 deletions(-)
create mode 100644 drivers/gpu/drm/xe/xe_guc_capture_snapshot_types.h
base-commit: f74fd53ba34551b7626193fb70c17226f06e9bf1
--
2.34.1