✗ CI.checkpatch: warning for Maintenence of devcoredump <-> GuC-Err-Capture plumbing
Patchwork
patchwork at emeril.freedesktop.org
Tue Jan 21 20:52:28 UTC 2025
== Series Details ==
Series: Maintenence of devcoredump <-> GuC-Err-Capture plumbing
URL : https://patchwork.freedesktop.org/series/143809/
State : warning
== Summary ==
+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
30ab6715fc09baee6cc14cb3c89ad8858688d474
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit a75ea34c20eb29eb7135ac17b6ec3e2209027f61
Author: Alan Previn <alan.previn.teres.alexis at intel.com>
Date: Tue Jan 21 11:09:35 2025 -0800
drm/xe/guc/capture: Maintenence of devcoredump <-> GuC-Err-Capture plumbing
The order of the devcoredump event flow is:
drm-scheduler -> guc-submission-execq-timed-out-job ->
guc-submission-kill-job -> xe-devcoredump (once the work
is confirmed to have been killed).
As we are aware, the GuC-FW IRQ for error-capture delivery
and extraction could have happened before the start of
guc-execq-timed-out-job or in the middle of it (before or
during the explicit kill) or not at all. Thus, today, the
above flow takes a manual capture first before triggering
the kill-job just in case we need it.
The structural layering of devcoredump internals is:
xe_devcoredump_snapshot -> xe_foo_snapshot (where foo
can be any data dump associated with the job that was killed).
Foo includes the xe_hw_engine_snapshot. Since GuC-Error-Capture
provides just the register dump of an engine, GuC-Err-Capture
snapshots should be managed by the xe_hw_engine_snapshot.
That isn't the case today.
Furthermore, neither xe_devcoredump_snapshot nor
xe_hw_engine_snapshot even exists at the start of
guc-submission-execq-timed-out-job. Thus, the first
manual capture node has no home. However, today,
GuC-Error-Capture stores capture snapshots off the
top-level xe_devcoredump_snapshot's matched_node.
GuC-Error-Capture has also absorbed the function for
xe_hw_engine_snapshot generation.
NOTE: Existing code isn't broken because xe_devcoredump
is not dynamically allocated and is designed to hold a
single event at a time (i.e. single engine dump).
But it's not scalable for future improvement.
Thus this patch:
1. Moves "matched_node" from xe_devcoredump_snapshot to
xe_hw_engine_snapshot.
2. Relocates the functions for xe_hw_engine_snapshot generation
and printing back to xe_hw_engine.c. However, split out the
register dump printing so it stays within GuC-Error-Capture
(so we don't need to maintain two sets of register lists).
3. Keep both the manual and firmware capture nodes within
GuC-Error-Capture subsystem's linked list until
xe_hw_engine_snapshot gets and puts them later.
4. Give xe_hw_engine_snapshot the control and ability to
query GuC-Error-Capture for matching snapshots, choosing
between the manual and firmware capture nodes when getting/putting a node.
5. While at it, relocate (and rename) key structures, enums
and function protos to xe_guc_capture_snapshot_types.h
(as an inter-module header) for xe_hw_engine_snapshot to use.
6. Since xe_hw_engine_snapshot can also be called via debugfs
without a job, create a new function that does a manual capture
of engine registers without any associated job.
v4: Rebase to latest drm-xe-next
v3: Fix check on queue handle when getting manual capture (CI-test)
v2: Bail on manual capture when running on a VF (Zhanjun)
Signed-off-by: Alan Previn <alan.previn.teres.alexis at intel.com>
Reviewed-by: Zhanjun Dong <zhanjun.dong at intel.com>
+ /mt/dim checkpatch 400d5a3912504bf740150101c9e5884eae2630ea drm-intel
a75ea34c20eb drm/xe/guc/capture: Maintenence of devcoredump <-> GuC-Err-Capture plumbing
-:808: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#808:
new file mode 100644
total: 0 errors, 1 warnings, 0 checks, 1012 lines checked
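The ownership change described in the commit message above (items 1, 3 and 4) can be illustrated with a minimal, self-contained C sketch. Everything here is hypothetical: the type and function names (`capture_node`, `capture_list`, `capture_get_matching`, `capture_put`, `hw_engine_snapshot`) are stand-ins and not the real xe driver definitions, and the "prefer the firmware capture, fall back to the manual one" policy is an assumption, since the commit message only says the engine snapshot chooses between the two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the real xe driver types. */
enum capture_source { CAPTURE_SRC_MANUAL, CAPTURE_SRC_GUC_FW };

struct capture_node {
	struct capture_node *next;
	enum capture_source source;
	int engine_id;
	bool taken;	/* currently held by an engine snapshot */
};

/* The capture subsystem keeps every node, manual or firmware, on its list. */
struct capture_list {
	struct capture_node *head;
};

/* The matched node lives in the engine snapshot, not the top-level dump. */
struct hw_engine_snapshot {
	int engine_id;
	struct capture_node *matched_node;
};

static void capture_list_add(struct capture_list *list, struct capture_node *node)
{
	node->taken = false;
	node->next = list->head;
	list->head = node;
}

/*
 * "Get": find a free node for the engine. Assumed policy: a firmware
 * capture wins over a manual one when both exist.
 */
static struct capture_node *capture_get_matching(struct capture_list *list,
						 int engine_id)
{
	struct capture_node *manual = NULL;

	for (struct capture_node *n = list->head; n; n = n->next) {
		if (n->engine_id != engine_id || n->taken)
			continue;
		if (n->source == CAPTURE_SRC_GUC_FW) {
			n->taken = true;
			return n;
		}
		if (!manual)
			manual = n;
	}
	if (manual)
		manual->taken = true;
	return manual;
}

/* "Put": hand the node back; it stays on the capture subsystem's list. */
static void capture_put(struct capture_node *node)
{
	assert(node && node->taken);
	node->taken = false;
}

static void engine_snapshot_capture(struct hw_engine_snapshot *snap,
				    struct capture_list *list, int engine_id)
{
	snap->engine_id = engine_id;
	snap->matched_node = capture_get_matching(list, engine_id);
}

static void engine_snapshot_free(struct hw_engine_snapshot *snap)
{
	if (snap->matched_node) {
		capture_put(snap->matched_node);
		snap->matched_node = NULL;
	}
}
```

In this sketch the engine snapshot only borrows a node between `engine_snapshot_capture()` and `engine_snapshot_free()`, so a manual capture taken before any snapshot exists still has a home on the capture list.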