✗ CI.checkpatch: warning for series starting with [1/3] drm/xe/trace: improve xe_sched_msg trace
Patchwork
patchwork at emeril.freedesktop.org
Fri Nov 22 16:28:01 UTC 2024
== Series Details ==
Series: series starting with [1/3] drm/xe/trace: improve xe_sched_msg trace
URL : https://patchwork.freedesktop.org/series/141705/
State : warning
== Summary ==
+ KERNEL=/kernel
+ git clone https://gitlab.freedesktop.org/drm/maintainer-tools mt
Cloning into 'mt'...
warning: redirecting to https://gitlab.freedesktop.org/drm/maintainer-tools.git/
+ git -C mt rev-list -n1 origin/master
30ab6715fc09baee6cc14cb3c89ad8858688d474
+ cd /kernel
+ git config --global --add safe.directory /kernel
+ git log -n1
commit 553114143726c4e3493b19c1d26e1bcdb53b2411
Author: Matthew Auld <matthew.auld at intel.com>
Date: Fri Nov 22 16:19:17 2024 +0000
drm/xe/guc_submit: fix race around suspend_pending
Currently in some testcases we can trigger:
xe 0000:03:00.0: [drm] Assertion `exec_queue_destroyed(q)` failed!
....
WARNING: CPU: 18 PID: 2640 at drivers/gpu/drm/xe/xe_guc_submit.c:1826 xe_guc_sched_done_handler+0xa54/0xef0 [xe]
xe 0000:03:00.0: [drm] *ERROR* GT1: DEREGISTER_DONE: Unexpected engine state 0x00a1, guc_id=57
Looking at a snippet of corresponding ftrace for this GuC id we can see:
162.673311: xe_sched_msg_add: dev=0000:03:00.0, gt=1 guc_id=57, opcode=3
162.673317: xe_sched_msg_recv: dev=0000:03:00.0, gt=1 guc_id=57, opcode=3
162.673319: xe_exec_queue_scheduling_disable: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0x29, flags=0x0
162.674089: xe_exec_queue_kill: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0x29, flags=0x0
162.674108: xe_exec_queue_close: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0xa9, flags=0x0
162.674488: xe_exec_queue_scheduling_done: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0xa9, flags=0x0
162.678452: xe_exec_queue_deregister: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0xa1, flags=0x0
It looks like we try to suspend the queue (opcode=3), setting
suspend_pending and triggering a disable_scheduling. The user then
closes the queue. However closing the queue seems to forcefully signal
the fence after killing the queue, however when the G2H response for
disable_scheduling comes back we have now cleared suspend_pending when
signalling the suspend fence, so the disable_scheduling now incorrectly
tries to also deregister the queue, leading to warnings since the queue
has yet to even be marked for destruction. We also seem to trigger
errors later with trying to double unregister the same queue.
To fix this tweak the ordering when handling the response to ensure we
don't race with a disable_scheduling that doesn't actually intend to
actually unregister. The destruction path should now also correctly
wait for any pending_disable before marking as destroyed.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3371
Signed-off-by: Matthew Auld <matthew.auld at intel.com>
Cc: Matthew Brost <matthew.brost at intel.com>
Cc: <stable at vger.kernel.org> # v6.8+
+ /mt/dim checkpatch a381faddbfc974e7bd57efe953a738415afccd6a drm-intel
1cbe4629b071 drm/xe/trace: improve xe_sched_msg trace
-:33: WARNING:LONG_LINE: line length of 116 exceeds 100 columns
#33: FILE: drivers/gpu/drm/xe/xe_trace.h:299:
+ TP_printk("dev=%s, gt=%u guc_id=%d, opcode=%u", __get_str(dev), __entry->gt_id, __entry->guc_id,
total: 0 errors, 1 warnings, 0 checks, 19 lines checked
81e3dd175070 drm/xe/guc_submit: fix race around pending_disable
-:8: WARNING:COMMIT_LOG_LONG_LINE: Prefer a maximum 75 chars per line (possible unwrapped commit description?)
#8:
[drm] *ERROR* GT0: SCHED_DONE: Unexpected engine state 0x02b1, guc_id=8, runnable_state=0
-:83: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open parenthesis
#83: FILE: drivers/gpu/drm/xe/xe_guc_submit.c:1337:
+ wait_event(guc->ct.wq, (q->guc->resume_time != RESUME_PENDING ||
+ xe_guc_read_stopped(guc)) && !exec_queue_pending_disable(q));
total: 0 errors, 1 warnings, 1 checks, 37 lines checked
553114143726 drm/xe/guc_submit: fix race around suspend_pending
-:15: WARNING:COMMIT_LOG_LONG_LINE: Prefer a maximum 75 chars per line (possible unwrapped commit description?)
#15:
162.673311: xe_sched_msg_add: dev=0000:03:00.0, gt=1 guc_id=57, opcode=3
total: 0 errors, 1 warnings, 0 checks, 32 lines checked
More information about the Intel-xe
mailing list