[PATCH 1/2] drm/xe: Process deferred GGTT node removals on device unwind
Michal Wajdeczko
michal.wajdeczko at intel.com
Thu Jun 12 22:09:36 UTC 2025
While we are indirectly draining our dedicated workqueue ggtt->wq
that we use to complete asynchronous removal of some GGTT nodes,
this happends as part of the managed-drm unwinding (ggtt_fini_early),
which could be later then manage-device unwinding, where we could
already unmap our MMIO/GMS mapping (mmio_fini).
This was recently observed during unsuccessful VF initialization:
[ ] xe 0000:00:02.1: probe with driver xe failed with error -62
[ ] xe 0000:00:02.1: DEVRES REL ffff88811e747340 __xe_bo_unpin_map_no_vm (16 bytes)
[ ] xe 0000:00:02.1: DEVRES REL ffff88811e747540 __xe_bo_unpin_map_no_vm (16 bytes)
[ ] xe 0000:00:02.1: DEVRES REL ffff88811e747240 __xe_bo_unpin_map_no_vm (16 bytes)
[ ] xe 0000:00:02.1: DEVRES REL ffff88811e747040 tiles_fini (16 bytes)
[ ] xe 0000:00:02.1: DEVRES REL ffff88811e746840 mmio_fini (16 bytes)
[ ] xe 0000:00:02.1: DEVRES REL ffff88811e747f40 xe_bo_pinned_fini (16 bytes)
[ ] xe 0000:00:02.1: DEVRES REL ffff88811e746b40 devm_drm_dev_init_release (16 bytes)
[ ] xe 0000:00:02.1: [drm:drm_managed_release] drmres release begin
[ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef81640 __fini_relay (8 bytes)
[ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80d40 guc_ct_fini (8 bytes)
[ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80040 __drmm_mutex_release (8 bytes)
[ ] xe 0000:00:02.1: [drm:drm_managed_release] REL ffff88810ef80140 ggtt_fini_early (8 bytes)
and this was leading to:
[ ] BUG: unable to handle page fault for address: ffffc900058162a0
[ ] #PF: supervisor write access in kernel mode
[ ] #PF: error_code(0x0002) - not-present page
[ ] Oops: Oops: 0002 [#1] SMP NOPTI
[ ] Tainted: [W]=WARN
[ ] Workqueue: xe-ggtt-wq ggtt_node_remove_work_func [xe]
[ ] RIP: 0010:xe_ggtt_set_pte+0x6d/0x350 [xe]
[ ] Call Trace:
[ ] <TASK>
[ ] xe_ggtt_clear+0xb0/0x270 [xe]
[ ] ggtt_node_remove+0xbb/0x120 [xe]
[ ] ggtt_node_remove_work_func+0x30/0x50 [xe]
[ ] process_one_work+0x22b/0x6f0
[ ] worker_thread+0x1e8/0x3d
Add managed-device action that will explicitly drain the workqueue
with all pending node removals prior to releasing MMIO/GSM mapping.
Fixes: 919bb54e989c ("drm/xe: Fix missing runtime outer protection for ggtt_remove_node")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
Cc: Lucas De Marchi <lucas.demarchi at intel.com>
---
drivers/gpu/drm/xe/xe_ggtt.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index 7b11fa1356f0..a8830cdb185f 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -238,6 +238,13 @@ int xe_ggtt_init_kunit(struct xe_ggtt *ggtt, u32 reserved, u32 size)
}
EXPORT_SYMBOL_IF_KUNIT(xe_ggtt_init_kunit);
+static void dev_fini_ggtt(void *arg)
+{
+ struct xe_ggtt *ggtt = arg;
+
+ drain_workqueue(ggtt->wq);
+}
+
/**
* xe_ggtt_init_early - Early GGTT initialization
* @ggtt: the &xe_ggtt to be initialized
@@ -290,6 +297,10 @@ int xe_ggtt_init_early(struct xe_ggtt *ggtt)
if (err)
return err;
+ err = devm_add_action_or_reset(xe->drm.dev, dev_fini_ggtt, ggtt);
+ if (err)
+ return err;
+
if (IS_SRIOV_VF(xe)) {
err = xe_tile_sriov_vf_prepare_ggtt(ggtt->tile);
if (err)
--
2.47.1
More information about the Intel-xe
mailing list