[PATCH 1/1] drm/xe: serialize store_data and user_interrupt for ufence wait

fei.yang at intel.com fei.yang at intel.com
Tue Aug 12 18:28:46 UTC 2025


From: Fei Yang <fei.yang at intel.com>

Quote BSpec, MI_STORE_DATA_IMM "simply initiates the write operation with
command execution proceeding normally. Although the write operation is
guaranteed to complete eventually, there is no mechanism to synchronize
command execution with the completion (or even initiation) of these
operations."
The KMD currently emit MI_STORE_DATA_IMM and MI_USER_INTERRUPT consecutively
to implement user fence. However, according to the BSpec, the data write
operation is not guaranteed to be completed when triggering the interrupt,
that would cause the xe_wait_user_fence_ioctl to wait until the full user
specified timeout is reached before checking the fence value again. Great
performance degradation has been observed in IGT xe_exec_fault_mode test
cases due to this unnecessary wait. The worst case is that if user set the
timeout to MAX_INT32, the wait could end up being a hang until some other
random program triggers a user interrupt to wake it up.
A semaphore wait is added right after the data write to avoid the unexpected
wait.

Signed-off-by: Fei Yang <fei.yang at intel.com>
---
 drivers/gpu/drm/xe/instructions/xe_mi_commands.h | 11 +++++++++++
 drivers/gpu/drm/xe/xe_ring_ops.c                 | 14 ++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
index c47b290e0e9f..1c9e7b35c665 100644
--- a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
+++ b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
@@ -34,6 +34,17 @@
 #define MI_FORCE_WAKEUP			__MI_INSTR(0x1D)
 #define MI_MATH(n)			(__MI_INSTR(0x1A) | XE_INSTR_NUM_DW((n) + 1))
 
+#define MI_SEMAPHORE_WAIT		(__MI_INSTR(0x1c) | XE_INSTR_NUM_DW(5))
+#define   MI_SEMAPHORE_REGISTER_POLL	REG_BIT(16)
+#define   MI_SEMAPHORE_POLL		REG_BIT(15)
+#define   MI_SEMAPHORE_COMP_OP		GENMASK(14, 12)
+#define   MI_SEMAPHORE_SAD_GT_SDD	REG_FIELD_PREP(MI_SEMAPHORE_COMP_OP, 0)
+#define   MI_SEMAPHORE_SAD_GTE_SDD	REG_FIELD_PREP(MI_SEMAPHORE_COMP_OP, 1)
+#define   MI_SEMAPHORE_SAD_LT_SDD	REG_FIELD_PREP(MI_SEMAPHORE_COMP_OP, 2)
+#define   MI_SEMAPHORE_SAD_LTE_SDD	REG_FIELD_PREP(MI_SEMAPHORE_COMP_OP, 3)
+#define   MI_SEMAPHORE_SAD_EQ_SDD	REG_FIELD_PREP(MI_SEMAPHORE_COMP_OP, 4)
+#define   MI_SEMAPHORE_SAD_NEQ_SDD	REG_FIELD_PREP(MI_SEMAPHORE_COMP_OP, 5)
+
 #define MI_STORE_DATA_IMM		__MI_INSTR(0x20)
 #define   MI_SDI_GGTT			REG_BIT(22)
 #define   MI_SDI_LEN_DW			GENMASK(9, 0)
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index 5f15360d14bf..189e764e3914 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -169,6 +169,20 @@ static int emit_store_imm_ppgtt_posted(u64 addr, u64 value,
 	dw[i++] = upper_32_bits(addr);
 	dw[i++] = lower_32_bits(value);
 	dw[i++] = upper_32_bits(value);
+	dw[i++] = MI_SEMAPHORE_WAIT |
+		  MI_SEMAPHORE_POLL |
+		  MI_SEMAPHORE_SAD_EQ_SDD;
+	dw[i++] = lower_32_bits(value);
+	dw[i++] = lower_32_bits(addr);
+	dw[i++] = upper_32_bits(addr);
+	dw[i++] = 0;
+	dw[i++] = MI_SEMAPHORE_WAIT |
+		  MI_SEMAPHORE_POLL |
+		  MI_SEMAPHORE_SAD_EQ_SDD;
+	dw[i++] = upper_32_bits(value);
+	dw[i++] = lower_32_bits(addr + 4);
+	dw[i++] = upper_32_bits(addr);
+	dw[i++] = 0;
 
 	return i;
 }
-- 
2.43.0



More information about the Intel-xe mailing list