[PATCH 1/1] drm/xe: serialize store_data and user_interrupt for ufence wait
Matthew Brost
matthew.brost at intel.com
Fri Aug 15 21:08:29 UTC 2025
On Fri, Aug 15, 2025 at 03:03:33PM -0600, Yang, Fei wrote:
> > On Tue, Aug 12, 2025 at 11:28:46AM -0700, fei.yang at intel.com wrote:
> >> From: Fei Yang <fei.yang at intel.com>
> >>
> >> Quote BSpec, MI_STORE_DATA_IMM "simply initiates the write operation
> >> with command execution proceeding normally. Although the write
> >> operation is guaranteed to complete eventually, there is no mechanism
> >> to synchronize command execution with the completion (or even
> >> initiation) of these operations."
> >
> > Can we not just update emit_store_imm_ppgtt_posted to use MI_FLUSH_IMM +
> > a PPGTT address? I think we can get rid of the prior flush added in 3ad7d18c5dad7?
> >
> > According to bspec 45725 for MI_FLUSH_IMM:
> >
> > 'Usage note: After this command is completed with a Store DWord enabled, CPU
> > access to graphics memory will be coherent (assuming the Render Cache flush is
> > not inhibited).'
>
> We already have an MI_FLUSH_DW in between the MI_STORE_DATA_IMM and
> MI_USER_INTERRUPT (see __emit_job_gen12_simple), but the issue was
> still reproduced.
>
That is not what I'm suggesting. I'm suggesting converting
emit_store_imm_ppgtt_posted to use an MI_FLUSH_DW instruction rather than
MI_STORE_DATA_IMM.
Matt
> static int emit_flush_imm_ggtt(u32 addr, u32 value, u32 flags, u32 *dw, int i)
> {
> 	dw[i++] = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_IMM_DW |
> 		  flags;
> 	dw[i++] = addr | MI_FLUSH_DW_USE_GTT;
> 	dw[i++] = 0;
> 	dw[i++] = value;
>
> 	return i;
> }
>
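For reference, a minimal standalone sketch of the suggested conversion: an MI_FLUSH_DW whose post-sync store targets a PPGTT address (no MI_FLUSH_DW_USE_GTT bit), so the write is serialized with command completion. The macro values and the MI_FLUSH_IMM_QW name below are placeholders for illustration; the real encodings live in xe_mi_commands.h.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Placeholder encodings for illustration only -- the real definitions
 * are in drivers/gpu/drm/xe/instructions/xe_mi_commands.h. */
#define MI_FLUSH_DW		(0x26u << 23)
#define MI_FLUSH_DW_OP_STOREDW	(1u << 14)
#define MI_FLUSH_IMM_QW		(1u << 5)	/* hypothetical 64-bit-immediate bit */

/*
 * Sketch of what a converted emit_store_imm_ppgtt_posted could look like:
 * MI_FLUSH_DW with a 64-bit immediate stored to a PPGTT address, so the
 * write is complete before the following MI_USER_INTERRUPT fires.
 */
static int emit_flush_imm_ppgtt(u64 addr, u64 value, u32 flags, u32 *dw, int i)
{
	dw[i++] = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_IMM_QW |
		  flags;
	dw[i++] = (u32)addr;		/* PPGTT address, low dword  */
	dw[i++] = (u32)(addr >> 32);	/* address, high dword       */
	dw[i++] = (u32)value;		/* immediate, low dword      */
	dw[i++] = (u32)(value >> 32);	/* immediate, high dword     */

	return i;
}
```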
> > Matt
> >
> >> The KMD currently emits MI_STORE_DATA_IMM and MI_USER_INTERRUPT
> >> consecutively to implement user fences. However, according to the
> >> BSpec, the data write is not guaranteed to have completed when the
> >> interrupt is triggered, which causes xe_wait_user_fence_ioctl to
> >> sleep for the full user-specified timeout before checking the
> >> fence value again. Significant performance degradation has been
> >> observed in IGT xe_exec_fault_mode test cases due to this
> >> unnecessary wait. In the worst case, if the user sets the timeout
> >> to MAX_INT32, the wait effectively becomes a hang until some other
> >> program happens to trigger a user interrupt and wake it up.
> >> A semaphore wait is added right after the data write to avoid the
> >> unexpected wait.
> >>
> >> Signed-off-by: Fei Yang <fei.yang at intel.com>
> >> ---
> >> drivers/gpu/drm/xe/instructions/xe_mi_commands.h | 11 +++++++++++
> >> drivers/gpu/drm/xe/xe_ring_ops.c | 14 ++++++++++++++
> >> 2 files changed, 25 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
> >> b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
> >> index c47b290e0e9f..1c9e7b35c665 100644
> >> --- a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
> >> +++ b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
> >> @@ -34,6 +34,17 @@
> >> #define MI_FORCE_WAKEUP __MI_INSTR(0x1D)
> >> #define MI_MATH(n) (__MI_INSTR(0x1A) | XE_INSTR_NUM_DW((n) + 1))
> >>
> >> +#define MI_SEMAPHORE_WAIT (__MI_INSTR(0x1c) | XE_INSTR_NUM_DW(5))
> >> +#define MI_SEMAPHORE_REGISTER_POLL REG_BIT(16)
> >> +#define MI_SEMAPHORE_POLL REG_BIT(15)
> >> +#define MI_SEMAPHORE_COMP_OP GENMASK(14, 12)
> >> +#define MI_SEMAPHORE_SAD_GT_SDD REG_FIELD_PREP(MI_SEMAPHORE_COMP_OP, 0)
> >> +#define MI_SEMAPHORE_SAD_GTE_SDD REG_FIELD_PREP(MI_SEMAPHORE_COMP_OP, 1)
> >> +#define MI_SEMAPHORE_SAD_LT_SDD REG_FIELD_PREP(MI_SEMAPHORE_COMP_OP, 2)
> >> +#define MI_SEMAPHORE_SAD_LTE_SDD REG_FIELD_PREP(MI_SEMAPHORE_COMP_OP, 3)
> >> +#define MI_SEMAPHORE_SAD_EQ_SDD REG_FIELD_PREP(MI_SEMAPHORE_COMP_OP, 4)
> >> +#define MI_SEMAPHORE_SAD_NEQ_SDD REG_FIELD_PREP(MI_SEMAPHORE_COMP_OP, 5)
> >> +
> >> #define MI_STORE_DATA_IMM __MI_INSTR(0x20)
> >> #define MI_SDI_GGTT REG_BIT(22)
> >> #define MI_SDI_LEN_DW GENMASK(9, 0)
> >> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c
> >> b/drivers/gpu/drm/xe/xe_ring_ops.c
> >> index 5f15360d14bf..189e764e3914 100644
> >> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> >> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> >> @@ -169,6 +169,20 @@ static int emit_store_imm_ppgtt_posted(u64 addr, u64 value,
> >> dw[i++] = upper_32_bits(addr);
> >> dw[i++] = lower_32_bits(value);
> >> dw[i++] = upper_32_bits(value);
> >> + dw[i++] = MI_SEMAPHORE_WAIT |
> >> + MI_SEMAPHORE_POLL |
> >> + MI_SEMAPHORE_SAD_EQ_SDD;
> >> + dw[i++] = lower_32_bits(value);
> >> + dw[i++] = lower_32_bits(addr);
> >> + dw[i++] = upper_32_bits(addr);
> >> + dw[i++] = 0;
> >> + dw[i++] = MI_SEMAPHORE_WAIT |
> >> + MI_SEMAPHORE_POLL |
> >> + MI_SEMAPHORE_SAD_EQ_SDD;
> >> + dw[i++] = upper_32_bits(value);
> >> + dw[i++] = lower_32_bits(addr + 4);
> >> + dw[i++] = upper_32_bits(addr);
> >> + dw[i++] = 0;
> >>
> >> return i;
> >> }
> >> --
> >> 2.43.0
> >>
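As a standalone illustration of the serialization the patch adds, the sketch below mirrors the two MI_SEMAPHORE_WAIT polls from the hunk above: the ring polls the low dword and then the high dword of the just-stored user-fence value, so the subsequent MI_USER_INTERRUPT cannot fire before the store is visible. The bit values here are placeholders, not the hardware encodings from xe_mi_commands.h.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Placeholder encodings mirroring the names added by the patch above;
 * the actual values are defined in xe_mi_commands.h. */
#define MI_SEMAPHORE_WAIT	((0x1cu << 23) | 3u)	/* 5-dword instruction */
#define MI_SEMAPHORE_POLL	(1u << 15)
#define MI_SEMAPHORE_SAD_EQ_SDD	(4u << 12)

/*
 * Poll the low dword, then the high dword, of the 64-bit value just
 * written by MI_STORE_DATA_IMM at addr, as the patch does in
 * emit_store_imm_ppgtt_posted.
 */
static int emit_semaphore_poll_qw(u64 addr, u64 value, u32 *dw, int i)
{
	dw[i++] = MI_SEMAPHORE_WAIT | MI_SEMAPHORE_POLL |
		  MI_SEMAPHORE_SAD_EQ_SDD;
	dw[i++] = (u32)value;		/* semaphore data: low dword   */
	dw[i++] = (u32)addr;		/* poll address: low dword     */
	dw[i++] = (u32)(addr >> 32);
	dw[i++] = 0;
	dw[i++] = MI_SEMAPHORE_WAIT | MI_SEMAPHORE_POLL |
		  MI_SEMAPHORE_SAD_EQ_SDD;
	dw[i++] = (u32)(value >> 32);	/* semaphore data: high dword  */
	dw[i++] = (u32)addr + 4;	/* poll address: low dword + 4 */
	dw[i++] = (u32)(addr >> 32);
	dw[i++] = 0;

	return i;
}
```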
More information about the Intel-xe mailing list