[PATCH 1/1] drm/xe: serialize store_data and user_interrupt for ufence wait

Yang, Fei fei.yang at intel.com
Fri Aug 15 21:03:33 UTC 2025


> On Tue, Aug 12, 2025 at 11:28:46AM -0700, fei.yang at intel.com wrote:
>> From: Fei Yang <fei.yang at intel.com>
>>
>> Quote BSpec, MI_STORE_DATA_IMM "simply initiates the write operation
>> with command execution proceeding normally. Although the write
>> operation is guaranteed to complete eventually, there is no mechanism
>> to synchronize command execution with the completion (or even
>> initiation) of these operations."
>
> Can we not just use update emit_store_imm_ppgtt_posted to use MI_FLUSH_IMM +
> a PPGTT address? I think we can get rid of prior flush added in 3ad7d18c5dad7?
>
> According to bspec 45725 for MI_FLUSH_IMM:
>
> 'Usage note: After this command is completed with a Store DWord enabled, CPU
> access to graphics memory willbe coherent (assuming the Render Cache flush is
> not inhibited).'

We already have a MI_FLUSH_DW in between the MI_STORE_DATA_IMM and
MI_USER_INTERRUPT (See __emit_job_gen12_simple), but the issue was
still reporduced.

static int emit_flush_imm_ggtt(u32 addr, u32 value, u32 flags, u32 *dw, int i)
{
        dw[i++] = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_IMM_DW |
                  flags;
        dw[i++] = addr | MI_FLUSH_DW_USE_GTT;
        dw[i++] = 0;
        dw[i++] = value;

        return i;
}

> Matt
>
>> The KMD currently emit MI_STORE_DATA_IMM and MI_USER_INTERRUPT
>> consecutively to implement user fence. However, according to the
>> BSpec, the data write operation is not guaranteed to be completed when
>> triggering the interrupt, that would cause the
>> xe_wait_user_fence_ioctl to wait until the full user specified timeout
>> is reached before checking the fence value again. Great performance
>> degradation has been observed in IGT xe_exec_fault_mode test cases due
>> to this unnecessary wait. The worst case is that if user set the
>> timeout to MAX_INT32, the wait could end up being a hang until some other random program triggers a user interrupt to wake it up.
>> A semaphore wait is added right after the data write to avoid the
>> unexpected wait.
>>
>> Signed-off-by: Fei Yang <fei.yang at intel.com>
>> ---
>>  drivers/gpu/drm/xe/instructions/xe_mi_commands.h | 11 +++++++++++
>>  drivers/gpu/drm/xe/xe_ring_ops.c                 | 14 ++++++++++++++
>>  2 files changed, 25 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
>> b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
>> index c47b290e0e9f..1c9e7b35c665 100644
>> --- a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
>> +++ b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
>> @@ -34,6 +34,17 @@
>>  #define MI_FORCE_WAKEUP                     __MI_INSTR(0x1D)
>>  #define MI_MATH(n)                  (__MI_INSTR(0x1A) | XE_INSTR_NUM_DW((n) + 1))
>>
>> +#define MI_SEMAPHORE_WAIT           (__MI_INSTR(0x1c) | XE_INSTR_NUM_DW(5))
>> +#define   MI_SEMAPHORE_REGISTER_POLL        REG_BIT(16)
>> +#define   MI_SEMAPHORE_POLL         REG_BIT(15)
>> +#define   MI_SEMAPHORE_COMP_OP              GENMASK(14, 12)
>> +#define   MI_SEMAPHORE_SAD_GT_SDD   REG_FIELD_PREP(MI_SEMAPHORE_COMP_OP, 0)
>> +#define   MI_SEMAPHORE_SAD_GTE_SDD  REG_FIELD_PREP(MI_SEMAPHORE_COMP_OP, 1)
>> +#define   MI_SEMAPHORE_SAD_LT_SDD   REG_FIELD_PREP(MI_SEMAPHORE_COMP_OP, 2)
>> +#define   MI_SEMAPHORE_SAD_LTE_SDD  REG_FIELD_PREP(MI_SEMAPHORE_COMP_OP, 3)
>> +#define   MI_SEMAPHORE_SAD_EQ_SDD   REG_FIELD_PREP(MI_SEMAPHORE_COMP_OP, 4)
>> +#define   MI_SEMAPHORE_SAD_NEQ_SDD  REG_FIELD_PREP(MI_SEMAPHORE_COMP_OP, 5)
>> +
>>  #define MI_STORE_DATA_IMM           __MI_INSTR(0x20)
>>  #define   MI_SDI_GGTT                       REG_BIT(22)
>>  #define   MI_SDI_LEN_DW                     GENMASK(9, 0)
>> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c
>> b/drivers/gpu/drm/xe/xe_ring_ops.c
>> index 5f15360d14bf..189e764e3914 100644
>> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
>> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
>> @@ -169,6 +169,20 @@ static int emit_store_imm_ppgtt_posted(u64 addr, u64 value,
>>      dw[i++] = upper_32_bits(addr);
>>      dw[i++] = lower_32_bits(value);
>>      dw[i++] = upper_32_bits(value);
>> +    dw[i++] = MI_SEMAPHORE_WAIT |
>> +              MI_SEMAPHORE_POLL |
>> +              MI_SEMAPHORE_SAD_EQ_SDD;
>> +    dw[i++] = lower_32_bits(value);
>> +    dw[i++] = lower_32_bits(addr);
>> +    dw[i++] = upper_32_bits(addr);
>> +    dw[i++] = 0;
>> +    dw[i++] = MI_SEMAPHORE_WAIT |
>> +              MI_SEMAPHORE_POLL |
>> +              MI_SEMAPHORE_SAD_EQ_SDD;
>> +    dw[i++] = upper_32_bits(value);
>> +    dw[i++] = lower_32_bits(addr + 4);
>> +    dw[i++] = upper_32_bits(addr);
>> +    dw[i++] = 0;
>>
>>      return i;
>>  }
>> --
>> 2.43.0
>>


More information about the Intel-xe mailing list