[PATCH] drm/xe: Remove 'Force write completion' from MI_STORE_DATA_IMM

Dixit, Ashutosh ashutosh.dixit at intel.com
Tue Dec 24 03:08:09 UTC 2024


On Mon, 23 Dec 2024 10:29:18 -0800, José Roberto de Souza wrote:
>
> In all places the MI_STORE_DATA_IMM are not followed by a read of
> the same memory address in the same batch buffer and the posted writes
> are flushed with PIPE_CONTROL or MI_FLUSH_DW in xe_ring_ops.c functions
> so there is no need to set this register.
>
> This is not a revert because I think it is worthy keep the register
> definition and the comment on top of it so future readers don't
> get the same wrong conclusion as I did.
>
> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> Cc: Ashutosh Dixit <ashutosh.dixit at intel.com>
> Fixes: 1460bb1fef9c ("drm/xe: Force write completion of MI_STORE_DATA_IMM")
> Signed-off-by: José Roberto de Souza <jose.souza at intel.com>
> ---
>  drivers/gpu/drm/xe/instructions/xe_mi_commands.h | 4 ++++
>  drivers/gpu/drm/xe/xe_migrate.c                  | 9 ++-------
>  drivers/gpu/drm/xe/xe_oa.c                       | 1 -
>  drivers/gpu/drm/xe/xe_ring_ops.c                 | 6 ++----
>  4 files changed, 8 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
> index f4ee910f09432..8e9884aaa4b8a 100644
> --- a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
> +++ b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
> @@ -35,6 +35,10 @@
>
>  #define MI_STORE_DATA_IMM			__MI_INSTR(0x20)
>  #define   MI_SDI_GGTT				REG_BIT(22)
> +/*
> + * PIPE_CONTROL and MI_FLUSH_DW flushes all posted writes, only needed if
> + * written value is read in batch buffer

I don't think this is correct. If a previous posted write writes data and
that data is lying in GPU cache, this or a future batch buffer should be
able to read that data, even without this flag.

The purpose of MI_FORCE_WRITE_COMPLETION_CHECK flag seems to be a sort of
SFENCE for the GPU, and SFENCE equivalent can also be done using
MI_FLUSH_DW and PIPE_CONTROL. It is the latter which seem to be used in the
driver now, rather than this flag. So the purpose is to make the writes
globally visible.

So if you want to leave a comment here you'd have to say something like
that and drop "only needed if written value is read in batch
buffer". So something like:

/*
 * At present PIPE_CONTROL and MI_FLUSH_DW flush posted writes (when needed)
 * in the Xe driver. This flag provides another means of doing the same.
 */

Otherwise just revert the patch. Thomas can review the wording.

Also the OA instance has or will be been soon deleted so is no longer
relevant.

Sorry, I should have been more careful while reviewing.

Thanks.
--
Ashutosh

> + */
>  #define   MI_FORCE_WRITE_COMPLETION_CHECK	REG_BIT(10)
>  #define   MI_SDI_LEN_DW				GENMASK(9, 0)
>  #define   MI_SDI_NUM_DW(x)			REG_FIELD_PREP(MI_SDI_LEN_DW, (x) + 3 - 2)
> diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
> index 8b32fad678782..9b3bcb789eee3 100644
> --- a/drivers/gpu/drm/xe/xe_migrate.c
> +++ b/drivers/gpu/drm/xe/xe_migrate.c
> @@ -581,9 +581,7 @@ static void emit_pte(struct xe_migrate *m,
>	while (ptes) {
>		u32 chunk = min(MAX_PTE_PER_SDI, ptes);
>
> -		bb->cs[bb->len++] = MI_STORE_DATA_IMM |
> -				    MI_FORCE_WRITE_COMPLETION_CHECK |
> -				    MI_SDI_NUM_QW(chunk);
> +		bb->cs[bb->len++] = MI_STORE_DATA_IMM | MI_SDI_NUM_QW(chunk);
>		bb->cs[bb->len++] = ofs;
>		bb->cs[bb->len++] = 0;
>
> @@ -1225,9 +1223,7 @@ static void write_pgtable(struct xe_tile *tile, struct xe_bb *bb, u64 ppgtt_ofs,
>		if (!(bb->len & 1))
>			bb->cs[bb->len++] = MI_NOOP;
>
> -		bb->cs[bb->len++] = MI_STORE_DATA_IMM |
> -				    MI_FORCE_WRITE_COMPLETION_CHECK |
> -				    MI_SDI_NUM_QW(chunk);
> +		bb->cs[bb->len++] = MI_STORE_DATA_IMM | MI_SDI_NUM_QW(chunk);
>		bb->cs[bb->len++] = lower_32_bits(addr);
>		bb->cs[bb->len++] = upper_32_bits(addr);
>		if (pt_op->bind)
> @@ -1392,7 +1388,6 @@ __xe_migrate_update_pgtables(struct xe_migrate *m,
>			u32 idx = 0;
>
>			bb->cs[bb->len++] = MI_STORE_DATA_IMM |
> -					    MI_FORCE_WRITE_COMPLETION_CHECK |
>					    MI_SDI_NUM_QW(chunk);
>			bb->cs[bb->len++] = ofs;
>			bb->cs[bb->len++] = 0; /* upper_32_bits */
> diff --git a/drivers/gpu/drm/xe/xe_oa.c b/drivers/gpu/drm/xe/xe_oa.c
> index ae94490b0eac8..6b440a5470255 100644
> --- a/drivers/gpu/drm/xe/xe_oa.c
> +++ b/drivers/gpu/drm/xe/xe_oa.c
> @@ -691,7 +691,6 @@ static void xe_oa_store_flex(struct xe_oa_stream *stream, struct xe_lrc *lrc,
>
>	do {
>		bb->cs[bb->len++] = MI_STORE_DATA_IMM | MI_SDI_GGTT |
> -				    MI_FORCE_WRITE_COMPLETION_CHECK |
>				    MI_SDI_NUM_DW(1);
>		bb->cs[bb->len++] = offset + flex->offset * sizeof(u32);
>		bb->cs[bb->len++] = 0;
> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
> index 3a75a08b6be92..0be4f489d3e12 100644
> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> @@ -72,8 +72,7 @@ static int emit_user_interrupt(u32 *dw, int i)
>
>  static int emit_store_imm_ggtt(u32 addr, u32 value, u32 *dw, int i)
>  {
> -	dw[i++] = MI_STORE_DATA_IMM | MI_SDI_GGTT |
> -		  MI_FORCE_WRITE_COMPLETION_CHECK | MI_SDI_NUM_DW(1);
> +	dw[i++] = MI_STORE_DATA_IMM | MI_SDI_GGTT | MI_SDI_NUM_DW(1);
>	dw[i++] = addr;
>	dw[i++] = 0;
>	dw[i++] = value;
> @@ -163,8 +162,7 @@ static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
>  static int emit_store_imm_ppgtt_posted(u64 addr, u64 value,
>				       u32 *dw, int i)
>  {
> -	dw[i++] = MI_STORE_DATA_IMM | MI_FORCE_WRITE_COMPLETION_CHECK |
> -		  MI_SDI_NUM_QW(1);
> +	dw[i++] = MI_STORE_DATA_IMM | MI_SDI_NUM_QW(1);
>	dw[i++] = lower_32_bits(addr);
>	dw[i++] = upper_32_bits(addr);
>	dw[i++] = lower_32_bits(value);
> --
> 2.47.1
>


More information about the Intel-xe mailing list