[PATCH] drm/xe: Remove 'Force write completion' from MI_STORE_DATA_IMM

Thomas Hellström thomas.hellstrom at linux.intel.com
Thu Dec 26 11:09:39 UTC 2024


Hi!

On Mon, 2024-12-23 at 19:08 -0800, Dixit, Ashutosh wrote:
> On Mon, 23 Dec 2024 10:29:18 -0800, José Roberto de Souza wrote:
> > 
> > In all places the MI_STORE_DATA_IMM are not followed by a read of
> > the same memory address in the same batch buffer and the posted
> > writes
> > are flushed with PIPE_CONTROL or MI_FLUSH_DW in xe_ring_ops.c
> > functions
> > so there is no need to set this register.
> > 
> > This is not a revert because I think it is worthy keep the register
> > definition and the comment on top of it so future readers don't
> > get the same wrong conclusion as I did.
> > 
> > Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > Cc: Ashutosh Dixit <ashutosh.dixit at intel.com>
> > Fixes: 1460bb1fef9c ("drm/xe: Force write completion of
> > MI_STORE_DATA_IMM")
> > Signed-off-by: José Roberto de Souza <jose.souza at intel.com>
> > ---
> >  drivers/gpu/drm/xe/instructions/xe_mi_commands.h | 4 ++++
> >  drivers/gpu/drm/xe/xe_migrate.c                  | 9 ++-------
> >  drivers/gpu/drm/xe/xe_oa.c                       | 1 -
> >  drivers/gpu/drm/xe/xe_ring_ops.c                 | 6 ++----
> >  4 files changed, 8 insertions(+), 12 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
> > b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
> > index f4ee910f09432..8e9884aaa4b8a 100644
> > --- a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
> > +++ b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
> > @@ -35,6 +35,10 @@
> > 
> >  #define MI_STORE_DATA_IMM			__MI_INSTR(0x20)
> >  #define   MI_SDI_GGTT				REG_BIT(22)
> > +/*
> > + * PIPE_CONTROL and MI_FLUSH_DW flushes all posted writes, only
> > needed if
> > + * written value is read in batch buffer
> 
> I don't think this is correct. If a previous posted write writes data
> and
> that data is lying in GPU cache, this or a future batch buffer should
> be
> able to read that data, even without this flag.
> 
> The purpose of MI_FORCE_WRITE_COMPLETION_CHECK flag seems to be a
> sort of
> SFENCE for the GPU, and SFENCE equivalent can also be done using
> MI_FLUSH_DW and PIPE_CONTROL. It is the latter which seem to be used
> in the
> driver now, rather than this flag. So the purpose is to make the
> writes
> globally visible.
> 
> So if you want to leave a comment here you'd have to say something
> like
> that and drop "only needed if written value is read in batch
> buffer". So something like:
> 
> /*
>  * At present PIPE_CONTROL and MI_FLUSH_DW flush posted writes (when
> needed)
>  * in the Xe driver. This flag provides another means of doing the
> same.
>  */
> 
> Otherwise just revert the patch. Thomas can review the wording.
> 
> Also the OA instance has or will be been soon deleted so is no longer
> relevant.
> 
> Sorry, I should have been more careful while reviewing.
> 
> Thanks.
> --
> Ashutosh

If you have not already applied this patch, just please revert the
original one with a short explanation in the commit message.

I think that will stop unnecessary stable backporting and I can skip
both patches for the -fixes series. Also typically we avoid having code
in KMD for future use.

Moving forward I think this flag might come in handy for simplifying
some ring-ops. In particular the ones where we first emit page-table
updates and then the payload.

Thanks,
Thomas




> 
> > + */
> >  #define   MI_FORCE_WRITE_COMPLETION_CHECK	REG_BIT(10)
> >  #define   MI_SDI_LEN_DW				GENMASK(9,
> > 0)
> >  #define  
> > MI_SDI_NUM_DW(x)			REG_FIELD_PREP(MI_SDI_LEN_DW, (x) + 3 - 2)
> > diff --git a/drivers/gpu/drm/xe/xe_migrate.c
> > b/drivers/gpu/drm/xe/xe_migrate.c
> > index 8b32fad678782..9b3bcb789eee3 100644
> > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > @@ -581,9 +581,7 @@ static void emit_pte(struct xe_migrate *m,
> > 	while (ptes) {
> > 		u32 chunk = min(MAX_PTE_PER_SDI, ptes);
> > 
> > -		bb->cs[bb->len++] = MI_STORE_DATA_IMM |
> > -				   
> > MI_FORCE_WRITE_COMPLETION_CHECK |
> > -				    MI_SDI_NUM_QW(chunk);
> > +		bb->cs[bb->len++] = MI_STORE_DATA_IMM |
> > MI_SDI_NUM_QW(chunk);
> > 		bb->cs[bb->len++] = ofs;
> > 		bb->cs[bb->len++] = 0;
> > 
> > @@ -1225,9 +1223,7 @@ static void write_pgtable(struct xe_tile
> > *tile, struct xe_bb *bb, u64 ppgtt_ofs,
> > 		if (!(bb->len & 1))
> > 			bb->cs[bb->len++] = MI_NOOP;
> > 
> > -		bb->cs[bb->len++] = MI_STORE_DATA_IMM |
> > -				   
> > MI_FORCE_WRITE_COMPLETION_CHECK |
> > -				    MI_SDI_NUM_QW(chunk);
> > +		bb->cs[bb->len++] = MI_STORE_DATA_IMM |
> > MI_SDI_NUM_QW(chunk);
> > 		bb->cs[bb->len++] = lower_32_bits(addr);
> > 		bb->cs[bb->len++] = upper_32_bits(addr);
> > 		if (pt_op->bind)
> > @@ -1392,7 +1388,6 @@ __xe_migrate_update_pgtables(struct
> > xe_migrate *m,
> > 			u32 idx = 0;
> > 
> > 			bb->cs[bb->len++] = MI_STORE_DATA_IMM |
> > -					   
> > MI_FORCE_WRITE_COMPLETION_CHECK |
> > 					    MI_SDI_NUM_QW(chunk);
> > 			bb->cs[bb->len++] = ofs;
> > 			bb->cs[bb->len++] = 0; /* upper_32_bits */
> > diff --git a/drivers/gpu/drm/xe/xe_oa.c
> > b/drivers/gpu/drm/xe/xe_oa.c
> > index ae94490b0eac8..6b440a5470255 100644
> > --- a/drivers/gpu/drm/xe/xe_oa.c
> > +++ b/drivers/gpu/drm/xe/xe_oa.c
> > @@ -691,7 +691,6 @@ static void xe_oa_store_flex(struct
> > xe_oa_stream *stream, struct xe_lrc *lrc,
> > 
> > 	do {
> > 		bb->cs[bb->len++] = MI_STORE_DATA_IMM |
> > MI_SDI_GGTT |
> > -				   
> > MI_FORCE_WRITE_COMPLETION_CHECK |
> > 				    MI_SDI_NUM_DW(1);
> > 		bb->cs[bb->len++] = offset + flex->offset *
> > sizeof(u32);
> > 		bb->cs[bb->len++] = 0;
> > diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c
> > b/drivers/gpu/drm/xe/xe_ring_ops.c
> > index 3a75a08b6be92..0be4f489d3e12 100644
> > --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> > +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> > @@ -72,8 +72,7 @@ static int emit_user_interrupt(u32 *dw, int i)
> > 
> >  static int emit_store_imm_ggtt(u32 addr, u32 value, u32 *dw, int
> > i)
> >  {
> > -	dw[i++] = MI_STORE_DATA_IMM | MI_SDI_GGTT |
> > -		  MI_FORCE_WRITE_COMPLETION_CHECK |
> > MI_SDI_NUM_DW(1);
> > +	dw[i++] = MI_STORE_DATA_IMM | MI_SDI_GGTT |
> > MI_SDI_NUM_DW(1);
> > 	dw[i++] = addr;
> > 	dw[i++] = 0;
> > 	dw[i++] = value;
> > @@ -163,8 +162,7 @@ static int emit_pipe_invalidate(u32 mask_flags,
> > bool invalidate_tlb, u32 *dw,
> >  static int emit_store_imm_ppgtt_posted(u64 addr, u64 value,
> > 				       u32 *dw, int i)
> >  {
> > -	dw[i++] = MI_STORE_DATA_IMM |
> > MI_FORCE_WRITE_COMPLETION_CHECK |
> > -		  MI_SDI_NUM_QW(1);
> > +	dw[i++] = MI_STORE_DATA_IMM | MI_SDI_NUM_QW(1);
> > 	dw[i++] = lower_32_bits(addr);
> > 	dw[i++] = upper_32_bits(addr);
> > 	dw[i++] = lower_32_bits(value);
> > --
> > 2.47.1
> > 



More information about the Intel-xe mailing list