[PATCH] drm/xe: Remove 'Force write completion' from MI_STORE_DATA_IMM
Thomas Hellström
thomas.hellstrom at linux.intel.com
Thu Dec 26 11:09:39 UTC 2024
Hi!
On Mon, 2024-12-23 at 19:08 -0800, Dixit, Ashutosh wrote:
> On Mon, 23 Dec 2024 10:29:18 -0800, José Roberto de Souza wrote:
> >
> > In all places the MI_STORE_DATA_IMM are not followed by a read of
> > the same memory address in the same batch buffer and the posted
> > writes
> > are flushed with PIPE_CONTROL or MI_FLUSH_DW in xe_ring_ops.c
> > functions
> > so there is no need to set this register.
> >
> > This is not a revert because I think it is worthy keep the register
> > definition and the comment on top of it so future readers don't
> > get the same wrong conclusion as I did.
> >
> > Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> > Cc: Ashutosh Dixit <ashutosh.dixit at intel.com>
> > Fixes: 1460bb1fef9c ("drm/xe: Force write completion of
> > MI_STORE_DATA_IMM")
> > Signed-off-by: José Roberto de Souza <jose.souza at intel.com>
> > ---
> > drivers/gpu/drm/xe/instructions/xe_mi_commands.h | 4 ++++
> > drivers/gpu/drm/xe/xe_migrate.c | 9 ++-------
> > drivers/gpu/drm/xe/xe_oa.c | 1 -
> > drivers/gpu/drm/xe/xe_ring_ops.c | 6 ++----
> > 4 files changed, 8 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
> > b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
> > index f4ee910f09432..8e9884aaa4b8a 100644
> > --- a/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
> > +++ b/drivers/gpu/drm/xe/instructions/xe_mi_commands.h
> > @@ -35,6 +35,10 @@
> >
> > #define MI_STORE_DATA_IMM __MI_INSTR(0x20)
> > #define MI_SDI_GGTT REG_BIT(22)
> > +/*
> > + * PIPE_CONTROL and MI_FLUSH_DW flushes all posted writes, only
> > needed if
> > + * written value is read in batch buffer
>
> I don't think this is correct. If a previous posted write writes data
> and
> that data is lying in GPU cache, this or a future batch buffer should
> be
> able to read that data, even without this flag.
>
> The purpose of MI_FORCE_WRITE_COMPLETION_CHECK flag seems to be a
> sort of
> SFENCE for the GPU, and SFENCE equivalent can also be done using
> MI_FLUSH_DW and PIPE_CONTROL. It is the latter which seem to be used
> in the
> driver now, rather than this flag. So the purpose is to make the
> writes
> globally visible.
>
> So if you want to leave a comment here you'd have to say something
> like
> that and drop "only needed if written value is read in batch
> buffer". So something like:
>
> /*
> * At present PIPE_CONTROL and MI_FLUSH_DW flush posted writes (when
> needed)
> * in the Xe driver. This flag provides another means of doing the
> same.
> */
>
> Otherwise just revert the patch. Thomas can review the wording.
>
> Also the OA instance has or will be been soon deleted so is no longer
> relevant.
>
> Sorry, I should have been more careful while reviewing.
>
> Thanks.
> --
> Ashutosh
If you have not already applied this patch, just please revert the
original one with a short explanation in the commit message.
I think that will stop unnecessary stable backporting and I can skip
both patches for the -fixes series. Also typically we avoid having code
in KMD for future use.
Moving forward I think this flag might come in handy for simplifying
some ring-ops. In particular the ones where we first emit page-table
updates and then the payload.
Thanks,
Thomas
>
> > + */
> > #define MI_FORCE_WRITE_COMPLETION_CHECK REG_BIT(10)
> > #define MI_SDI_LEN_DW GENMASK(9,
> > 0)
> > #define
> > MI_SDI_NUM_DW(x) REG_FIELD_PREP(MI_SDI_LEN_DW, (x) + 3 - 2)
> > diff --git a/drivers/gpu/drm/xe/xe_migrate.c
> > b/drivers/gpu/drm/xe/xe_migrate.c
> > index 8b32fad678782..9b3bcb789eee3 100644
> > --- a/drivers/gpu/drm/xe/xe_migrate.c
> > +++ b/drivers/gpu/drm/xe/xe_migrate.c
> > @@ -581,9 +581,7 @@ static void emit_pte(struct xe_migrate *m,
> > while (ptes) {
> > u32 chunk = min(MAX_PTE_PER_SDI, ptes);
> >
> > - bb->cs[bb->len++] = MI_STORE_DATA_IMM |
> > -
> > MI_FORCE_WRITE_COMPLETION_CHECK |
> > - MI_SDI_NUM_QW(chunk);
> > + bb->cs[bb->len++] = MI_STORE_DATA_IMM |
> > MI_SDI_NUM_QW(chunk);
> > bb->cs[bb->len++] = ofs;
> > bb->cs[bb->len++] = 0;
> >
> > @@ -1225,9 +1223,7 @@ static void write_pgtable(struct xe_tile
> > *tile, struct xe_bb *bb, u64 ppgtt_ofs,
> > if (!(bb->len & 1))
> > bb->cs[bb->len++] = MI_NOOP;
> >
> > - bb->cs[bb->len++] = MI_STORE_DATA_IMM |
> > -
> > MI_FORCE_WRITE_COMPLETION_CHECK |
> > - MI_SDI_NUM_QW(chunk);
> > + bb->cs[bb->len++] = MI_STORE_DATA_IMM |
> > MI_SDI_NUM_QW(chunk);
> > bb->cs[bb->len++] = lower_32_bits(addr);
> > bb->cs[bb->len++] = upper_32_bits(addr);
> > if (pt_op->bind)
> > @@ -1392,7 +1388,6 @@ __xe_migrate_update_pgtables(struct
> > xe_migrate *m,
> > u32 idx = 0;
> >
> > bb->cs[bb->len++] = MI_STORE_DATA_IMM |
> > -
> > MI_FORCE_WRITE_COMPLETION_CHECK |
> > MI_SDI_NUM_QW(chunk);
> > bb->cs[bb->len++] = ofs;
> > bb->cs[bb->len++] = 0; /* upper_32_bits */
> > diff --git a/drivers/gpu/drm/xe/xe_oa.c
> > b/drivers/gpu/drm/xe/xe_oa.c
> > index ae94490b0eac8..6b440a5470255 100644
> > --- a/drivers/gpu/drm/xe/xe_oa.c
> > +++ b/drivers/gpu/drm/xe/xe_oa.c
> > @@ -691,7 +691,6 @@ static void xe_oa_store_flex(struct
> > xe_oa_stream *stream, struct xe_lrc *lrc,
> >
> > do {
> > bb->cs[bb->len++] = MI_STORE_DATA_IMM |
> > MI_SDI_GGTT |
> > -
> > MI_FORCE_WRITE_COMPLETION_CHECK |
> > MI_SDI_NUM_DW(1);
> > bb->cs[bb->len++] = offset + flex->offset *
> > sizeof(u32);
> > bb->cs[bb->len++] = 0;
> > diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c
> > b/drivers/gpu/drm/xe/xe_ring_ops.c
> > index 3a75a08b6be92..0be4f489d3e12 100644
> > --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> > +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> > @@ -72,8 +72,7 @@ static int emit_user_interrupt(u32 *dw, int i)
> >
> > static int emit_store_imm_ggtt(u32 addr, u32 value, u32 *dw, int
> > i)
> > {
> > - dw[i++] = MI_STORE_DATA_IMM | MI_SDI_GGTT |
> > - MI_FORCE_WRITE_COMPLETION_CHECK |
> > MI_SDI_NUM_DW(1);
> > + dw[i++] = MI_STORE_DATA_IMM | MI_SDI_GGTT |
> > MI_SDI_NUM_DW(1);
> > dw[i++] = addr;
> > dw[i++] = 0;
> > dw[i++] = value;
> > @@ -163,8 +162,7 @@ static int emit_pipe_invalidate(u32 mask_flags,
> > bool invalidate_tlb, u32 *dw,
> > static int emit_store_imm_ppgtt_posted(u64 addr, u64 value,
> > u32 *dw, int i)
> > {
> > - dw[i++] = MI_STORE_DATA_IMM |
> > MI_FORCE_WRITE_COMPLETION_CHECK |
> > - MI_SDI_NUM_QW(1);
> > + dw[i++] = MI_STORE_DATA_IMM | MI_SDI_NUM_QW(1);
> > dw[i++] = lower_32_bits(addr);
> > dw[i++] = upper_32_bits(addr);
> > dw[i++] = lower_32_bits(value);
> > --
> > 2.47.1
> >
More information about the Intel-xe
mailing list