[Intel-xe] [PATCH 2/3] drm/xe: Add missing TLB invalidation to emit_pipe_invalidate()

Matt Roper matthew.d.roper at intel.com
Thu Jun 1 19:26:32 UTC 2023


On Wed, May 31, 2023 at 10:46:20AM +0200, Thomas Hellström wrote:
> 
> On 5/30/23 20:40, Souza, Jose wrote:
> > On Mon, 2023-05-29 at 17:07 +0200, Thomas Hellström wrote:
> > > On 5/29/23 16:56, Thomas Hellström wrote:
> > > > On 5/29/23 16:48, Souza, Jose wrote:
> > > > > On Mon, 2023-05-29 at 11:08 +0200, Thomas Hellström wrote:
> > > > > > On Fri, 2023-05-26 at 12:06 -0700, José Roberto de Souza wrote:
> > > > > > > i915 invalidates TLB before emit BB start, so doing the same in Xe.
> > > > > > Hi, José,
> > > > > > 
> > > > > > Do you see an issue because of missing TLB flushes? In that case that
> > > > > > needs to be added to the commit message. We do TLB flushes on unbind,
> > > > > > but not sure if we do them on rebind, so if that's the issue we need to
> > > > > > figure out whether we should do them also on rebind or, like in this
> > > > > > patch, on each exec.
> > > > > I have a group of tests that results flips randomly. It fails when
> > > > > the rendered buffer is compared to expected result.
> > > > > Anything that add a bit of delay after the exec fixes those tests so
> > > > > I was looking for any missing flush in Xe KMD and Mesa.
> > > > > 
> > > > > This one did not fixed it but as i915 was doing it I thought would be
> > > > > good to do in Xe too.
> > > > I think missing TLB invalidations are more likely to cause random
> > > > overwrites of freed memory. Let me do a quick check on these. But the
> > > > problem you're describing indeed sounds more like a missing render
> > > > cache flush.
> > > > 
> > > It indeed looks like we're doing proper TLB invalidation on both rebind
> > > and unbind, so this patch shouldn't really be needed.
> > > 
> > > (look for "invalidation_fence_init()")
> > With this patch + PIPE_CONTROL flush at the end of batch buffer in Mesa, fixed the groups of the tests that were flipping results.
> > Do a XE_GUC_ACTION_TLB_INVALIDATION is the same as a PIPE_CONTROL_TLB_INVALIDATE?

+Cc Matt Brost and Niranjana

It's worth noting that the removal of PIPE_CONTROL_TLB_INVALIDATE was a
deliberate act in this patch:

        commit 41db3304cff95475c6f6667ae27fab3c144a49df
        Author:     Matthew Brost <matthew.brost at intel.com>
        AuthorDate: Thu Jan 26 10:40:41 2023 -0800
        Commit:     Rodrigo Vivi <rodrigo.vivi at intel.com>
        CommitDate: Tue May 23 14:13:47 2023 -0400

            drm/xe: Drop TLB invalidation from ring operations

so the expectation is that this specific flag shouldn't be necessary due
to the GuC-based invalidation we're doing.

At the hardware level, there are a lot of different TLBs that cache page
table lookups.  Each engine has a TLB, the GuC itself has another TLB,
the OA unit has another one, and on igpu platforms, there are also
additional AuxCCS TLBs for the compression lookups that most engines can
do.  You can trigger TLB invalidations in various ways:  via GPU
instructions (like PIPE_CONTROL), MMIO registers (like
XEHP_GFX_TLB_INV_CR), or via a GuC request
(XE_GUC_ACTION_TLB_INVALIDATION).  The details of invalidation vary by
platform (e.g., you can do selective range-based invalidation on some
platforms, but not on others).  The GuC just provides a single
interface, but that interface lets you choose whether you're doing full
vs range invalidation, whether you're invalidating engine TLBs or
the GuC's own TLB, etc.  If using the GuC interface to invalidate engine
TLBs, it just requires a single request and takes care of invalidating
all of the individual engines on its end.

Are the mesa issues being seen on all platforms or just specific ones?


Matt

> 
> I would think so, yes, except that the GUC TLB invalidation is GT-wide and
> I'm not sure whether PIPE_CONTROL_TLB_INVALIDATE is per hw engine or per-GT.
> But given this really has an impact, it might be that we need to invalidate
> TLB also after a bind where we previously pointed to the scratch page.
> 
> If that's indeed the culprit we should look at issuing a
> PIPE_CONTROL_TLB_INVALIDATE on the exec following a bind or rebind, and
> leave the GuC TLB invalidations for unbinds, and then this patch makes sense
> as a start.
> 
> 
> > 
> > Do you see any PIPE_CONTROL flush at the end of batch buffers that i915 does but Xe don't?
> 
> The emit_fini_breadcrumb() called from __i915_request_submit() indeed seems
> to emit the flushes needed, whereas the corresponding
> emit_pipe_imm_ggtt() in xe doesn't.
> 
> /Thomas
> 
> 
> > 
> > > /Thomas
> > > 
> > > > /Thomas
> > > > 
> > > > > > Thanks,
> > > > > > Thomas
> > > > > > 
> > > > > > 
> > > > > > > Signed-off-by: José Roberto de Souza <jose.souza at intel.com>
> > > > > > > ---
> > > > > > >    drivers/gpu/drm/xe/regs/xe_gpu_commands.h | 1 +
> > > > > > >    drivers/gpu/drm/xe/xe_ring_ops.c          | 6 ++++--
> > > > > > >    2 files changed, 5 insertions(+), 2 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> > > > > > > b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> > > > > > > index 0f9c5b0b8a3ba..7c7320efea739 100644
> > > > > > > --- a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> > > > > > > +++ b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> > > > > > > @@ -73,6 +73,7 @@
> > > > > > >    #define
> > > > > > > PIPE_CONTROL_STORE_DATA_INDEX                        (1<<21)
> > > > > > >    #define
> > > > > > > PIPE_CONTROL_CS_STALL                                (1<<20)
> > > > > > >    #define PIPE_CONTROL_GLOBAL_SNAPSHOT_RESET           (1<<19)
> > > > > > > +#define PIPE_CONTROL_TLB_INVALIDATE                  (1<<18)
> > > > > > >    #define
> > > > > > > PIPE_CONTROL_PSD_SYNC                                (1<<17)
> > > > > > >    #define
> > > > > > > PIPE_CONTROL_QW_WRITE                                (1<<14)
> > > > > > >    #define PIPE_CONTROL_DEPTH_STALL                     (1<<13)
> > > > > > > diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c
> > > > > > > b/drivers/gpu/drm/xe/xe_ring_ops.c
> > > > > > > index d2fa0b4c8bcc0..4f3ef39109b9b 100644
> > > > > > > --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> > > > > > > +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> > > > > > > @@ -37,7 +37,8 @@
> > > > > > >                   PIPE_CONTROL_INDIRECT_STATE_DISABLE | \
> > > > > > >                   PIPE_CONTROL_FLUSH_ENABLE | \
> > > > > > >                   PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE | \
> > > > > > > -               PIPE_CONTROL_DC_FLUSH_ENABLE)
> > > > > > > +               PIPE_CONTROL_DC_FLUSH_ENABLE | \
> > > > > > > +               PIPE_CONTROL_TLB_INVALIDATE)
> > > > > > >      static u32 preparser_disable(bool state)
> > > > > > >    {
> > > > > > > @@ -117,7 +118,8 @@ static int emit_pipe_invalidate(u32 mask_flags,
> > > > > > > u32 *dw, int i)
> > > > > > >                   PIPE_CONTROL_CONST_CACHE_INVALIDATE |
> > > > > > >                   PIPE_CONTROL_STATE_CACHE_INVALIDATE |
> > > > > > >                   PIPE_CONTROL_QW_WRITE |
> > > > > > > -               PIPE_CONTROL_STORE_DATA_INDEX;
> > > > > > > +               PIPE_CONTROL_STORE_DATA_INDEX |
> > > > > > > +               PIPE_CONTROL_TLB_INVALIDATE;
> > > > > > >             flags &= ~mask_flags;

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


More information about the Intel-xe mailing list