[PATCH] drm/xe/xe2lpm: Add Wa_15015404425

Tue Jul 2 05:34:09 UTC 2024

> -----Original Message-----
> From: Roper, Matthew D <matthew.d.roper at intel.com>
> Sent: Thursday, May 30, 2024 1:33 AM
> To: Upadhyay, Tejas <tejas.upadhyay at intel.com>
> Cc: intel-xe at lists.freedesktop.org
> Subject: Re: [PATCH] drm/xe/xe2lpm: Add Wa_15015404425
> 
> On Tue, May 28, 2024 at 11:39:09PM -0700, Upadhyay, Tejas wrote:
> >
> >
> > > -----Original Message-----
> > > From: Roper, Matthew D <matthew.d.roper at intel.com>
> > > Sent: Wednesday, May 29, 2024 4:34 AM
> > > To: Upadhyay, Tejas <tejas.upadhyay at intel.com>
> > > Cc: intel-xe at lists.freedesktop.org
> > > Subject: Re: [PATCH] drm/xe/xe2lpm: Add Wa_15015404425
> > >
> > > On Fri, May 24, 2024 at 12:02:28PM +0530, Tejas Upadhyay wrote:
> > > > Wa_15015404425 applies to xe2_lpm all steppings
> > > >
> > > > Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
> > > > ---
> > > >  drivers/gpu/drm/xe/regs/xe_regs.h |  1 +
> > > >  drivers/gpu/drm/xe/xe_mmio.c      | 18 ++++++++++++++++++
> > > >  2 files changed, 19 insertions(+)
> > > >
> > > > diff --git a/drivers/gpu/drm/xe/regs/xe_regs.h
> > > > b/drivers/gpu/drm/xe/regs/xe_regs.h
> > > > index 722fb6dbb72e..07c8a23bd7c2 100644
> > > > --- a/drivers/gpu/drm/xe/regs/xe_regs.h
> > > > +++ b/drivers/gpu/drm/xe/regs/xe_regs.h
> > > > @@ -42,6 +42,7 @@
> > > >  #define STOLEN_RESERVED
> 	XE_REG(0x1082c0)
> > > >  #define   WOPCM_SIZE_MASK			REG_GENMASK64(9,
> > > 7)
> > > >
> > > > +#define MEDIA_DUMMY_REG
> > > 	XE_REG(0x138000)
> > >
> > > The workaround description recommended using 0x130030 as the target
> > > offset.  Is there a reason you're using 0x138000 (which is a real
> > > RP_STATE_CAP register that you'll probably be clobbering if you write to
> it)?
> >
> > Yes, I missed it, while solving checkpatch erros. I will add correct address
> back.
> >
> > >
> > > >  #define MTL_RP_STATE_CAP			XE_REG(0x138000)
> > > >
> > > >  #define MTL_GT_RPE_FREQUENCY
> 	XE_REG(0x13800c)
> > > > diff --git a/drivers/gpu/drm/xe/xe_mmio.c
> > > > b/drivers/gpu/drm/xe/xe_mmio.c index ff7a7cf99530..e1dc87078387
> > > 100644
> > > > --- a/drivers/gpu/drm/xe/xe_mmio.c
> > > > +++ b/drivers/gpu/drm/xe/xe_mmio.c
> > > > @@ -450,6 +450,21 @@ int xe_mmio_init(struct xe_device *xe)
> > > >  	return devm_add_action_or_reset(xe->drm.dev, mmio_fini, xe);  }
> > > >
> > > > +static void mmio_flush_pending_writes(struct xe_gt *gt) {
> > > > +	struct xe_device *xe = gt_to_xe(gt);
> > > > +
> > > > +	if (!(xe->info.platform == XE_LUNARLAKE))
> > > > +		return;
> > >
> > > This workaround is a bit strange; it's tagged as only applying to
> > > Xe2_LPM, but the description indicates that it actually lies within
> > > the SoC fabric.  Our usual graphics/media workaround database
> > > doesn't always capture SoC workarounds very well...have you
> > > confirmed that this applies to all SoC steppings (which is different from the
> graphics/media steppings)?
> > >
> > > > +
> > > > +	if (xe_gt_is_media_type(gt) &&  MEDIA_VER(xe) == 20) {
> > >
> > > The description also notes that we need to apply this workaround
> > > before issuing a "read to Gfx GTTMMADR BAR."  So even though this is
> > > tagged as Xe2_LPM (probably because they didn't have a good way to
> > > express SoC details in our usual workaround database), it sounds
> > > like this workaround is probably needed for any reads through the
> > > PCI BAR, not just the ones targetting media offsets.  In fact from
> > > the description it sounds like it would also probably apply to any
> > > reads of the GSM through the GTTMMADR BAR as well, not just the
> registers.
> >
> > Details suggests "These dummy writes are not needed for PCI CFG space
> > access (those are always NP). ". Also by the way we apply it in all
> 
> PCI CFG space is the kind of stuff you see when you run lspci.  Anything that's
> inside a BAR (i.e., either registers or GGTT entries if we're talking about BAR0)
> isn't considered part of the config space.
> 
> > reads which are 32bit, its just that we check here media version 2000
> > is present on SoC. As per my understanding we don't need this WA if
> > media 2000 is not present. Should we correct this understanding?
> 
> I think we should probably confirm directly with the hardware team.  The
> workaround database that we look at for 99% of the stuff in our driver is
> really specific to graphics/media issues.  In cases where there are issues
> outside the graphics/media itself (e.g., in the SoC) that happen to impact the
> operation of graphics/media, the handling and tracking of those can be a bit
> inconsistent so it's best to double check offline as to exactly what the scope
> and limits of the workaround need to be.  It might turn out to be something
> like "any media stepping can be impacted, but only if the SoC stepping is in
> the range [X, Y)."  And since the text refers to BAR access (and doesn't seem
> to actually be specific to media), it seems like the fact that this is only marked
> as Xe2_LPM might be a mistake.  Definitely good to double check.

I got response from hardware team as follows :
" This WA applies to ANY CPU read to GFx MMIO (GTTMMADR BAR).  It applies not just on driver load, it applies throughout driver execution - e.g. ANY TIME code running on the CPU (e.g. driver) wants to do a CPU initiated MMIO read of registers in the gfx MMIO BAR toward ANY offset (Gunit, Display, Media, GT, Punit mailbox...), it should first do 4 dummy writes to ensure that any prior real write issued by the CPU is properly ordered/flushed out of the C2C unit and thus ordered ahead of the real read that driver is about to issue.   Technically it would only be required to do the dummy writes if there was a possibility of a prior write that had not already been fenced before... but implementing that optimization is probably more trouble than it is worth."

So I think I would need to remove media check, just a platform check should suffice to apply WA in all reads. Any comments?

> 
> >
> > >
> > > > +		int itr = 4;
> > > > +		/* 4 dummy writes */
> > > > +		while (itr--)
> > > > +			xe_mmio_write32(gt, MEDIA_DUMMY_REG, 0);
> > > > +	}
> > > > +}
> > > > +
> > > >  u8 xe_mmio_read8(struct xe_gt *gt, struct xe_reg reg)  {
> > > >  	struct xe_tile *tile = gt_to_tile(gt); @@ -479,6 +494,9 @@ u32
> > > > xe_mmio_read32(struct xe_gt *gt, struct xe_reg reg)
> > > >  	struct xe_tile *tile = gt_to_tile(gt);
> > > >  	u32 addr = xe_mmio_adjusted_addr(gt, reg.addr);
> > > >
> > > > +	/* Wa_15015404425 */
> > > > +	mmio_flush_pending_writes(gt);
> > >
> > > There aren't a whole lot of them, but technically 8-bit and 16-bit
> > > reads probably need this workaround as well.
> >
> > Ok, I will double check this. I noted this, but I thought it will not be right
> thing to write32 inside read8 and read16, isn't it?
> 
> It sounds like the goal of this workaround is to flush through any previous PCI
> writes by performing additional dummy writes before a read happens.  In
> that case, I don't think the size of the read 8/16/32 really matters.
> 
> Although I do wonder if we'll need to perform some synchronization to
> handle cases where multiple threads are performing MMIO access at the
> same time.  i915 had an 'uncore lock' that prevented concurrent access, but
> Xe doesn't have anything like that, so is it going to cause a problem if we have
> a situation like this after implementing the workaround?
> 
>         Thread 1        Thread 2
>         ========        ========
>         Dummy Write 1
>         Dummy Write 2
>         Dummy Write 3
>                         Real Write
>         Dummy Write 4
>         Real Read
> 
> I.e., do we need to make sure the sequence of dummy writes and its
> following read are uninterrupted by other threads?

Discussing with HW team, if we need such sync. Will keep updated.

Thanks,
Tejas 
> 
> 
> Matt
> 
> >
> > Thanks,
> > Tejas
> > >
> > >
> > > Matt
> > >
> > > > +
> > > >  	return readl((reg.ext ? tile->mmio_ext.regs : tile->mmio.regs) +
> > > > addr);  }
> > > >
> > > > --
> > > > 2.25.1
> > > >
> > >
> > > --
> > > Matt Roper
> > > Graphics Software Engineer
> > > Linux GPU Platform Enablement
> > > Intel Corporation
> 
> --
> Matt Roper
> Graphics Software Engineer
> Linux GPU Platform Enablement
> Intel Corporation