[Intel-xe] [PATCH 3/3] drm/xe: Fix LRC workarounds

Matt Roper matthew.d.roper at intel.com
Tue Sep 5 19:13:34 UTC 2023


On Tue, Sep 05, 2023 at 11:09:42AM -0700, Lucas De Marchi wrote:
> On Tue, Sep 05, 2023 at 11:01:51AM -0700, Matt Roper wrote:
> > On Tue, Sep 05, 2023 at 07:31:44AM -0700, Lucas De Marchi wrote:
> > > Fix 2 issues when writing LRC workarounds by copying the same handling
> > > done when processing other RTP entries:
> > > 
> > > For masked registers, it was not correctly setting the upper 16bits.
> > > Differently than i915, the entry itself doesn't set the upper bits
> > > for masked registers: this is done when applying them. Testing on ADL-P:
> > > 
> > > Before:
> > > 	[drm:xe_gt_record_default_lrcs [xe]] LRC WA rcs0 save-restore MMIOs
> > > 	[drm:xe_gt_record_default_lrcs [xe]] REG[0x2580] = 0x00000002
> > > 	...
> > > 	[drm:xe_gt_record_default_lrcs [xe]] REG[0x7018] = 0x00002000
> > > 	[drm:xe_gt_record_default_lrcs [xe]] REG[0x7300] = 0x00000040
> > > 	[drm:xe_gt_record_default_lrcs [xe]] REG[0x7304] = 0x00000200
> > > 
> > > After:
> > > 	[drm:xe_gt_record_default_lrcs [xe]] LRC WA rcs0 save-restore MMIOs
> > > 	[drm:xe_gt_record_default_lrcs [xe]] REG[0x2580] = 0x00060002
> > > 	...
> > > 	[drm:xe_gt_record_default_lrcs [xe]] REG[0x7018] = 0x20002000
> > > 	[drm:xe_gt_record_default_lrcs [xe]] REG[0x7300] = 0x00400040
> > > 	[drm:xe_gt_record_default_lrcs [xe]] REG[0x7304] = 0x02000200
> > > 
> > > All of these registers are masked registers, so writing to them without
> > > the relevant bits in the upper 16b doesn't have any effect.
> > > 
> > > Also, this adds support to regular registers; previously it was assumed
> > > that LRC entries would only contain masked registers. However this is
> > > not true. 0x6604 is not a masked register, but used in workarounds for
> > > e.g.  ADL-P. See commit 28cf243a341a ("drm/i915/gt: Fix context
> > > workarounds with non-masked regs"). In the same test with ADL-P as
> > > above:
> > > 
> > > Before:
> > > 	[drm:xe_gt_record_default_lrcs [xe]] REG[0x6604] = 0xe0000000
> > > After:
> > > 	[drm:xe_gt_record_default_lrcs [xe]] REG[0x6604] = 0xe0efef6f
> > > 
> > > As can be seen, now it will read what was in the register rather than
> > > completely overwrite the other bits.
> > > 
> > > Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
> > > ---
> > >  drivers/gpu/drm/xe/xe_gt.c | 42 +++++++++++++++++++++++++++++++-------
> > >  1 file changed, 35 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > > index bd307770a620..6dd3cb5d8246 100644
> > > --- a/drivers/gpu/drm/xe/xe_gt.c
> > > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > > @@ -114,12 +114,21 @@ static int emit_nop_job(struct xe_gt *gt, struct xe_exec_queue *q)
> > >  	return 0;
> > >  }
> > > 
> > > +/*
> > > + * Convert back from encoded value to type-safe, only to be used when reg.mcr
> > > + * is true
> > > + */
> > > +static struct xe_reg_mcr to_xe_reg_mcr(const struct xe_reg reg)
> > > +{
> > > +	return (const struct xe_reg_mcr){.__reg.raw = reg.raw };
> > > +}
> > > +
> > >  static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q)
> > >  {
> > >  	struct xe_device *xe = gt_to_xe(gt);
> > >  	struct xe_reg_sr *sr = &q->hwe->reg_lrc;
> > >  	struct xe_reg_sr_entry *entry;
> > > -	unsigned long reg;
> > > +	unsigned long idx;
> > >  	struct xe_sched_job *job;
> > >  	struct xe_bb *bb;
> > >  	struct dma_fence *fence;
> > > @@ -130,18 +139,37 @@ static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q)
> > >  	if (IS_ERR(bb))
> > >  		return PTR_ERR(bb);
> > > 
> > > -	xa_for_each(&sr->xa, reg, entry)
> > > +	xa_for_each(&sr->xa, idx, entry)
> > >  		++count;
> > > 
> > >  	if (count) {
> > >  		drm_dbg(&xe->drm, "LRC WA %s save-restore MMIOs\n", sr->name);
> > > 
> > >  		bb->cs[bb->len++] = MI_LOAD_REGISTER_IMM(count);
> > > -		xa_for_each(&sr->xa, reg, entry) {
> > > -			bb->cs[bb->len++] = reg;
> > > -			bb->cs[bb->len++] = entry->set_bits;
> > > -			drm_dbg(&xe->drm, "REG[0x%lx] = 0x%08x", reg,
> > > -				entry->set_bits);
> > > +
> > > +		xa_for_each(&sr->xa, idx, entry) {
> > > +			struct xe_reg reg = entry->reg;
> > > +			struct xe_reg_mcr reg_mcr = to_xe_reg_mcr(reg);
> > > +			u32 val;
> > > +
> > > +			bb->cs[bb->len++] = reg.addr;
> > > +
> > > +			/*
> > > +			 * Skip reading the register if it's not really needed
> > > +			 */
> > > +			if (reg.masked)
> > > +				val = entry->clr_bits << 16;
> > > +			else if (entry->clr_bits + 1)
> > > +				val = (reg.mcr ?
> > > +				       xe_gt_mcr_unicast_read_any(gt, reg_mcr) :
> > > +				       xe_mmio_read32(gt, reg)) & (~entry->clr_bits);
> > 
> > I'm a little bit worried about whether this will continue to work in the
> > long term.  There are definitely some registers out there that we can't
> > read properly from the CPU.  Maybe none of our current LRC workarounds
> > target such registers, but they do exist (I have a vague memory of
> > hitting this on some old pre-production PVC LRC workarounds I think).
> 
> but were them also non-masked?

Yeah, it was a non-masked register.  I think something like BLIT_CCTL
maybe?  Anyway, that was an early pre-production only issue, so not
relevant anymore.  I'm just mentioning it in case we even run across
something similar again in the future.


Matt

> 
> > We can probably go with this CPU read for now, but maybe we should leave
> > a comment here that we might need to switch to a GPU-based approach at
> > some point in the future if we run into CPU inaccessible registers.
> > 
> > If we wind up needing to change the approach down the road, I think
> > there are probably a handful of possible approaches:
> > 
> > * Save the default_lrc without applying the LRC workarounds initially.
> >   Read the in-memory contents of the LRC to find the original register
> >   value.  Then submit the LRC batch and re-save the LRC with the
> >   workarounds applied.
> > * Submit a preliminary batchbuffer with MI_SRM instructions to save the
> >   hw-default register values to a hwsp location that we can then pick
> >   up and use when crafting the MI_LRI batch.  Or update in memory and
> >   use MI_LRM instead.
> > * Use MI_MATH to perform the desired bitwise operations on register
> >   bits directly.
> 
> yeah, we've had the same concern when doing similar fix in i915. From
> all of the options, I think the one doing MI_MATH for the very few
> registers in the category [ non-masked, can't-be-read-from-cpu ] would
> be appropriate.  I  can take a look on doing that later, but I think
> it's more urgent to fix the outstanding issue we have right now of
> basically not really applying most of the LRC WAs.
> 
> > 
> > Anyway, we can deal with that later if/when it becomes necessary.  For
> > now,
> > 
> > Reviewed-by: Matt Roper <matthew.d.roper at intel.com>
> 
> thanks
> 
> Lucas De Marchi
> 
> 
> > 
> > 
> > Matt
> > 
> > > +			else
> > > +				val = 0;
> > > +
> > > +			val |= entry->set_bits;
> > > +			bb->cs[bb->len++] = val;
> > > +
> > > +			drm_dbg(&xe->drm, "REG[0x%x] = 0x%08x", reg.addr, val);
> > >  		}
> > >  	}
> > > 
> > > --
> > > 2.40.1
> > > 
> > 
> > -- 
> > Matt Roper
> > Graphics Software Engineer
> > Linux GPU Platform Enablement
> > Intel Corporation

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


More information about the Intel-xe mailing list