[Intel-xe] [PATCH 3/3] drm/xe: Fix LRC workarounds

Matt Roper matthew.d.roper at intel.com
Tue Sep 5 18:01:51 UTC 2023


On Tue, Sep 05, 2023 at 07:31:44AM -0700, Lucas De Marchi wrote:
> Fix 2 issues when writing LRC workarounds by copying the same handling
> done when processing other RTP entries:
> 
> For masked registers, it was not correctly setting the upper 16bits.
> Differently than i915, the entry itself doesn't set the upper bits
> for masked registers: this is done when applying them. Testing on ADL-P:
> 
> Before:
> 	[drm:xe_gt_record_default_lrcs [xe]] LRC WA rcs0 save-restore MMIOs
> 	[drm:xe_gt_record_default_lrcs [xe]] REG[0x2580] = 0x00000002
> 	...
> 	[drm:xe_gt_record_default_lrcs [xe]] REG[0x7018] = 0x00002000
> 	[drm:xe_gt_record_default_lrcs [xe]] REG[0x7300] = 0x00000040
> 	[drm:xe_gt_record_default_lrcs [xe]] REG[0x7304] = 0x00000200
> 
> After:
> 	[drm:xe_gt_record_default_lrcs [xe]] LRC WA rcs0 save-restore MMIOs
> 	[drm:xe_gt_record_default_lrcs [xe]] REG[0x2580] = 0x00060002
> 	...
> 	[drm:xe_gt_record_default_lrcs [xe]] REG[0x7018] = 0x20002000
> 	[drm:xe_gt_record_default_lrcs [xe]] REG[0x7300] = 0x00400040
> 	[drm:xe_gt_record_default_lrcs [xe]] REG[0x7304] = 0x02000200
> 
> All of these registers are masked registers, so writing to them without
> the relevant bits in the upper 16b doesn't have any effect.
> 
> Also, this adds support to regular registers; previously it was assumed
> that LRC entries would only contain masked registers. However this is
> not true. 0x6604 is not a masked register, but used in workarounds for
> e.g.  ADL-P. See commit 28cf243a341a ("drm/i915/gt: Fix context
> workarounds with non-masked regs"). In the same test with ADL-P as
> above:
> 
> Before:
> 	[drm:xe_gt_record_default_lrcs [xe]] REG[0x6604] = 0xe0000000
> After:
> 	[drm:xe_gt_record_default_lrcs [xe]] REG[0x6604] = 0xe0efef6f
> 
> As can be seen, now it will read what was in the register rather than
> completely overwrite the other bits.
> 
> Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
> ---
>  drivers/gpu/drm/xe/xe_gt.c | 42 +++++++++++++++++++++++++++++++-------
>  1 file changed, 35 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index bd307770a620..6dd3cb5d8246 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -114,12 +114,21 @@ static int emit_nop_job(struct xe_gt *gt, struct xe_exec_queue *q)
>  	return 0;
>  }
>  
> +/*
> + * Convert back from encoded value to type-safe, only to be used when reg.mcr
> + * is true
> + */
> +static struct xe_reg_mcr to_xe_reg_mcr(const struct xe_reg reg)
> +{
> +	return (const struct xe_reg_mcr){.__reg.raw = reg.raw };
> +}
> +
>  static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q)
>  {
>  	struct xe_device *xe = gt_to_xe(gt);
>  	struct xe_reg_sr *sr = &q->hwe->reg_lrc;
>  	struct xe_reg_sr_entry *entry;
> -	unsigned long reg;
> +	unsigned long idx;
>  	struct xe_sched_job *job;
>  	struct xe_bb *bb;
>  	struct dma_fence *fence;
> @@ -130,18 +139,37 @@ static int emit_wa_job(struct xe_gt *gt, struct xe_exec_queue *q)
>  	if (IS_ERR(bb))
>  		return PTR_ERR(bb);
>  
> -	xa_for_each(&sr->xa, reg, entry)
> +	xa_for_each(&sr->xa, idx, entry)
>  		++count;
>  
>  	if (count) {
>  		drm_dbg(&xe->drm, "LRC WA %s save-restore MMIOs\n", sr->name);
>  
>  		bb->cs[bb->len++] = MI_LOAD_REGISTER_IMM(count);
> -		xa_for_each(&sr->xa, reg, entry) {
> -			bb->cs[bb->len++] = reg;
> -			bb->cs[bb->len++] = entry->set_bits;
> -			drm_dbg(&xe->drm, "REG[0x%lx] = 0x%08x", reg,
> -				entry->set_bits);
> +
> +		xa_for_each(&sr->xa, idx, entry) {
> +			struct xe_reg reg = entry->reg;
> +			struct xe_reg_mcr reg_mcr = to_xe_reg_mcr(reg);
> +			u32 val;
> +
> +			bb->cs[bb->len++] = reg.addr;
> +
> +			/*
> +			 * Skip reading the register if it's not really needed
> +			 */
> +			if (reg.masked)
> +				val = entry->clr_bits << 16;
> +			else if (entry->clr_bits + 1)
> +				val = (reg.mcr ?
> +				       xe_gt_mcr_unicast_read_any(gt, reg_mcr) :
> +				       xe_mmio_read32(gt, reg)) & (~entry->clr_bits);

I'm a little bit worried about whether this will continue to work in the
long term.  There are definitely some registers out there that we can't
read properly from the CPU.  Maybe none of our current LRC workarounds
target such registers, but they do exist (I have a vague memory of
hitting this on some old pre-production PVC LRC workarounds I think).
We can probably go with this CPU read for now, but maybe we should leave
a comment here that we might need to switch to a GPU-based approach at
some point in the future if we run into CPU inaccessible registers.

If we wind up needing to change the approach down the road, I think
there are probably a handful of possible approaches:

 * Save the default_lrc without applying the LRC workarounds initially.
   Read the in-memory contents of the LRC to find the original register
   value.  Then submit the LRC batch and re-save the LRC with the
   workarounds applied.
 * Submit a preliminary batchbuffer with MI_SRM instructions to save the
   hw-default register values to a hwsp location that we can then pick
   up and use when crafting the MI_LRI batch.  Or update in memory and
   use MI_LRM instead.
 * Use MI_MATH to perform the desired bitwise operations on register
   bits directly.

Anyway, we can deal with that later if/when it becomes necessary.  For
now,

Reviewed-by: Matt Roper <matthew.d.roper at intel.com>


Matt

> +			else
> +				val = 0;
> +
> +			val |= entry->set_bits;
> +			bb->cs[bb->len++] = val;
> +
> +			drm_dbg(&xe->drm, "REG[0x%x] = 0x%08x", reg.addr, val);
>  		}
>  	}
>  
> -- 
> 2.40.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation


More information about the Intel-xe mailing list