[Intel-xe] [PATCH V3] drm/xe/xe2: Add workaround 16020292621

Thu Nov 30 17:20:33 UTC 2023

On Thu, Nov 30, 2023 at 05:14:41PM +0530, Tejas Upadhyay wrote:
> Workaround applies to Graphics 20.04 as part of ring
> submission
> 
> V3(MattR):
>   - Pass hwe and rename API name to hint end of ring work
>   - Use existing RING_NOPID API
> V2:
>   - Marking this WA for 20.04 instead of 20.00
> 
> Signed-off-by: Tejas Upadhyay <tejas.upadhyay at intel.com>
> ---
>  drivers/gpu/drm/xe/regs/xe_gpu_commands.h |  1 +
>  drivers/gpu/drm/xe/xe_ring_ops.c          | 17 +++++++++++++++++
>  drivers/gpu/drm/xe/xe_wa_oob.rules        |  1 +
>  3 files changed, 19 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> index 4402f72481dc..f1c5bf203b3d 100644
> --- a/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> +++ b/drivers/gpu/drm/xe/regs/xe_gpu_commands.h
> @@ -48,6 +48,7 @@
>  #define   PIPE_CONTROL_TILE_CACHE_FLUSH			(1<<28)
>  #define   PIPE_CONTROL_AMFS_FLUSH			(1<<25)
>  #define   PIPE_CONTROL_GLOBAL_GTT_IVB			(1<<24)
> +#define   PIPE_CONTROL_LRI_POST_SYNC			BIT(23)
>  #define   PIPE_CONTROL_STORE_DATA_INDEX			(1<<21)
>  #define   PIPE_CONTROL_CS_STALL				(1<<20)
>  #define   PIPE_CONTROL_GLOBAL_SNAPSHOT_RESET		(1<<19)
> diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
> index 59e0aa2d6a4c..f69279f716d6 100644
> --- a/drivers/gpu/drm/xe/xe_ring_ops.c
> +++ b/drivers/gpu/drm/xe/xe_ring_ops.c
> @@ -7,6 +7,7 @@
>  
>  #include "generated/xe_wa_oob.h"
>  #include "instructions/xe_mi_commands.h"
> +#include "regs/xe_engine_regs.h"
>  #include "regs/xe_gpu_commands.h"
>  #include "regs/xe_gt_regs.h"
>  #include "regs/xe_lrc_layout.h"
> @@ -184,6 +185,20 @@ static int emit_render_cache_flush(struct xe_sched_job *job, u32 *dw, int i)
>  	return i;
>  }
>  
> +static int emit_pipe_control_to_ring_end(struct xe_hw_engine *hwe, u32 *dw, int i)
> +{
> +	if (XE_WA(hwe->gt, 16020292621)) {
> +		dw[i++] = GFX_OP_PIPE_CONTROL(6);
> +		dw[i++] = PIPE_CONTROL_LRI_POST_SYNC;
> +		dw[i++] = RING_NOPID(hwe->mmio_base).addr;
> +		dw[i++] = 0;
> +		dw[i++] = 0;
> +		dw[i++] = 0;
> +	}
> +
> +	return i;
> +}
> +
>  static int emit_pipe_imm_ggtt(u32 addr, u32 value, bool stall_only, u32 *dw,
>  			      int i)
>  {
> @@ -342,6 +357,8 @@ static void __emit_job_gen12_render_compute(struct xe_sched_job *job,
>  
>  	i = emit_user_interrupt(dw, i);
>  
> +	i = emit_pipe_control_to_ring_end(job->q->hwe, dw, i);
> +
>  	xe_gt_assert(gt, i <= MAX_JOB_SIZE_DW);
>  
>  	xe_lrc_write_ring(lrc, dw, i * sizeof(*dw));
> diff --git a/drivers/gpu/drm/xe/xe_wa_oob.rules b/drivers/gpu/drm/xe/xe_wa_oob.rules
> index c7b7d40b5d57..d8fa1e958dc3 100644
> --- a/drivers/gpu/drm/xe/xe_wa_oob.rules
> +++ b/drivers/gpu/drm/xe/xe_wa_oob.rules
> @@ -21,3 +21,4 @@
>  22010954014	PLATFORM(DG2)
>  14019821291	MEDIA_VERSION_RANGE(1300, 2000)
>  14015076503	MEDIA_VERSION(1300)
> +16020292621	GRAPHICS_VERSION(2004), GRAPHICS_STEP(A0, B0), ENGINE_CLASS(RENDER)

I don't think ENGINE_CLASS() works on OOB workarounds.  The workarounds
get applied with XE_WA() which only takes a GT pointer, no reference to
any specific engine to compare against.  And xe_wa_process_oob() that
figures out which OOB workarounds are relevant during initialization
also uses a GT context, not an engine context.

For now, it's probably best to just add an explicit check for engine
class to the callsite.  If we encounter more engine-centric OOB
workarounds like this in the future, it might be worth extending the OOB
framework to support that down the road.

I wonder if we should add some checks to the OOB handling to catch any
"illegal" rules like this?

Matt

> -- 
> 2.25.1
> 

-- 
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation