[PATCH v2 4/4] drm/xe/guc: Don't treat GuC generic CAT error as protocol error

Matthew Brost matthew.brost at intel.com
Wed Nov 6 01:24:57 UTC 2024


On Tue, Nov 05, 2024 at 09:45:57PM +0100, Michal Wajdeczko wrote:
> GuC uses GUC_ID_UNKNOWN if it cannot map the CAT fault to any
> context. We shouldn't treat that as a G2H protocol error that would
> justify a GT reset, as it may happen due to some VF activity.
> 
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
> Cc: Matthew Brost <matthew.brost at intel.com>

Reviewed-by: Matthew Brost <matthew.brost at intel.com>
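For review purposes, here is a small userspace sketch of the decision order the patch introduces. The sentinel is logged and acknowledged with 0, while any other guc_id that has no registered queue still returns -EPROTO. Only GUC_ID_MAX and GUC_ID_UNKNOWN come from the patch. `struct fake_queue`, `fake_lookup()` (a stand-in for g2h_exec_queue_lookup()) and `fake_cat_error_handler()` are hypothetical. fprintf() replaces xe_gt_err_ratelimited() without any rate limiting, and the length check is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Values taken from the patch; everything else below is hypothetical. */
#define GUC_ID_MAX     65535
#define GUC_ID_UNKNOWN 0xffffffffu

/* The sentinel must never collide with a valid guc_id. */
static_assert(GUC_ID_UNKNOWN > GUC_ID_MAX,
	      "sentinel must lie outside the valid guc_id range");

/* Hypothetical stand-in for an exec queue registered under a guc_id. */
struct fake_queue {
	uint32_t guc_id;
};

/* Hypothetical table of registered queues. */
static struct fake_queue queues[] = { { 5 }, { 42 } };

/* Hypothetical stand-in for g2h_exec_queue_lookup(). */
static struct fake_queue *fake_lookup(uint32_t guc_id)
{
	for (size_t i = 0; i < sizeof(queues) / sizeof(queues[0]); i++)
		if (queues[i].guc_id == guc_id)
			return &queues[i];
	return NULL;
}

/*
 * Decision order mirrored from the patched handler:
 *  1. GUC_ID_UNKNOWN: GuC could not attribute the fault; log it, return 0.
 *  2. guc_id with no registered queue: G2H protocol error, -EPROTO.
 *  3. Otherwise the fault belongs to a known queue (handling elided here).
 */
static int fake_cat_error_handler(const uint32_t *msg, uint32_t len)
{
	uint32_t guc_id;

	/* Defensive length check; an assumption of this sketch. */
	if (len < 1)
		return -EPROTO;

	guc_id = msg[0];

	if (guc_id == GUC_ID_UNKNOWN) {
		/* Not rate limited, unlike xe_gt_err_ratelimited(). */
		fprintf(stderr, "Memory CAT error reported by GuC!\n");
		return 0;
	}

	if (!fake_lookup(guc_id))
		return -EPROTO;

	/* Handling of the faulting queue would follow; elided here. */
	return 0;
}
```

The useful property is that the sentinel test runs before the lookup, so an unattributable fault can no longer be misread as a missing queue and escalated to a GT reset.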

> ---
> v2: replaced 16b GUC_ID_MAX with 32b GUC_ID_UNKNOWN
> ---
>  drivers/gpu/drm/xe/xe_guc_fwif.h   | 1 +
>  drivers/gpu/drm/xe/xe_guc_submit.c | 9 +++++++++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc_fwif.h b/drivers/gpu/drm/xe/xe_guc_fwif.h
> index 08ffe59f22fa..057153f89b30 100644
> --- a/drivers/gpu/drm/xe/xe_guc_fwif.h
> +++ b/drivers/gpu/drm/xe/xe_guc_fwif.h
> @@ -17,6 +17,7 @@
>  #define G2H_LEN_DW_TLB_INVALIDATE		3
>  
>  #define GUC_ID_MAX			65535
> +#define GUC_ID_UNKNOWN			0xffffffff
>  
>  #define GUC_CONTEXT_DISABLE		0
>  #define GUC_CONTEXT_ENABLE		1
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 147000fd1177..614110dbb527 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -2023,6 +2023,15 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
>  
>  	guc_id = msg[0];
>  
> +	if (guc_id == GUC_ID_UNKNOWN) {
> +		/*
> +		 * GuC uses GUC_ID_UNKNOWN if it cannot map the CAT fault to any PF/VF
> +		 * context. In such a case, only the PF will be notified about that fault.
> +		 */
> +		xe_gt_err_ratelimited(gt, "Memory CAT error reported by GuC!\n");
> +		return 0;
> +	}
> +
>  	q = g2h_exec_queue_lookup(guc, guc_id);
>  	if (unlikely(!q))
>  		return -EPROTO;
> -- 
> 2.43.0
> 
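On the v2 change: msg[0] is a full 32-bit dword, so a comparison against the 16-bit GUC_ID_MAX (65535) does not match a 0xffffffff sentinel. Truncating the dword to 16 bits first would make it match, but only by making the sentinel indistinguishable from 65535. The sketch below assumes v1 compared the dword against GUC_ID_MAX directly. `is_unknown_v1()`, `is_unknown_v2()` and `truncated_matches()` are hypothetical helpers, not driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the patch. */
#define GUC_ID_MAX     65535
#define GUC_ID_UNKNOWN 0xffffffffu

static_assert(GUC_ID_UNKNOWN != GUC_ID_MAX,
	      "32-bit sentinel and 16-bit maximum are distinct values");

/* Assumed v1-style check against the 16-bit value (hypothetical). */
static bool is_unknown_v1(uint32_t dw)
{
	return dw == GUC_ID_MAX;
}

/* v2-style check against the full 32-bit sentinel, as in the patch. */
static bool is_unknown_v2(uint32_t dw)
{
	return dw == GUC_ID_UNKNOWN;
}

/* Truncating to 16 bits hides the difference between the two values. */
static bool truncated_matches(uint32_t dw)
{
	return (uint16_t)dw == GUC_ID_MAX;
}
```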