[PATCH 4/4] drm/xe/guc: Don't treat GuC generic CAT error as protocol error
Michal Wajdeczko
michal.wajdeczko at intel.com
Tue Nov 5 20:13:43 UTC 2024
On 05.11.2024 18:30, Michal Wajdeczko wrote:
> GuC uses GUC_ID_MAX if it can not map the CAT fault to any PF/VF
> context. We shouldn't treat that as G2H protocol error that would
> justify a GT reset, as it may happen due to some VF activity.
>
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko at intel.com>
> Cc: Matthew Brost <matthew.brost at intel.com>
> ---
> drivers/gpu/drm/xe/xe_guc_submit.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
> index 147000fd1177..696f8884040a 100644
> --- a/drivers/gpu/drm/xe/xe_guc_submit.c
> +++ b/drivers/gpu/drm/xe/xe_guc_submit.c
> @@ -2023,6 +2023,15 @@ int xe_guc_exec_queue_memory_cat_error_handler(struct xe_guc *guc, u32 *msg,
>
> guc_id = msg[0];
>
> + if (guc_id == GUC_ID_MAX) {
Oops, this is actually wrong: GUC_ID_MAX is defined as 65535, which is
a 16-bit value, while we should look for the 32-bit value 0xffffffff.
> + /*
> + * GuC uses GUC_ID_MAX if it can not map the CAT fault to any PF/VF
> + * context. Only PF will be notified about that.
> + */
> + xe_gt_err_ratelimited(gt, "Memory CAT error reported by GuC!\n");
> + return 0;
> + }
> +
> q = g2h_exec_queue_lookup(guc, guc_id);
> if (unlikely(!q))
> return -EPROTO;
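
To illustrate the point raised in the inline comment, here is a minimal standalone sketch (not the actual fix) of why a comparison against GUC_ID_MAX would not catch a 32-bit all-ones value in msg[0]. The name SKETCH_GUC_ID_UNMAPPED and both helper functions are hypothetical and exist only for this example; only the values 65535 and 0xffffffff come from the reply above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value quoted in the reply above: the largest 16-bit number. */
#define GUC_ID_MAX 65535

/* Hypothetical name for this sketch only; the reply gives just the value. */
#define SKETCH_GUC_ID_UNMAPPED 0xffffffffu

/* The two sentinels differ, so testing for one never matches the other. */
static_assert(GUC_ID_MAX != SKETCH_GUC_ID_UNMAPPED, "sentinels must differ");

/* Check as written in the patch: compares against the 16-bit maximum. */
static bool sketch_matches_patch_check(uint32_t guc_id)
{
	return guc_id == GUC_ID_MAX;
}

/* Check the reply asks for: compares against the full 32-bit value. */
static bool sketch_matches_32bit_check(uint32_t guc_id)
{
	return guc_id == SKETCH_GUC_ID_UNMAPPED;
}
```

With guc_id = 0xffffffff the first helper returns false, so the handler in the patch as posted would presumably still fall through to g2h_exec_queue_lookup() and could return -EPROTO.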