[PATCH 2/2] drm/xe/guc: Support crash dump notification from GuC

Matthew Brost matthew.brost at intel.com
Fri Nov 8 23:35:18 UTC 2024


On Fri, Nov 08, 2024 at 01:27:37PM -0800, John.C.Harrison at Intel.com wrote:
> From: John Harrison <John.C.Harrison at Intel.com>
> 
> Add support for the two crash dump notifications from GuC. Either one
> means GuC is toast, so just capture state and trigger a reset.
> 
> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
> ---
>  drivers/gpu/drm/xe/xe_guc_ct.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index 63bd91963eb1..7eb175a0b874 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -54,6 +54,7 @@ enum {
>  	CT_DEAD_PARSE_G2H_UNKNOWN,		/* 0x1000 */
>  	CT_DEAD_PARSE_G2H_ORIGIN,		/* 0x2000 */
>  	CT_DEAD_PARSE_G2H_TYPE,			/* 0x4000 */
> +	CT_DEAD_CRASH,				/* 0x8000 */
>  };
>  
>  static void ct_dead_worker_func(struct work_struct *w);
> @@ -1120,6 +1121,24 @@ static int parse_g2h_event(struct xe_guc_ct *ct, u32 *msg, u32 len)
>  	return 0;
>  }
>  
> +static int guc_crash_process_msg(struct xe_guc_ct *ct, u32 action)
> +{
> +	struct xe_gt *gt = ct_to_gt(ct);
> +
> +	if (action == XE_GUC_ACTION_NOTIFY_CRASH_DUMP_POSTED)
> +		xe_gt_err(gt, "GuC Crash dump notification\n");
> +	else if (action == XE_GUC_ACTION_NOTIFY_EXCEPTION)
> +		xe_gt_err(gt, "GuC Exception notification\n");
> +	else
> +		xe_gt_err(gt, "Unknown GuC crash notification: 0x%04X\n", action);
> +
> +	CT_DEAD(ct, NULL, CRASH);
> +
> +	kick_reset(ct);

Side note, we may want to wire a devcoredump to a GT reset too.

Anyway, this patch LGTM. With that:
Reviewed-by: Matthew Brost <matthew.brost at intel.com>

> +
> +	return 0;
> +}
> +
>  static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len)
>  {
>  	struct xe_gt *gt =  ct_to_gt(ct);
> @@ -1294,6 +1313,10 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len)
>  	case GUC_ACTION_GUC2PF_ADVERSE_EVENT:
>  		ret = xe_gt_sriov_pf_monitor_process_guc2pf(gt, hxg, hxg_len);
>  		break;
> +	case XE_GUC_ACTION_NOTIFY_CRASH_DUMP_POSTED:
> +	case XE_GUC_ACTION_NOTIFY_EXCEPTION:
> +		ret = guc_crash_process_msg(ct, action);
> +		break;
>  	default:
>  		xe_gt_err(gt, "unexpected G2H action 0x%04x\n", action);
>  	}
> -- 
> 2.47.0
> 
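
For anyone reading along, the control flow the patch adds can be modelled in a small userspace sketch. This is only an illustration under stated assumptions, not the driver code: every `model_*` / `MODEL_*` name is hypothetical, the action values are placeholders rather than the real GuC ABI numbers, and the 0x8000 flag simply mirrors the comment next to CT_DEAD_CRASH in the diff. It shows both notifications funnelling into one handler that logs, records a dead reason, and requests a reset, while any other action only gets logged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder action codes: hypothetical values, NOT the real GuC ABI. */
enum model_action {
	MODEL_ACTION_CRASH_DUMP_POSTED = 0x1,
	MODEL_ACTION_EXCEPTION = 0x2,
	MODEL_ACTION_OTHER = 0x3,
};

/* Mirrors the 0x8000 comment beside CT_DEAD_CRASH in the patch (assumption). */
#define MODEL_DEAD_CRASH 0x8000u

/* Stand-in for the CT state; the real struct xe_guc_ct is far larger. */
struct model_ct {
	uint32_t dead_reason;	/* accumulated "why the CT is dead" flags */
	bool reset_requested;	/* models the effect of kick_reset() */
};

/* Pick the log message for a crash-class notification. */
static const char *model_crash_name(uint32_t action)
{
	if (action == MODEL_ACTION_CRASH_DUMP_POSTED)
		return "GuC Crash dump notification";
	if (action == MODEL_ACTION_EXCEPTION)
		return "GuC Exception notification";
	return "Unknown GuC crash notification";
}

/* Log, record the dead reason, then request a reset. */
static int model_crash_process_msg(struct model_ct *ct, uint32_t action)
{
	assert(ct != NULL);

	fprintf(stderr, "%s (action 0x%04X)\n", model_crash_name(action),
		(unsigned int)action);
	ct->dead_reason |= MODEL_DEAD_CRASH;
	ct->reset_requested = true;
	return 0;
}

/* Dispatch: both crash-class actions share a single handler. */
static int model_process_g2h(struct model_ct *ct, uint32_t action)
{
	switch (action) {
	case MODEL_ACTION_CRASH_DUMP_POSTED:
	case MODEL_ACTION_EXCEPTION:
		return model_crash_process_msg(ct, action);
	default:
		fprintf(stderr, "unexpected G2H action 0x%04x\n",
			(unsigned int)action);
		return 0;
	}
}
```

Calling `model_process_g2h()` with either crash-class action on a zeroed `struct model_ct` leaves `dead_reason` set to `MODEL_DEAD_CRASH` and `reset_requested` true; any other action leaves both fields untouched.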