[PATCH 2/2] drm/xe/guc: Support crash dump notification from GuC
John Harrison
john.c.harrison at intel.com
Fri Nov 8 23:51:12 UTC 2024
On 11/8/2024 15:35, Matthew Brost wrote:
> On Fri, Nov 08, 2024 at 01:27:37PM -0800, John.C.Harrison at Intel.com wrote:
>> From: John Harrison <John.C.Harrison at Intel.com>
>>
>> Add support for the two crash dump notifications from GuC. Either one
>> means GuC is toast, so just capture state and trigger a reset.
>>
>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_guc_ct.c | 23 +++++++++++++++++++++++
>> 1 file changed, 23 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
>> index 63bd91963eb1..7eb175a0b874 100644
>> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
>> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
>> @@ -54,6 +54,7 @@ enum {
>> CT_DEAD_PARSE_G2H_UNKNOWN, /* 0x1000 */
>> CT_DEAD_PARSE_G2H_ORIGIN, /* 0x2000 */
>> CT_DEAD_PARSE_G2H_TYPE, /* 0x4000 */
>> + CT_DEAD_CRASH, /* 0x8000 */
>> };
>>
>> static void ct_dead_worker_func(struct work_struct *w);
>> @@ -1120,6 +1121,24 @@ static int parse_g2h_event(struct xe_guc_ct *ct, u32 *msg, u32 len)
>> return 0;
>> }
>>
>> +static int guc_crash_process_msg(struct xe_guc_ct *ct, u32 action)
>> +{
>> + struct xe_gt *gt = ct_to_gt(ct);
>> +
>> + if (action == XE_GUC_ACTION_NOTIFY_CRASH_DUMP_POSTED)
>> + xe_gt_err(gt, "GuC Crash dump notification\n");
>> + else if (action == XE_GUC_ACTION_NOTIFY_EXCEPTION)
>> + xe_gt_err(gt, "GuC Exception notification\n");
>> + else
>> + xe_gt_err(gt, "Unknown GuC crash notification: 0x%04X\n", action);
>> +
>> + CT_DEAD(ct, NULL, CRASH);
>> +
>> + kick_reset(ct);
> Side note, we may want to wire a devcoredump to a GT reset too.
I have a work-in-progress series to allow creating a devcoredump without
a scheduler job. I assume that would be a prerequisite to creating one
from an arbitrary GT reset. Certainly coming in from an async event such
as this, there is no scheduler job to use. Hoping to post that soon.
Should be easy enough to connect it to the GT reset then.
John.
>
> Anyways this patch LGTM. With that:
> Reviewed-by: Matthew Brost <matthew.brost at intel.com>
>
>> +
>> + return 0;
>> +}
>> +
>> static int parse_g2h_response(struct xe_guc_ct *ct, u32 *msg, u32 len)
>> {
>> struct xe_gt *gt = ct_to_gt(ct);
>> @@ -1294,6 +1313,10 @@ static int process_g2h_msg(struct xe_guc_ct *ct, u32 *msg, u32 len)
>> case GUC_ACTION_GUC2PF_ADVERSE_EVENT:
>> ret = xe_gt_sriov_pf_monitor_process_guc2pf(gt, hxg, hxg_len);
>> break;
>> + case XE_GUC_ACTION_NOTIFY_CRASH_DUMP_POSTED:
>> + case XE_GUC_ACTION_NOTIFY_EXCEPTION:
>> + ret = guc_crash_process_msg(ct, action);
>> + break;
>> default:
>> xe_gt_err(gt, "unexpected G2H action 0x%04x\n", action);
>> }
>> --
>> 2.47.0
>>
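
For readers following the control flow of the quoted patch, here is a minimal user-space sketch of the same dispatch pattern: both crash notifications funnel into one handler, which logs, records a "dead" reason bit and requests a reset. Everything prefixed demo_/DEMO_ is hypothetical and not part of the xe driver; the action codes are placeholder values, and the counter merely stands in for what CT_DEAD() and kick_reset() do in the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder action codes for illustration only (assumed values);
 * the authoritative definitions live in the GuC ABI headers. */
enum {
	DEMO_ACTION_CRASH_DUMP_POSTED = 0x8004,
	DEMO_ACTION_EXCEPTION         = 0x8005,
};

/* Reason bit matching the 0x8000 comment on CT_DEAD_CRASH in the patch. */
#define DEMO_DEAD_CRASH 0x8000u

/* Hypothetical stand-in for the CT state touched by the handler. */
struct demo_ct {
	uint32_t dead_reason;    /* accumulated reason bits */
	int      reset_requests; /* number of times a reset was requested */
};

/* Mirrors the shape of guc_crash_process_msg(): log, mark dead, kick reset. */
static int demo_crash_process_msg(struct demo_ct *ct, uint32_t action)
{
	assert(ct != NULL);

	if (action == DEMO_ACTION_CRASH_DUMP_POSTED)
		fprintf(stderr, "GuC Crash dump notification\n");
	else if (action == DEMO_ACTION_EXCEPTION)
		fprintf(stderr, "GuC Exception notification\n");
	else
		fprintf(stderr, "Unknown GuC crash notification: 0x%04X\n",
			(unsigned int)action);

	ct->dead_reason |= DEMO_DEAD_CRASH; /* stands in for CT_DEAD(ct, NULL, CRASH) */
	ct->reset_requests++;               /* stands in for kick_reset(ct) */

	return 0;
}

/* Mirrors the new cases in process_g2h_msg(): two actions, one handler. */
static int demo_process_g2h(struct demo_ct *ct, uint32_t action)
{
	switch (action) {
	case DEMO_ACTION_CRASH_DUMP_POSTED:
	case DEMO_ACTION_EXCEPTION:
		return demo_crash_process_msg(ct, action);
	default:
		fprintf(stderr, "unexpected G2H action 0x%04x\n",
			(unsigned int)action);
		return 0;
	}
}
```

Calling demo_process_g2h() on a zero-initialised struct demo_ct with either crash action sets the DEMO_DEAD_CRASH bit and bumps reset_requests by one; any other action only logs and leaves the state untouched.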