[PATCH] drm/xe/guc: Add more GuC load error status codes

John Harrison john.c.harrison at intel.com
Sat Jul 26 00:19:37 UTC 2025


On 7/25/2025 12:29 PM, Cavitt, Jonathan wrote:
> -----Original Message-----
> From: Intel-xe <intel-xe-bounces at lists.freedesktop.org> On Behalf Of John.C.Harrison at Intel.com
> Sent: Friday, July 25, 2025 12:12 PM
> To: Intel-Xe at Lists.FreeDesktop.Org
> Cc: Harrison, John C <john.c.harrison at intel.com>
> Subject: [PATCH] drm/xe/guc: Add more GuC load error status codes
>> From: John Harrison <John.C.Harrison at Intel.com>
>>
>> The GuC load process will abort if certain status codes (which are
>> indicative of a fatal error) are reported. Otherwise, it keeps waiting
>> until the 'success' code is returned. New error codes have been added
>> in recent GuC releases, so add support for aborting on those as well.
>>
>> Signed-off-by: John Harrison <John.C.Harrison at Intel.com>
>> ---
>>   drivers/gpu/drm/xe/abi/guc_errors_abi.h |  3 +++
>>   drivers/gpu/drm/xe/xe_guc.c             | 11 +++++++++++
>>   2 files changed, 14 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/abi/guc_errors_abi.h b/drivers/gpu/drm/xe/abi/guc_errors_abi.h
>> index ecf748fd87df..ad76b4baf42e 100644
>> --- a/drivers/gpu/drm/xe/abi/guc_errors_abi.h
>> +++ b/drivers/gpu/drm/xe/abi/guc_errors_abi.h
>> @@ -63,6 +63,7 @@ enum xe_guc_load_status {
>>   	XE_GUC_LOAD_STATUS_HWCONFIG_START                   = 0x05,
>>   	XE_GUC_LOAD_STATUS_HWCONFIG_DONE                    = 0x06,
>>   	XE_GUC_LOAD_STATUS_HWCONFIG_ERROR                   = 0x07,
>> +	XE_GUC_LOAD_STATUS_BOOTROM_VERSION_MISMATCH         = 0x08,
>>   	XE_GUC_LOAD_STATUS_GDT_DONE                         = 0x10,
>>   	XE_GUC_LOAD_STATUS_IDT_DONE                         = 0x20,
>>   	XE_GUC_LOAD_STATUS_LAPIC_DONE                       = 0x30,
>> @@ -75,6 +76,8 @@ enum xe_guc_load_status {
>>   	XE_GUC_LOAD_STATUS_INVALID_INIT_DATA_RANGE_START,
>>   	XE_GUC_LOAD_STATUS_MPU_DATA_INVALID                 = 0x73,
>>   	XE_GUC_LOAD_STATUS_INIT_MMIO_SAVE_RESTORE_INVALID   = 0x74,
>> +	XE_GUC_LOAD_STATUS_KLV_WORKAROUND_INIT_ERROR        = 0x75,
>> +	XE_GUC_LOAD_STATUS_INVALID_FTR_FLAG                 = 0x76,
>>   	XE_GUC_LOAD_STATUS_INVALID_INIT_DATA_RANGE_END,
>>   
>>   	XE_GUC_LOAD_STATUS_READY                            = 0xF0,
>> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
>> index 1ca7f4f27e26..315ef74bb5ff 100644
>> --- a/drivers/gpu/drm/xe/xe_guc.c
>> +++ b/drivers/gpu/drm/xe/xe_guc.c
>> @@ -992,11 +992,14 @@ static int guc_load_done(u32 status)
>>   	case XE_GUC_LOAD_STATUS_GUC_PREPROD_BUILD_MISMATCH:
>>   	case XE_GUC_LOAD_STATUS_ERROR_DEVID_INVALID_GUCTYPE:
>>   	case XE_GUC_LOAD_STATUS_HWCONFIG_ERROR:
>> +	case XE_GUC_LOAD_STATUS_BOOTROM_VERSION_MISMATCH:
>>   	case XE_GUC_LOAD_STATUS_DPC_ERROR:
>>   	case XE_GUC_LOAD_STATUS_EXCEPTION:
>>   	case XE_GUC_LOAD_STATUS_INIT_DATA_INVALID:
>>   	case XE_GUC_LOAD_STATUS_MPU_DATA_INVALID:
>>   	case XE_GUC_LOAD_STATUS_INIT_MMIO_SAVE_RESTORE_INVALID:
>> +	case XE_GUC_LOAD_STATUS_KLV_WORKAROUND_INIT_ERROR:
>> +	case XE_GUC_LOAD_STATUS_INVALID_FTR_FLAG:
>>   		return -1;
>>   	}
>>   
>> @@ -1146,6 +1149,14 @@ static void guc_wait_ucode(struct xe_guc *guc)
>>   			xe_gt_err(gt, "illegal register in save/restore workaround list\n");
>>   			break;
>>   
>> +		case XE_GUC_LOAD_STATUS_KLV_WORKAROUND_INIT_ERROR:
>> +			xe_gt_err(gt, "illegal workaround KLV data\n");
>> +			break;
>> +
>> +		case XE_GUC_LOAD_STATUS_INVALID_FTR_FLAG:
>> +			xe_gt_err(gt, "illegal feature flag specified\n");
>> +			break;
>> +
> Given that XE_GUC_LOAD_STATUS_KLV_WORKAROUND_INIT_ERROR and
> XE_GUC_LOAD_STATUS_INVALID_FTR_FLAG are indexed after
> XE_GUC_LOAD_STATUS_HWCONFIG_START in the xe_guc_load_status enum list,
> they should also probably be placed after XE_GUC_LOAD_STATUS_HWCONFIG_START
> in this case switch branch, but I'm not going to block on it because the ordering
> probably isn't that important.
Actually in terms of enum value, HWCONFIG_START is before all the other 
existing cases as well. The ordering of cases within a switch is 
entirely down to coder preference - whatever makes the code more 
readable/efficient. E.g. grouping logical values, allowing values to 
fall into subsequent cases, etc. even though their absolute values might 
be wildly scattered. However, I just realised the basic 'invalid init 
data' code is missing an explanation too. So, I can move the hwconfig to 
the start in the respin easily enough.

John.

>
> Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> -Jonathan Cavitt
>
>>   		case XE_GUC_LOAD_STATUS_HWCONFIG_START:
>>   			xe_gt_err(gt, "still extracting hwconfig table.\n");
>>   			break;
>> -- 
>> 2.49.0
>>
>>



More information about the Intel-xe mailing list