[PATCH v2] drm/xe/vf: Fail migration recovery if fixups needed but platform not supported

Lis, Tomasz tomasz.lis at intel.com
Thu May 15 11:04:39 UTC 2025


On 15.05.2025 11:51, Michal Wajdeczko wrote:
>
> On 15.05.2025 00:14, Tomasz Lis wrote:
>> The post-migration recovery needs to be fully implemented for a
>> specific platform in order to make continuation of workloads
>> possible.
>>
>> New platforms introduce changes which affect the recovery procedure,
>> and without a clear verification of support this leads to errors
>> with no straightforward error message explaining the cause.
>>
>> This patch fixes that issue - it introduces a message to be logged
>> when the current driver is known to not support the current platform.
>>
>> Wedging the driver immediately also decreases the amount of
>> additional errors which would come afterwards if the driver continued
>> operation.
>>
>> v2: Show the message during probe as well as during recovery; do not
>>    perform any recovery steps if the reovery is bound to fail
> typo reovery
ok
>
>> Signed-off-by: Tomasz Lis <tomasz.lis at intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_sriov_vf.c | 18 ++++++++++++++++++
>>   1 file changed, 18 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_sriov_vf.c b/drivers/gpu/drm/xe/xe_sriov_vf.c
>> index 2674fa948fda..bb069e4b3823 100644
>> --- a/drivers/gpu/drm/xe/xe_sriov_vf.c
>> +++ b/drivers/gpu/drm/xe/xe_sriov_vf.c
>> @@ -123,6 +123,15 @@
>>    *      |                               |                               |
>>    */
>>   
>> +static bool vf_mgration_supported(struct xe_device *xe)
> typo mgration
ok
>
>> +{
>> +	/*
>> +	 * TODO: Add conditions to allow specific platforms, when they're
>> +	 * supported at production quality.
>> +	 */
>> +	return IS_ENABLED(CONFIG_DRM_XE_DEBUG_SRIOV);
> maybe in case it's not supported we can print the message right here so
> we can avoid two similar calls later on?
The idea behind this function is for it to be a pure condition check;
it is not meant to trigger any action, such as logging, by itself.
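
To illustrate the split between checking and acting, here is a minimal
user-space sketch, not driver code. All demo_* names and the
DEMO_DEBUG_SRIOV macro are made up for illustration; the macro only
stands in for IS_ENABLED(CONFIG_DRM_XE_DEBUG_SRIOV), and writing into a
buffer stands in for the real logging calls. The point is that the same
predicate feeds two callers which react at different severity:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for IS_ENABLED(CONFIG_DRM_XE_DEBUG_SRIOV). */
#ifndef DEMO_DEBUG_SRIOV
#define DEMO_DEBUG_SRIOV 0
#endif

/* Pure predicate: answers the question, has no side effects. */
static bool demo_migration_supported(void)
{
	return DEMO_DEBUG_SRIOV;
}

/* Probe path: missing support is only informational, probe continues. */
static int demo_init_early(char *log, size_t len)
{
	assert(log && len > 0);

	if (!demo_migration_supported())
		snprintf(log, len,
			 "info: VF migration not supported by this module version");
	return 0;
}

/* Recovery path: missing support is fatal, no recovery step is run. */
static int demo_post_migration_recovery(char *log, size_t len)
{
	assert(log && len > 0);

	if (!demo_migration_supported()) {
		snprintf(log, len,
			 "error: VF migration not supported by this module version");
		return -ENOTRECOVERABLE;
	}

	/* The recovery steps would follow here. */
	return 0;
}
```

If the predicate printed the message itself, both paths would be forced
to the same log level, which is what I would like to avoid.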
>> +}
>> +
>>   static void migration_worker_func(struct work_struct *w);
>>   
>>   /**
>> @@ -132,6 +141,10 @@ static void migration_worker_func(struct work_struct *w);
>>   void xe_sriov_vf_init_early(struct xe_device *xe)
>>   {
>>   	INIT_WORK(&xe->sriov.vf.migration.worker, migration_worker_func);
>> +
>> +	if (!vf_mgration_supported(xe)) {
>> +		drm_info(&xe->drm, "vf migration not supported by this module version\n");
> s/vf/VF/
>
> or maybe just use xe_sriov_info() which will add "VF: " prefix for you
>
>> +	}
>>   }
>>   
>>   /**
>> @@ -236,6 +249,11 @@ static void vf_post_migration_recovery(struct xe_device *xe)
>>   		goto defer;
>>   	if (unlikely(err))
>>   		goto fail;
>> +	if (!vf_mgration_supported(xe)) {
>> +		drm_err(&xe->drm, "vf migration not supported by this module version\n");
> same here (or print inside vf_migration_supported)

Will change the printing function.

-Tomasz

>
>> +		err = -ENOTRECOVERABLE;
>> +		goto fail;
>> +	}
>>   
>>   	need_fixups = vf_post_migration_fixup_ggtt_nodes(xe);
>>   	/* FIXME: add the recovery steps */

