[PATCH v2] drm/xe/vf: Fail migration recovery if fixups needed but platform not supported
Michal Wajdeczko
michal.wajdeczko at intel.com
Thu May 15 09:51:34 UTC 2025
On 15.05.2025 00:14, Tomasz Lis wrote:
> The post-migration recovery needs to be fully implemented for a
> specific platform in order to make continuation of workloads
> possible.
>
> New platforms introduce changes which affect the recovery procedure,
> and without a clear verification of support this leads to errors
> with no straight forward error message explaining the cause.
>
> This patch fixes that issue - it introduces a message to be logged
> when the current driver is known to not support the current platform.
>
> Wedging the driver immediately also decreases the amount of
> additional errors which would come afterwards if the driver continued
> operation.
>
> v2: Show the message during probe as well as during recovery; do not
> perform any recovery steps if the reovery is bound to fail
typo: s/reovery/recovery/
>
> Signed-off-by: Tomasz Lis <tomasz.lis at intel.com>
> ---
> drivers/gpu/drm/xe/xe_sriov_vf.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/drivers/gpu/drm/xe/xe_sriov_vf.c b/drivers/gpu/drm/xe/xe_sriov_vf.c
> index 2674fa948fda..bb069e4b3823 100644
> --- a/drivers/gpu/drm/xe/xe_sriov_vf.c
> +++ b/drivers/gpu/drm/xe/xe_sriov_vf.c
> @@ -123,6 +123,15 @@
> * | | |
> */
>
> +static bool vf_mgration_supported(struct xe_device *xe)
typo: s/mgration/migration/
> +{
> + /*
> + * TODO: Add conditions to allow specific platforms, when they're
> + * supported at production quality.
> + */
> + return IS_ENABLED(CONFIG_DRM_XE_DEBUG_SRIOV);
maybe print the message right here when migration is not supported, so
we can avoid the two similar calls later on?
> +}
> +
> static void migration_worker_func(struct work_struct *w);
>
> /**
> @@ -132,6 +141,10 @@ static void migration_worker_func(struct work_struct *w);
> void xe_sriov_vf_init_early(struct xe_device *xe)
> {
> INIT_WORK(&xe->sriov.vf.migration.worker, migration_worker_func);
> +
> + if (!vf_mgration_supported(xe)) {
> + drm_info(&xe->drm, "vf migration not supported by this module version\n");
s/vf/VF/
or maybe just use xe_sriov_info() which will add "VF: " prefix for you
> + }
> }
>
> /**
> @@ -236,6 +249,11 @@ static void vf_post_migration_recovery(struct xe_device *xe)
> goto defer;
> if (unlikely(err))
> goto fail;
> + if (!vf_mgration_supported(xe)) {
> + drm_err(&xe->drm, "vf migration not supported by this module version\n");
same here (or print inside vf_migration_supported)
> + err = -ENOTRECOVERABLE;
> + goto fail;
> + }
>
> need_fixups = vf_post_migration_fixup_ggtt_nodes(xe);
> /* FIXME: add the recovery steps */
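
To illustrate the suggestion of printing inside the helper, here is a minimal userspace sketch, not kernel code. The struct fields, the mock_* callers and the fprintf() call are hypothetical stand-ins for struct xe_device, the two call sites in the patch and an xe_sriov_*() logging helper; the debug_sriov flag stands in for IS_ENABLED(CONFIG_DRM_XE_DEBUG_SRIOV).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct xe_device, for illustration only. */
struct xe_device {
	bool debug_sriov;	/* stands in for IS_ENABLED(CONFIG_DRM_XE_DEBUG_SRIOV) */
	int log_count;		/* counts emitted messages so the sketch can be checked */
};

/* The helper reports the lack of support itself, so callers need no extra print. */
static bool vf_migration_supported(struct xe_device *xe)
{
	if (xe->debug_sriov)
		return true;

	/* In the driver this would be an xe_sriov_*() call adding the "VF: " prefix. */
	fprintf(stderr, "VF: migration not supported by this module version\n");
	xe->log_count++;
	return false;
}

/* Mock of the probe-time call site: the check alone is enough to log. */
static void mock_init_early(struct xe_device *xe)
{
	vf_migration_supported(xe);
}

/* Mock of the recovery call site: bail out before any recovery step. */
static int mock_post_migration_recovery(struct xe_device *xe)
{
	if (!vf_migration_supported(xe))
		return -ENOTRECOVERABLE;

	return 0;
}
```

One trade-off of this shape: the patch logs with drm_info() at probe and drm_err() during recovery, while a single print inside the helper uses one level for both, so the helper would need a level argument if that distinction should be kept.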