[PATCH v3 7/7] drm/amdgpu: Stop any pending reset if another in progress.

Tue May 31 15:31:26 UTC 2022

On 2022-05-25 at 15:04, Andrey Grodzovsky wrote:
> We skip reset requests if another one is already in progress.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 27 ++++++++++++++++++++++
>   1 file changed, 27 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 424571e46cf5..e1f7ee604ea4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -5054,6 +5054,27 @@ static void amdgpu_device_recheck_guilty_jobs(
>   	}
>   }
>   
> +static inline void amdggpu_device_stop_pedning_resets(struct amdgpu_device* adev)

Typos in the function name: amdggpu -> amdgpu, pedning -> pending

> +{
> +	struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
> +
> +#if defined(CONFIG_DEBUG_FS)
> +	if (!amdgpu_sriov_vf(adev))
> +		cancel_work(&adev->reset_work);
> +#endif
> +
> +	if (adev->kfd.dev)
> +		cancel_work(&adev->kfd.reset_work);

Do you also need to cancel resets from other GPUs in the same hive?
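
To make the question concrete, here is a rough user-space sketch (not driver
code) of what cancelling per hive rather than per device could look like. All
names prefixed with fake_ are hypothetical stand-ins invented for this
illustration; fake_cancel_work() only models the "drop it if it is still
pending" behaviour of cancel_work(), and the three work items are a simplified
subset of the reset sources touched by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued work item. */
struct fake_work {
	bool pending;
};

/* Hypothetical stand-in for one GPU and some of its non-scheduler reset sources. */
struct fake_gpu {
	struct fake_work debugfs_reset;
	struct fake_work kfd_reset;
	struct fake_work ras_recovery;
};

/* Models cancel_work(): clear a pending item and report whether it was pending. */
static bool fake_cancel_work(struct fake_work *w)
{
	bool was_pending = w->pending;

	w->pending = false;
	return was_pending;
}

/* Drop the pending resets of a single GPU; returns how many were cancelled. */
static int fake_stop_pending_resets(struct fake_gpu *gpu)
{
	int cancelled = 0;

	cancelled += fake_cancel_work(&gpu->debugfs_reset);
	cancelled += fake_cancel_work(&gpu->kfd_reset);
	cancelled += fake_cancel_work(&gpu->ras_recovery);
	return cancelled;
}

/* Apply the per-GPU helper to every GPU in the (assumed) hive array. */
static int fake_stop_pending_resets_hive(struct fake_gpu *gpus, size_t count)
{
	int cancelled = 0;

	for (size_t i = 0; i < count; i++)
		cancelled += fake_stop_pending_resets(&gpus[i]);
	return cancelled;
}

/* Small demo: two GPUs in one hive with three pending resets in total. */
static int fake_hive_demo(void)
{
	struct fake_gpu hive[2] = {
		{ .kfd_reset = { .pending = true } },
		{ .debugfs_reset = { .pending = true },
		  .ras_recovery = { .pending = true } },
	};
	int cancelled = fake_stop_pending_resets_hive(hive, 2);

	/* A reset queued on the second GPU is dropped as well. */
	assert(!hive[1].ras_recovery.pending);
	return cancelled;
}
```

In this model, calling the per-GPU helper only on the device that triggered
the reset would leave the second GPU's items pending, which is the case the
question above is about.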

Regards,
   Felix

> +
> +	if (amdgpu_sriov_vf(adev))
> +		cancel_work(&adev->virt.flr_work);
> +
> +	if (con && adev->ras_enabled)
> +		cancel_work(&con->recovery_work);
> +
> +}
> +
> +
>   /**
>    * amdgpu_device_gpu_recover - reset the asic and recover scheduler
>    *
> @@ -5209,6 +5230,12 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
>   				  r, adev_to_drm(tmp_adev)->unique);
>   			tmp_adev->asic_reset_res = r;
>   		}
> +
> +		/*
> +		 * Drop all pending non-scheduler resets. Scheduler resets
> +		 * were already dropped during drm_sched_stop.
> +		 */
> +		amdggpu_device_stop_pedning_resets(tmp_adev);
>   	}
>   
>   	tmp_vram_lost_counter = atomic_read(&((adev)->vram_lost_counter));