[PATCH] drm/amdgpu: Fix desktop freezed after gpu-reset

Christian König ckoenig.leichtzumerken at gmail.com
Mon Apr 17 07:22:27 UTC 2023


Am 14.04.23 um 18:22 schrieb Alan Liu:
> [Why]
> After gpu-reset, sometimes the driver would fail to enable vblank irq,
> causing flip_done timed out and the desktop freezed.
>
> During gpu-reset, we will disable and enable vblank irq in dm_suspend()
> and dm_resume(). Later on in amdgpu_irq_gpu_reset_resume_helper(), we
> will check irqs' refcount and decide to enable or disable the irqs
> again.
>
> However, we have 2 sets of API for controling vblank irq, one is
> dm_vblank_get/put() and another is amdgpu_irq_get/put(). Each API has
> its own refcount and flag to store the state of vblank irq, and they
> are not synchronized.
>
> In drm we use the first API to control vblank irq but in
> amdgpu_irq_gpu_reset_resume_helper() we use the second set of API.
>
> The failure happens when vblank irq was enabled by dm_vblank_get()
> before gpu-reset, we have vblank->enabled true. However, during
> gpu-reset, in amdgpu_irq_gpu_reset_resume_helper(), vblank irq's state
> checked from amdgpu_irq_update() is DISABLED. So finally it will disable
> vblank irq again. After gpu-reset, if there is a cursor plane commit,
> the driver will try to enable vblank irq by calling drm_vblank_enable(),
> but the vblank->enabled is still true, so it fails to turn on vblank
> irq and causes flip_done can't be completed in vblank irq handler and
> desktop become freezed.
>
> [How]
> Combining the 2 vblank control APIs by letting drm's API finally calls
> amdgpu_irq's API, so the irq's refcount and state of both APIs can be
> synchronized. Also add a check to prevent refcount from being less then
> 0 in amdgpu_irq_put().
>
> v2:
> - Add warning in amdgpu_irq_enable() if the irq is already disabled.
> - Call dc_interrupt_set() in dm_set_vblank() to avoid refcount change
>    if it is in gpu-reset.
>
> Signed-off-by: Alan Liu <HaoPing.Liu at amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c         |  3 +++
>   .../drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c  | 17 ++++++++++++++---
>   2 files changed, 17 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> index a6aef488a822..c8413470e057 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> @@ -597,6 +597,9 @@ int amdgpu_irq_put(struct amdgpu_device *adev, struct amdgpu_irq_src *src,
>   	if (!src->enabled_types || !src->funcs->set)
>   		return -EINVAL;
>   
> +	if (WARN_ON(!amdgpu_irq_enabled(adev, src, type)))
> +		return -EINVAL;
> +
>   	if (atomic_dec_and_test(&src->enabled_types[type]))
>   		return amdgpu_irq_update(adev, src, type);
>   
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
> index 1d924dc51a3e..514fefef109d 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
> @@ -169,10 +169,21 @@ static inline int dm_set_vblank(struct drm_crtc *crtc, bool enable)
>   	if (rc)
>   		return rc;
>   
> -	irq_source = IRQ_TYPE_VBLANK + acrtc->otg_inst;
> +	if (amdgpu_in_reset(adev)) {
> +		irq_source = IRQ_TYPE_VBLANK + acrtc->otg_inst;
> +		/* During gpu-reset we'll disable and then enable vblank irq,
> +		 * so don't use amdgpu_irq_get/put() to avoid refcount change.
> +		 */
> +		if (!dc_interrupt_set(adev->dm.dc, irq_source, enable))
> +			rc = -EBUSY;
> +	} else {
> +		rc = (enable)
> +			? amdgpu_irq_get(adev, &adev->crtc_irq, acrtc->crtc_id)
> +			: amdgpu_irq_put(adev, &adev->crtc_irq, acrtc->crtc_id);
> +	}

It would be cleaner if we could do this on a higher level, e.g. in the 
GPU reset code.

But for now that should do it and I think this could even be back-ported 
to stable.

Regards,
Christian.

>   
> -	if (!dc_interrupt_set(adev->dm.dc, irq_source, enable))
> -		return -EBUSY;
> +	if (rc)
> +		return rc;
>   
>   skip:
>   	if (amdgpu_in_reset(adev))



More information about the amd-gfx mailing list