[PATCH] drm/amdgpu: Fix desktop freezed after gpu-reset

Luben Tuikov luben.tuikov at amd.com
Mon Mar 27 19:00:21 UTC 2023


Hi,

That's a good fix. Some questions and comments below:

On 2023-03-27 11:20, Alan Liu wrote:
> [Why]
> After gpu-reset, sometimes the driver would fail to enable vblank irq,
> causing flip_done timed out and the desktop freezed.
> 
> During gpu-reset, we will disable and enable vblank irq in dm_suspend()
> and dm_resume(). Later on in amdgpu_irq_gpu_reset_resume_helper(), we
> will check irqs' refcount and decide to enable or disable the irqs again.
> 
> However, we have 2 sets of API for controling vblank irq, one is
> dm_vblank_get/put() and another is amdgpu_irq_get/put(). Each API has
> its own refcount and flag to store the state of vblank irq, and they
> are not synchronized.

Is it possible to reconcile controlling VBlank IRQ to a single refcount?

> 
> In drm we use the first API to control vblank irq but in
> amdgpu_irq_gpu_reset_resume_helper() we use the second set of API.
> 
> The failure happens when vblank irq was enabled by dm_vblank_get() before
> gpu-reset, we have vblank->enabled true. However, during gpu-reset, in
> amdgpu_irq_gpu_reset_resume_helper(), vblank irq's state checked from
> amdgpu_irq_update() is DISABLED. So finally it will disable vblank irq
> again. After gpu-reset, if there is a cursor plane commit, the driver
> will try to enable vblank irq by calling drm_vblank_enable(), but the
> vblank->enabled is still true, so it fails to turn on vblank irq and
> causes flip_done can't be completed in vblank irq handler and desktop
> become freezed.
> 
> [How]
> Combining the 2 vblank control APIs by letting drm's API finally calls
> amdgpu_irq's API, so the irq's refcount and state of both APIs can be
> synchronized. Also add a check to prevent refcount from being less then
> 0 in amdgpu_irq_put().
> 
> Signed-off-by: Alan Liu <HaoPing.Liu at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c            |  3 +++
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c | 14 ++++++++++----
>  2 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> index a6aef488a822..1b66003657e2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> @@ -597,6 +597,9 @@ int amdgpu_irq_put(struct amdgpu_device *adev, struct amdgpu_irq_src *src,
>  	if (!src->enabled_types || !src->funcs->set)
>  		return -EINVAL;
>  
> +	if (!amdgpu_irq_enabled(adev, src, type))
> +		return 0;
> +
>  	if (atomic_dec_and_test(&src->enabled_types[type]))
>  		return amdgpu_irq_update(adev, src, type);
>  
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
> index dc4f37240beb..e04f846b0b19 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
> @@ -146,7 +146,7 @@ static void vblank_control_worker(struct work_struct *work)
>  
>  static inline int dm_set_vblank(struct drm_crtc *crtc, bool enable)
>  {
> -	enum dc_irq_source irq_source;
> +	int irq_type;
>  	struct amdgpu_crtc *acrtc = to_amdgpu_crtc(crtc);
>  	struct amdgpu_device *adev = drm_to_adev(crtc->dev);
>  	struct dm_crtc_state *acrtc_state = to_dm_crtc_state(crtc->state);
> @@ -169,10 +169,16 @@ static inline int dm_set_vblank(struct drm_crtc *crtc, bool enable)
>  	if (rc)
>  		return rc;
>  
> -	irq_source = IRQ_TYPE_VBLANK + acrtc->otg_inst;
> +	irq_type = amdgpu_display_crtc_idx_to_irq_type(adev, acrtc->crtc_id);
> +
> +	if (enable)
> +		rc = amdgpu_irq_get(adev, &adev->crtc_irq, irq_type);
> +
> +	else

There's an unnecessary empty line before the "else". It's a good idea
to run patches through scripts/checkpatch.pl.

> +		rc = amdgpu_irq_put(adev, &adev->crtc_irq, irq_type);
>  
> -	if (!dc_interrupt_set(adev->dm.dc, irq_source, enable))
> -		return -EBUSY;
> +	if (rc)
> +		return rc;
>  
>  skip:
>  	if (amdgpu_in_reset(adev))

-- 
Regards,
Luben



More information about the amd-gfx mailing list