[PATCH] drm/amdgpu: Fix desktop freezed after gpu-reset

Alan Liu HaoPing.Liu at amd.com
Mon Mar 27 15:20:28 UTC 2023


[Why]
After gpu-reset, sometimes the driver would fail to enable vblank irq,
causing flip_done timed out and the desktop freezed.

During gpu-reset, we will disable and enable vblank irq in dm_suspend()
and dm_resume(). Later on in amdgpu_irq_gpu_reset_resume_helper(), we
will check irqs' refcount and decide to enable or disable the irqs again.

However, we have 2 sets of API for controling vblank irq, one is
dm_vblank_get/put() and another is amdgpu_irq_get/put(). Each API has
its own refcount and flag to store the state of vblank irq, and they
are not synchronized.

In drm we use the first API to control vblank irq but in
amdgpu_irq_gpu_reset_resume_helper() we use the second set of API.

The failure happens when vblank irq was enabled by dm_vblank_get() before
gpu-reset, we have vblank->enabled true. However, during gpu-reset, in
amdgpu_irq_gpu_reset_resume_helper(), vblank irq's state checked from
amdgpu_irq_update() is DISABLED. So finally it will disable vblank irq
again. After gpu-reset, if there is a cursor plane commit, the driver
will try to enable vblank irq by calling drm_vblank_enable(), but the
vblank->enabled is still true, so it fails to turn on vblank irq and
causes flip_done can't be completed in vblank irq handler and desktop
become freezed.

[How]
Combining the 2 vblank control APIs by letting drm's API finally calls
amdgpu_irq's API, so the irq's refcount and state of both APIs can be
synchronized. Also add a check to prevent refcount from being less then
0 in amdgpu_irq_put().

Signed-off-by: Alan Liu <HaoPing.Liu at amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c            |  3 +++
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c | 14 ++++++++++----
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index a6aef488a822..1b66003657e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -597,6 +597,9 @@ int amdgpu_irq_put(struct amdgpu_device *adev, struct amdgpu_irq_src *src,
 	if (!src->enabled_types || !src->funcs->set)
 		return -EINVAL;
 
+	if (!amdgpu_irq_enabled(adev, src, type))
+		return 0;
+
 	if (atomic_dec_and_test(&src->enabled_types[type]))
 		return amdgpu_irq_update(adev, src, type);
 
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
index dc4f37240beb..e04f846b0b19 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
@@ -146,7 +146,7 @@ static void vblank_control_worker(struct work_struct *work)
 
 static inline int dm_set_vblank(struct drm_crtc *crtc, bool enable)
 {
-	enum dc_irq_source irq_source;
+	int irq_type;
 	struct amdgpu_crtc *acrtc = to_amdgpu_crtc(crtc);
 	struct amdgpu_device *adev = drm_to_adev(crtc->dev);
 	struct dm_crtc_state *acrtc_state = to_dm_crtc_state(crtc->state);
@@ -169,10 +169,16 @@ static inline int dm_set_vblank(struct drm_crtc *crtc, bool enable)
 	if (rc)
 		return rc;
 
-	irq_source = IRQ_TYPE_VBLANK + acrtc->otg_inst;
+	irq_type = amdgpu_display_crtc_idx_to_irq_type(adev, acrtc->crtc_id);
+
+	if (enable)
+		rc = amdgpu_irq_get(adev, &adev->crtc_irq, irq_type);
+
+	else
+		rc = amdgpu_irq_put(adev, &adev->crtc_irq, irq_type);
 
-	if (!dc_interrupt_set(adev->dm.dc, irq_source, enable))
-		return -EBUSY;
+	if (rc)
+		return rc;
 
 skip:
 	if (amdgpu_in_reset(adev))
-- 
2.34.1



More information about the amd-gfx mailing list