[PATCH] drm/amd/display: fix deadlock issue in amdgpu reset
Yu, Lang
Lang.Yu at amd.com
Tue Mar 23 01:42:36 UTC 2021
-----Original Message-----
From: Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>
Sent: Monday, March 22, 2021 11:01 PM
To: Yu, Lang <Lang.Yu at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Huang, Ray <Ray.Huang at amd.com>
Subject: Re: [PATCH] drm/amd/display: fix deadlock issue in amdgpu reset
On 2021-03-22 4:11 a.m., Lang Yu wrote:
> During amdgpu reset, while dm.dc_lock is held by dm_suspend,
> handle_hpd_rx_irq tries to acquire it, and a deadlock occurs.
>
> Deadlock log:
>
> [ 104.528304] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
>
> [ 104.640084] ======================================================
> [ 104.640092] WARNING: possible circular locking dependency detected
> [ 104.640099] 5.11.0-custom #1 Tainted: G W E
> [ 104.640107] ------------------------------------------------------
> [ 104.640114] cat/1158 is trying to acquire lock:
> [ 104.640120] ffff88810a09ce00 ((work_completion)(&lh->work)){+.+.}-{0:0}, at: __flush_work+0x2e3/0x450
> [ 104.640144] but task is already holding lock:
> [ 104.640151] ffff88810a09cc70 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb2/0x1d0 [amdgpu]
> [ 104.640581] which lock already depends on the new lock.
>
> [ 104.640590]
> the existing dependency chain (in reverse order) is:
> [ 104.640598]
> -> #2 (&adev->dm.dc_lock){+.+.}-{3:3}:
> [ 104.640611] lock_acquire+0xca/0x390
> [ 104.640623] __mutex_lock+0x9b/0x930
> [ 104.640633] mutex_lock_nested+0x1b/0x20
> [ 104.640640] handle_hpd_rx_irq+0x9b/0x1c0 [amdgpu]
> [ 104.640959] dm_irq_work_func+0x4e/0x60 [amdgpu]
> [ 104.641264] process_one_work+0x2a7/0x5b0
> [ 104.641275] worker_thread+0x4a/0x3d0
> [ 104.641283] kthread+0x125/0x160
> [ 104.641290] ret_from_fork+0x22/0x30
> [ 104.641300]
> -> #1 (&aconnector->hpd_lock){+.+.}-{3:3}:
> [ 104.641312] lock_acquire+0xca/0x390
> [ 104.641321] __mutex_lock+0x9b/0x930
> [ 104.641328] mutex_lock_nested+0x1b/0x20
> [ 104.641336] handle_hpd_rx_irq+0x67/0x1c0 [amdgpu]
> [ 104.641635] dm_irq_work_func+0x4e/0x60 [amdgpu]
> [ 104.641931] process_one_work+0x2a7/0x5b0
> [ 104.641940] worker_thread+0x4a/0x3d0
> [ 104.641948] kthread+0x125/0x160
> [ 104.641954] ret_from_fork+0x22/0x30
> [ 104.641963]
> -> #0 ((work_completion)(&lh->work)){+.+.}-{0:0}:
> [ 104.641975] check_prev_add+0x94/0xbf0
> [ 104.641983] __lock_acquire+0x130d/0x1ce0
> [ 104.641992] lock_acquire+0xca/0x390
> [ 104.642000] __flush_work+0x303/0x450
> [ 104.642008] flush_work+0x10/0x20
> [ 104.642016] amdgpu_dm_irq_suspend+0x93/0x100 [amdgpu]
> [ 104.642312] dm_suspend+0x181/0x1d0 [amdgpu]
> [ 104.642605] amdgpu_device_ip_suspend_phase1+0x8a/0x100 [amdgpu]
> [ 104.642835] amdgpu_device_ip_suspend+0x21/0x70 [amdgpu]
> [ 104.643066] amdgpu_device_pre_asic_reset+0x1bd/0x1d2 [amdgpu]
> [ 104.643403] amdgpu_device_gpu_recover.cold+0x5df/0xa9d [amdgpu]
> [ 104.643715] gpu_recover_get+0x2e/0x60 [amdgpu]
> [ 104.643951] simple_attr_read+0x6d/0x110
> [ 104.643960] debugfs_attr_read+0x49/0x70
> [ 104.643970] full_proxy_read+0x5f/0x90
> [ 104.643979] vfs_read+0xa3/0x190
> [ 104.643986] ksys_read+0x70/0xf0
> [ 104.643992] __x64_sys_read+0x1a/0x20
> [ 104.643999] do_syscall_64+0x38/0x90
> [ 104.644007] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 104.644017]
> other info that might help us debug this:
>
> [ 104.644026] Chain exists of:
>                 (work_completion)(&lh->work) --> &aconnector->hpd_lock --> &adev->dm.dc_lock
>
> [ 104.644043] Possible unsafe locking scenario:
>
> [ 104.644049]        CPU0                    CPU1
> [ 104.644055]        ----                    ----
> [ 104.644060]   lock(&adev->dm.dc_lock);
> [ 104.644066]                               lock(&aconnector->hpd_lock);
> [ 104.644075]                               lock(&adev->dm.dc_lock);
> [ 104.644083]   lock((work_completion)(&lh->work));
> [ 104.644090]
> *** DEADLOCK ***
>
> [ 104.644096] 3 locks held by cat/1158:
> [ 104.644103] #0: ffff88810d0e4eb8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_read+0x4e/0x110
> [ 104.644119] #1: ffff88810a0a1600 (&adev->reset_sem){++++}-{3:3}, at: amdgpu_device_lock_adev+0x42/0x94 [amdgpu]
> [ 104.644489] #2: ffff88810a09cc70 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb2/0x1d0 [amdgpu]
>
> Signed-off-by: Lang Yu <Lang.Yu at amd.com>
> ---
> drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index e176ea84d75b..8727488df769 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -2657,13 +2657,15 @@ static void handle_hpd_rx_irq(void *param)
> }
> }
>
> - mutex_lock(&adev->dm.dc_lock);
> + if (!amdgpu_in_reset(adev))
> + mutex_lock(&adev->dm.dc_lock);
> #ifdef CONFIG_DRM_AMD_DC_HDCP
> result = dc_link_handle_hpd_rx_irq(dc_link, &hpd_irq_data, NULL);
> #else
> result = dc_link_handle_hpd_rx_irq(dc_link, NULL, NULL);
> #endif
> - mutex_unlock(&adev->dm.dc_lock);
> + if (!amdgpu_in_reset(adev))
> + mutex_unlock(&adev->dm.dc_lock);
Why is it OK to stop locking when in GPU reset?
Andrey
If we skip the lock while in GPU reset, the reset completes, though skipping it is not safe.
If we take the lock while in GPU reset, we hit the deadlock and the reset never completes;
the sketch below shows the cycle in miniature. The display team should provide a proper
fix. This has been blocked for a long time; it's time to take some action.
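
To make the cycle concrete, here is a minimal user-space sketch of the same pattern,
assuming POSIX threads (all names are hypothetical stand-ins, not driver code). The main
thread plays the role of dm_suspend(): it takes the lock and then waits on a worker, just
as flush_work() in amdgpu_dm_irq_suspend() waits on dm_irq_work_func(). The worker plays
the role of handle_hpd_rx_irq(): it needs the same lock before it can finish.

    #include <pthread.h>
    #include <stdio.h>

    /* Hypothetical stand-in for adev->dm.dc_lock. */
    static pthread_mutex_t dc_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Plays the role of handle_hpd_rx_irq(): the worker needs dc_lock. */
    static void *hpd_work(void *arg)
    {
            (void)arg;
            pthread_mutex_lock(&dc_lock);
            puts("worker: got dc_lock");
            pthread_mutex_unlock(&dc_lock);
            return NULL;
    }

    int main(void)
    {
            pthread_t worker;

            /* Plays the role of dm_suspend(): take dc_lock first... */
            pthread_mutex_lock(&dc_lock);

            pthread_create(&worker, NULL, hpd_work, NULL);

            /*
             * ...then wait for the worker, as flush_work() does. The
             * worker is blocked on dc_lock, so this join never
             * returns: deadlock.
             */
            pthread_join(worker, NULL);

            pthread_mutex_unlock(&dc_lock);
            return 0;
    }

Compiled with "cc deadlock.c -pthread", the program hangs forever, which is the same
circular dependency lockdep reports above.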
Regards,
Lang
>
> out:
> if (result && !is_mst_root_connector) {
>