[PATCH v2] drm/amdgpu: annotate a false positive recursive locking

Zhang, Hawking Hawking.Zhang at amd.com
Mon Aug 10 05:14:32 UTC 2020


[AMD Public Use]

Reviewed-by: Hawking Zhang <Hawking.Zhang at amd.com>

Regards,
Hawking
-----Original Message-----
From: Dennis Li <Dennis.Li at amd.com> 
Sent: Friday, August 7, 2020 15:53
To: amd-gfx at lists.freedesktop.org; Deucher, Alexander <Alexander.Deucher at amd.com>; Kuehling, Felix <Felix.Kuehling at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>
Cc: Li, Dennis <Dennis.Li at amd.com>
Subject: [PATCH v2] drm/amdgpu: annotate a false positive recursive locking

[  584.110304] ============================================
[  584.110590] WARNING: possible recursive locking detected
[  584.110876] 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1 Tainted: G           OE
[  584.111164] --------------------------------------------
[  584.111456] kworker/38:1/553 is trying to acquire lock:
[  584.111721] ffff9b15ff0a47a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [  584.112112]
               but task is already holding lock:
[  584.112673] ffff9b1603d247a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [  584.113068]
               other info that might help us debug this:
[  584.113689]  Possible unsafe locking scenario:

[  584.114350]        CPU0
[  584.114685]        ----
[  584.115014]   lock(&adev->reset_sem);
[  584.115349]   lock(&adev->reset_sem);
[  584.115678]
                *** DEADLOCK ***

[  584.116624]  May be due to missing lock nesting notation

[  584.117284] 4 locks held by kworker/38:1/553:
[  584.117616]  #0: ffff9ad635c1d348 ((wq_completion)events){+.+.}, at: process_one_work+0x21f/0x630 [  584.117967]  #1: ffffac708e1c3e58 ((work_completion)(&con->recovery_work)){+.+.}, at: process_one_work+0x21f/0x630 [  584.118358]  #2: ffffffffc1c2a5d0 (&tmp->hive_lock){+.+.}, at: amdgpu_device_gpu_recover+0xae/0x1030 [amdgpu] [  584.118786]  #3: ffff9b1603d247a0 (&adev->reset_sem){++++}, at: amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [  584.119222]
               stack backtrace:
[  584.119990] CPU: 38 PID: 553 Comm: kworker/38:1 Kdump: loaded Tainted: G           OE     5.6.0-deli-v5.6-2848-g3f3109b0e75f #1
[  584.120782] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019 [  584.121223] Workqueue: events amdgpu_ras_do_recovery [amdgpu] [  584.121638] Call Trace:
[  584.122050]  dump_stack+0x98/0xd5
[  584.122499]  __lock_acquire+0x1139/0x16e0 [  584.122931]  ? trace_hardirqs_on+0x3b/0xf0 [  584.123358]  ? cancel_delayed_work+0xa6/0xc0 [  584.123771]  lock_acquire+0xb8/0x1c0 [  584.124197]  ? amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [  584.124599]  down_write+0x49/0x120 [  584.125032]  ? amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [  584.125472]  amdgpu_device_gpu_recover+0x262/0x1030 [amdgpu] [  584.125910]  ? amdgpu_ras_error_query+0x1b8/0x2a0 [amdgpu] [  584.126367]  amdgpu_ras_do_recovery+0x159/0x190 [amdgpu] [  584.126789]  process_one_work+0x29e/0x630 [  584.127208]  worker_thread+0x3c/0x3f0 [  584.127621]  ? __kthread_parkme+0x61/0x90 [  584.128014]  kthread+0x12f/0x150 [  584.128402]  ? process_one_work+0x630/0x630 [  584.128790]  ? kthread_park+0x90/0x90 [  584.129174]  ret_from_fork+0x3a/0x50

Each adev has owned lock_class_key to avoid false positive recursive locking.

v2:
1. register adev->lock_key into lockdep, otherwise lockdep will report the below warning

[ 1216.705820] BUG: key ffff890183b647d0 has not been registered!
[ 1216.705924] ------------[ cut here ]------------ [ 1216.705972] DEBUG_LOCKS_WARN_ON(1) [ 1216.705997] WARNING: CPU: 20 PID: 541 at kernel/locking/lockdep.c:3743 lockdep_init_map+0x150/0x210

Signed-off-by: Dennis Li <Dennis.Li at amd.com>
Change-Id: I7571efeccbf15483982031d00504a353031a854a

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index e97c088d03b3..766dc8f8c8a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -967,6 +967,7 @@ struct amdgpu_device {
 	atomic_t                        in_gpu_reset;
 	enum pp_mp1_state               mp1_state;
 	struct rw_semaphore	reset_sem;
+	struct lock_class_key lock_key;
 	struct amdgpu_doorbell_index doorbell_index;
 
 	struct mutex			notifier_lock;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 62ecac97fbd2..ae0a576f9895 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3037,6 +3037,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	mutex_init(&adev->virt.vf_errors.lock);
 	hash_init(adev->mn_hash);
 	init_rwsem(&adev->reset_sem);
+	lockdep_register_key(&adev->lock_key);
+	lockdep_set_class(&adev->reset_sem, &adev->lock_key);
 	atomic_set(&adev->in_gpu_reset, 0);
 	mutex_init(&adev->psp.mutex);
 	mutex_init(&adev->notifier_lock);
@@ -3411,6 +3413,8 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
 		amdgpu_pmu_fini(adev);
 	if (adev->discovery_bin)
 		amdgpu_discovery_fini(adev);
+
+	lockdep_unregister_key(&adev->lock_key);
 }
 
 
--
2.17.1


More information about the amd-gfx mailing list