[PATCH] drm/amdgpu: fix a potential circular locking dependency
Dennis Li
Dennis.Li at amd.com
Tue Aug 11 09:32:08 UTC 2020
[ 653.902305] ======================================================
[ 653.902928] WARNING: possible circular locking dependency detected
[ 653.903517] 5.6.0-deli-v5.6-2848-g3f3109b0e75f #1 Tainted: G OE
[ 653.904098] ------------------------------------------------------
[ 653.904675] amdgpu_test/3975 is trying to acquire lock:
[ 653.905241] ffff97848f8647a0 (&adev->reset_sem){.+.+}, at: amdgpu_gem_va_ioctl+0x286/0x4f0 [amdgpu]
[ 653.905953]
but task is already holding lock:
[ 653.907087] ffff9744adbee1f8 (reservation_ww_class_mutex){+.+.}, at: ttm_eu_reserve_buffers+0x1ae/0x520 [ttm]
[ 653.907694]
which lock already depends on the new lock.
[ 653.909423]
the existing dependency chain (in reverse order) is:
[ 653.910594]
-> #1 (reservation_ww_class_mutex){+.+.}:
[ 653.911759] __ww_mutex_lock.constprop.15+0xca/0x1120
[ 653.912350] ww_mutex_lock+0x73/0x80
[ 653.913044] amdgpu_amdkfd_alloc_gtt_mem+0xde/0x380 [amdgpu]
[ 653.913724] kgd2kfd_device_init+0x13f/0x5e0 [amdgpu]
[ 653.914388] amdgpu_amdkfd_device_init+0x155/0x190 [amdgpu]
[ 653.915033] amdgpu_device_init+0x1303/0x1e10 [amdgpu]
[ 653.915685] amdgpu_driver_load_kms+0x5c/0x2c0 [amdgpu]
[ 653.916349] amdgpu_pci_probe+0x11d/0x200 [amdgpu]
[ 653.916959] local_pci_probe+0x47/0xa0
[ 653.917570] work_for_cpu_fn+0x1a/0x30
[ 653.918184] process_one_work+0x29e/0x630
[ 653.918803] worker_thread+0x22b/0x3f0
[ 653.919427] kthread+0x12f/0x150
[ 653.920047] ret_from_fork+0x3a/0x50
[ 653.920661]
-> #0 (&adev->reset_sem){.+.+}:
[ 653.921893] __lock_acquire+0x13ec/0x16e0
[ 653.922531] lock_acquire+0xb8/0x1c0
[ 653.923174] down_read+0x48/0x230
[ 653.923886] amdgpu_gem_va_ioctl+0x286/0x4f0 [amdgpu]
[ 653.924588] drm_ioctl_kernel+0xb6/0x100 [drm]
[ 653.925283] drm_ioctl+0x389/0x450 [drm]
[ 653.926013] amdgpu_drm_ioctl+0x4f/0x80 [amdgpu]
[ 653.926686] ksys_ioctl+0x98/0xb0
[ 653.927357] __x64_sys_ioctl+0x1a/0x20
[ 653.928030] do_syscall_64+0x5f/0x250
[ 653.928697] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 653.929373]
other info that might help us debug this:
[ 653.931356] Possible unsafe locking scenario:
[ 653.932647] CPU0 CPU1
[ 653.933287] ---- ----
[ 653.933911] lock(reservation_ww_class_mutex);
[ 653.934530] lock(&adev->reset_sem);
[ 653.935154] lock(reservation_ww_class_mutex);
[ 653.935766] lock(&adev->reset_sem);
[ 653.936360]
*** DEADLOCK ***
[ 653.938028] 2 locks held by amdgpu_test/3975:
[ 653.938574] #0: ffffb2a862d6bcd0 (reservation_ww_class_acquire){+.+.}, at: amdgpu_gem_va_ioctl+0x39b/0x4f0 [amdgpu]
[ 653.939233] #1: ffff9744adbee1f8 (reservation_ww_class_mutex){+.+.}, at: ttm_eu_reserve_buffers+0x1ae/0x520 [ttm]
change the order of reservation_ww_class_mutex and adev->reset_sem in
amdgpu_gem_va_ioctl the same as ones in amdgpu_amdkfd_alloc_gtt_mem, to
avoid potential dead lock.
Signed-off-by: Dennis Li <Dennis.Li at amd.com>
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index ee1e8fff83b2..fc889c477696 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -652,6 +652,8 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
abo = NULL;
}
+ down_read(&adev->reset_sem);
+
amdgpu_vm_get_pd_bo(&fpriv->vm, &list, &vm_pd);
r = ttm_eu_reserve_buffers(&ticket, &list, true, &duplicates);
@@ -670,8 +672,6 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
bo_va = NULL;
}
- down_read(&adev->reset_sem);
-
switch (args->operation) {
case AMDGPU_VA_OP_MAP:
va_flags = amdgpu_gem_va_map_flags(adev, args->flags);
@@ -701,12 +701,11 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
args->operation);
- up_read(&adev->reset_sem);
-
error_backoff:
ttm_eu_backoff_reservation(&ticket, &list);
error_unref:
+ up_read(&adev->reset_sem);
drm_gem_object_put_unlocked(gobj);
return r;
}
--
2.17.1
More information about the amd-gfx
mailing list