[PATCH] drm/amdgpu: harden the HW access lockdep check
Christian König
ckoenig.leichtzumerken at gmail.com
Fri Jul 19 16:05:35 UTC 2024
While Alex already fixed a bunch of them, we still have tons of call
paths which access the HW without holding the reset lock that
protects against concurrent GPU resets.
Start pointing those out so that we can eventually fix them. Only
point out the first misbehavior per driver load so that we won't
overflow the logs with them.
Signed-off-by: Christian König <christian.koenig at amd.com>
---
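Note (not part of the commit message): for readers unfamiliar with the
"_once" lockdep variants, here is a minimal userspace sketch of the
behaviour this patch relies on. Everything in it is a hypothetical
stand-in: struct fake_dev, reset_lock_held, assert_held_once() and
skip_hw_access() are not kernel APIs, and the "held" flag only models
what lockdep would track for adev->reset_domain->sem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the relevant bits of struct amdgpu_device. */
struct fake_dev {
	bool no_hw_access;    /* mirrors adev->no_hw_access */
	bool reset_lock_held; /* models "lockdep sees the reset sem held" */
};

/* Counts how many warnings were actually printed. */
static int warnings_emitted;

/* Report only the first violation, similar in spirit to a _once assert. */
static void assert_held_once(const struct fake_dev *dev)
{
	static bool warned;

	if (!dev->reset_lock_held && !warned) {
		warned = true;
		warnings_emitted++;
		fprintf(stderr, "HW access without holding the reset lock\n");
	}
}

/* Same shape as the patched function: check first, then report the flag. */
static bool skip_hw_access(const struct fake_dev *dev, bool in_task)
{
	assert(dev != NULL);

	/* Only process context is expected to hold the lock. */
	if (in_task)
		assert_held_once(dev);

	return dev->no_hw_access;
}
```

In this model, calling skip_hw_access() repeatedly from "process
context" without the lock prints a single warning, and calls with
in_task == false never warn. The return value depends only on
no_hw_access, as in the patch.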
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 30 +++++++---------------
1 file changed, 9 insertions(+), 21 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index bcacf2e35eba..30d83ae3c14a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -567,31 +567,19 @@ void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
*/
/* Check if hw access should be skipped because of hotplug or device error */
-bool amdgpu_device_skip_hw_access(struct amdgpu_device *adev)
+bool noinline amdgpu_device_skip_hw_access(struct amdgpu_device *adev)
{
- if (adev->no_hw_access)
- return true;
-
-#ifdef CONFIG_LOCKDEP
/*
- * This is a bit complicated to understand, so worth a comment. What we assert
- * here is that the GPU reset is not running on another thread in parallel.
- *
- * For this we trylock the read side of the reset semaphore, if that succeeds
- * we know that the reset is not running in paralell.
+ * HW access in process context requires that we hold the reset lock to
+ * make sure that no reset is running in parallel.
*
- * If the trylock fails we assert that we are either already holding the read
- * side of the lock or are the reset thread itself and hold the write side of
- * the lock.
+ * Interrupt context can't take the reset semaphore since that may sleep,
+ * but the reset procedure disables interrupts as necessary.
*/
- if (in_task()) {
- if (down_read_trylock(&adev->reset_domain->sem))
- up_read(&adev->reset_domain->sem);
- else
- lockdep_assert_held(&adev->reset_domain->sem);
- }
-#endif
- return false;
+ if (in_task())
+ lockdep_assert_held_once(&adev->reset_domain->sem);
+
+ return adev->no_hw_access;
}
/**
--
2.34.1