[PATCH v2 0/4] Rework amdgpu HW fence refcount and update scheduler parent fence refcount.
Andrey Grodzovsky
andrey.grodzovsky at amd.com
Fri Jun 24 18:09:51 UTC 2022
Yiqing raised a problem of negative fence refcount for resubmitted jobs
in amdgpu and suggested a workaround in [1]. I took a look myself and discovered
some deeper problems both in amdgpu and scheduler code.
Yiqing helped with testing the new code and also drew a detailed refcount and flow
tracing diagram for parent (HW) fence life cycle and refcount under various
cases for the proposed patchset at [2].
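
For readers unfamiliar with the failure mode, here is a minimal sketch of how an unmatched put drives a reference count negative. It uses a toy counter, not the kernel's struct dma_fence or kref, and every name in it (toy_fence, toy_fence_get, toy_fence_put, toy_final_count) is hypothetical and only illustrative; it does not reproduce the actual amdgpu or scheduler code paths.

```c
#include <assert.h>

/* Toy stand-in for a refcounted fence; NOT the kernel's struct dma_fence. */
struct toy_fence {
    int refcount;
};

/* The creator starts out holding one reference. */
static void toy_fence_init(struct toy_fence *f)
{
    f->refcount = 1;
}

/* Take an additional reference; only valid while the object is alive. */
static void toy_fence_get(struct toy_fence *f)
{
    assert(f->refcount > 0);
    f->refcount++;
}

/* Drop a reference and return the new count; a negative value is an underflow. */
static int toy_fence_put(struct toy_fence *f)
{
    return --f->refcount;
}

/*
 * Run one get and two matching puts, then 'extra_puts' unmatched puts.
 * Returns the final count: 0 when balanced, negative when a path drops
 * a reference it never took.
 */
static int toy_final_count(int extra_puts)
{
    struct toy_fence f;
    int i;

    toy_fence_init(&f);   /* creator's reference: count = 1 */
    toy_fence_get(&f);    /* second holder:       count = 2 */
    toy_fence_put(&f);    /* second holder done:  count = 1 */
    toy_fence_put(&f);    /* creator done:        count = 0 */

    for (i = 0; i < extra_puts; i++)
        toy_fence_put(&f); /* unmatched put: count goes below zero */

    return f.refcount;
}
```

With this toy model, toy_final_count(0) ends at 0 and toy_final_count(1) ends at -1, which is the kind of imbalance a resubmission path can create if it drops a reference without having taken one.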
v2:
Update race prevention by switching from amdgpu_irq_get/put to enable/disable_irq (Christian)
Drop refcount fix for amdgpu_job->external_hw_fence as it was causing underflow in direct submissions
TODO - Follow up cleanup to totally get rid of amdgpu_job->external_hw_fence
[1] - https://lore.kernel.org/all/731b7ff1-3cc9-e314-df2a-7c51b76d4db0@amd.com/t/#r00c728fcc069b1276642c325bfa9d82bf8fa21a3
[2] - https://drive.google.com/file/d/1yEoeW6OQC9WnwmzFW6NBLhFP_jD0xcHm/view?usp=sharing
Andrey Grodzovsky (4):
drm/amdgpu: Add put fence in amdgpu_fence_driver_clear_job_fences
drm/amdgpu: Prevent race between late signaled fences and GPU reset.
drm/sched: Partial revert of 'drm/sched: Keep s_fence->parent pointer'
drm/amdgpu: Follow up change to previous drm scheduler change.
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 31 +++++++++++++++++-----
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 29 ++++++++++++++++++--
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 4 ---
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 +
drivers/gpu/drm/scheduler/sched_main.c | 13 ++++++---
6 files changed, 65 insertions(+), 15 deletions(-)
--
2.25.1