[PATCH V3 00/19] Reset improvements for GC10+
Alex Deucher
alexander.deucher at amd.com
Wed May 28 04:18:55 UTC 2025
This set improves per-queue reset support for GC10+.
When we reset a queue, everything pending on it is lost, so we
need to re-emit the unprocessed state from subsequent submissions.
To make sure we restore the right unprocessed state, we need to
enable legacy enforce isolation so that jobs run one at a time and
the unprocessed state can be re-emitted safely. If we don't,
multiple jobs can run in parallel and we may not end up resetting
the correct one. This is similar to how Windows handles queues.
This also gives us correct guilty tracking for GC.
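A minimal sketch of the re-emit idea follows. It is a standalone model, not the amdgpu code, and every name in it (sketch_ring, submit_job, reset_and_reemit) is hypothetical. Each job records the ring range its commands occupy, loosely in the spirit of the "track ring state associated with a job" patch title, though the real patch may work differently. After a reset the ring is treated as empty. Completed jobs and the guilty job are dropped, and the commands of every later job are copied back in. The guilty job's fence sequence number is assumed to be known already, and sequence-number wraparound is ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ring model for illustration; not amdgpu's real structures. */
#define RING_DWORDS 64u                /* power of two so slots wrap with a mask */
#define RING_MASK   (RING_DWORDS - 1u)
#define MAX_JOBS    8u

struct sketch_job {
	uint32_t seq;         /* fence sequence number (no wraparound handling) */
	uint64_t start_wptr;  /* write pointer before the job's commands */
	uint64_t end_wptr;    /* write pointer after the job's commands */
};

struct sketch_ring {
	uint32_t buf[RING_DWORDS];
	uint64_t wptr;                     /* monotonic; slot = wptr & RING_MASK */
	uint32_t last_seq;                 /* seq handed to the newest job */
	struct sketch_job jobs[MAX_JOBS];  /* ring state tracked per job */
	size_t njobs;
};

static void ring_write(struct sketch_ring *r, uint32_t dw)
{
	r->buf[r->wptr & RING_MASK] = dw;
	r->wptr++;
}

/* Emit a job and record which ring range it occupies.
 * Returns -1 if it would overwrite commands that are still tracked. */
static int submit_job(struct sketch_ring *r, const uint32_t *cmds, size_t n)
{
	uint64_t oldest = r->njobs ? r->jobs[0].start_wptr : r->wptr;
	struct sketch_job *j;

	if (r->njobs == MAX_JOBS || r->wptr + n - oldest > RING_DWORDS)
		return -1;
	j = &r->jobs[r->njobs++];
	j->seq = ++r->last_seq;
	j->start_wptr = r->wptr;
	for (size_t i = 0; i < n; i++)
		ring_write(r, cmds[i]);
	j->end_wptr = r->wptr;
	return 0;
}

/* After a queue reset the ring contents are gone. Drop completed jobs and
 * the guilty one, then re-emit the commands of every later job.
 * Returns the number of jobs re-emitted. */
static size_t reset_and_reemit(struct sketch_ring *r, uint32_t guilty_seq)
{
	uint32_t backup[RING_DWORDS];
	struct sketch_job kept[MAX_JOBS];
	size_t nbackup = 0, nkept = 0;

	for (size_t i = 0; i < r->njobs; i++) {
		const struct sketch_job *j = &r->jobs[i];

		if (j->seq <= guilty_seq)
			continue;  /* finished earlier, or the job that hung */
		kept[nkept].seq = j->seq;
		kept[nkept].start_wptr = nbackup;  /* position after the reset */
		for (uint64_t p = j->start_wptr; p < j->end_wptr; p++)
			backup[nbackup++] = r->buf[p & RING_MASK];
		kept[nkept].end_wptr = nbackup;
		nkept++;
	}

	/* Model the reset: the queue restarts empty at write pointer 0. */
	memset(r->buf, 0, sizeof(r->buf));
	r->wptr = 0;
	for (size_t i = 0; i < nbackup; i++)
		ring_write(r, backup[i]);
	memcpy(r->jobs, kept, nkept * sizeof(kept[0]));
	r->njobs = nkept;
	return nkept;
}
```

Copying later jobs verbatim only makes sense if they had not started executing when the hang occurred. That is the property enforce isolation is meant to provide here.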
Tested on GC 10 and 11 chips by running a game and then
running hang tests. The game pauses when the hang happens,
then continues after the queue reset.
I tried the same approach on GC8 and 9, but it was not as
reliable as soft recovery, so I've dropped the KGQ reset
code for pre-GC10 parts.
The same approach can be extended to SDMA and VCN in the future.
They don't need enforce isolation because those engines are
single-threaded, so they always execute jobs serially.
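The following hedged sketch illustrates why serial execution matters for guilty tracking. The names (find_guilty, hang_result) are hypothetical. It assumes each job is tagged with an increasing fence sequence number and that the engine writes back the sequence number of the last job it finished.

When at most one job runs at a time, the job right after the last signaled fence is the hung one and everything after it is untouched. That holds natively on engines like SDMA and VCN, or via enforce isolation on GC. When several jobs may run at once, the model refuses to pick a culprit.

```c
#include <stdint.h>

/* Hypothetical model: every job gets an increasing fence sequence number and
 * the engine writes back the seq of the last job it completed. */
enum hang_verdict { NO_HANG, GUILTY_FOUND, AMBIGUOUS };

struct hang_result {
	enum hang_verdict verdict;
	uint32_t guilty_seq;    /* valid only for GUILTY_FOUND */
	uint32_t first_reemit;  /* first seq whose state can be re-emitted */
};

/* last_signaled: seq written back by the engine.
 * last_emitted:  seq of the newest job placed on the ring.
 * max_parallel:  how many jobs the engine may execute at once. */
static struct hang_result find_guilty(uint32_t last_signaled,
				      uint32_t last_emitted,
				      unsigned int max_parallel)
{
	struct hang_result res = { NO_HANG, 0, 0 };
	uint32_t pending = last_emitted - last_signaled;  /* unsigned, wrap-safe */

	if (pending == 0)
		return res;  /* everything emitted has completed */
	if (pending > 1 && max_parallel > 1) {
		/* Several jobs may be running at once: the oldest unsignaled one
		 * may only be stalled behind another, and later jobs may have
		 * partially executed, so neither blame nor replay is safe. */
		res.verdict = AMBIGUOUS;
		return res;
	}
	/* Serial execution: the job after the last signaled fence hung and
	 * nothing after it has started yet. */
	res.verdict = GUILTY_FOUND;
	res.guilty_seq = last_signaled + 1;
	res.first_reemit = last_signaled + 2;
	return res;
}
```

For example, find_guilty(10, 13, 1) blames sequence 11 and allows re-emitting from 12. By contrast, find_guilty(10, 13, 4) reports AMBIGUOUS.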
Alex Deucher (18):
drm/amdgpu/gfx10: enable legacy enforce isolation
drm/amdgpu/gfx11: enable legacy enforce isolation
drm/amdgpu/gfx12: enable legacy enforce isolation
drm/amdgpu/gfx7: drop reset_kgq
drm/amdgpu/gfx8: drop reset_kgq
drm/amdgpu/gfx9: drop reset_kgq
drm/amdgpu: add AMDGPU_QUEUE_RESET_TIMEOUT
drm/amdgpu/ring: add helper for padding the ring
drm/amdgpu: pad ring in amdgpu_ib_schedule
drm/amdgpu: track ring state associated with a job
drm/amdgpu/gfx10: re-emit unprocessed state on kgq reset
drm/amdgpu/gfx11: re-emit unprocessed state on kgq reset
drm/amdgpu/gfx12: re-emit unprocessed state on kgq reset
drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset
drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset
drm/amdgpu/gfx10: re-emit unprocessed state on kcq reset
drm/amdgpu/gfx11: re-emit unprocessed state on kcq reset
drm/amdgpu/gfx12: re-emit unprocessed state on kcq reset
Christian König (1):
drm/amdgpu: rework queue reset scheduler interaction
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 8 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 32 ++++++-----
drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 2 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 52 ++++++++++++++---
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 6 ++
drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 57 ++++++++++---------
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 48 +++++++++-------
drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 48 +++++++++-------
drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 71 ------------------------
drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 71 ------------------------
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 59 ++++----------------
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 15 ++++-
13 files changed, 192 insertions(+), 278 deletions(-)
--
2.49.0
More information about the amd-gfx mailing list