[PATCH v4 0/9] drm/amdgpu: prevent concurrent GPU access during reset
Yunxiang Li
Yunxiang.Li at amd.com
Wed Jun 5 01:33:09 UTC 2024
If another thread accesses the GPU while it is being reset, the reset
can fail. This is especially problematic under SRIOV, since the host
may reset the GPU even if the guest is not yet ready.
There is code in place that tries to prevent stray accesses, but over
time bugs have crept in that make it unreliable. This series aims to
fix those bugs.
v4: Testing showed that removing the flush from GART enable sometimes
leaves the GART not flushed at all, so dropped
  drm/amd/amdgpu: remove unnecessary flush when enable gart
and replaced it with
  drm/amdgpu: call flush_gpu_tlb directly in gfxhub enable
Split
  drm/amdgpu: fix missing reset domain locks
into multiple commits:
  drm/amdgpu: add lock in amdgpu_gart_invalidate_tlb
  drm/amdgpu: add lock in kfd_process_dequeue_from_device
v3: Dropped:
  drm/amdgpu: abort fence poll if reset is started
  Revert "drm/amdgpu: Queue KFD reset workitem in VF FED"
Updated:
  drm/amdgpu: fix sriov host flr handler
  drm/amdgpu: fix missing reset domain locks
Yunxiang Li (9):
drm/amdgpu: add skip_hw_access checks for sriov
drm/amdgpu: fix sriov host flr handler
drm/amdgpu/kfd: remove is_hws_hang and is_resetting
drm/amdgpu: remove tlb flush in amdgpu_gtt_mgr_recover
drm/amdgpu: use helper in amdgpu_gart_unbind
drm/amdgpu: call flush_gpu_tlb directly in gfxhub enable
drm/amdgpu: fix locking scope when flushing tlb
drm/amdgpu: add lock in amdgpu_gart_invalidate_tlb
drm/amdgpu: add lock in kfd_process_dequeue_from_device
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +
drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c | 11 +--
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 70 ++++++++--------
drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 2 -
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 23 ++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 2 +
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 2 +-
drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 2 +-
drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 39 ++++-----
drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 39 ++++-----
drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 6 --
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 1 -
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 79 ++++++++-----------
.../drm/amd/amdkfd/kfd_device_queue_manager.h | 1 -
drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 11 ++-
.../gpu/drm/amd/amdkfd/kfd_packet_manager.c | 4 +-
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 4 +-
.../amd/amdkfd/kfd_process_queue_manager.c | 13 ++-
18 files changed, 154 insertions(+), 157 deletions(-)
--
2.34.1
More information about the amd-gfx mailing list