[PATCH 2/2] drm/amdgpu: Fix SDMA queue reset array out-of-bounds access

Jesse Zhang jesse.zhang at amd.com
Wed Jun 11 05:56:04 UTC 2025


The current SDMA v4.4.2 queue reset logic incorrectly uses GET_INST
macro for queue operations, leading to array index out-of-bounds
errors when harvesting is enabled. This manifests as UBSAN warnings
when stopping queues during reset operations.

[  306.871518] UBSAN: array-index-out-of-bounds in drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c:118:38
[  306.871538] index 4294967295 is out of range for type 'uint32_t *[44]'
[  306.871929]  amdgpu_sdma_reset_engine+0xe4/0x320 [amdgpu]
[  306.872115]  reset_queues_on_hws_hang+0x2dc/0x4d0 [amdgpu]

The fix ensures we use physical instance IDs consistently for queue
operations while maintaining harvest-aware mapping for register access.

Signed-off-by: Jesse Zhang <Jesse.Zhang at amd.com>
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
index 9c169112a5e7..3de125062ee3 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
@@ -1670,7 +1670,7 @@ static bool sdma_v4_4_2_page_ring_is_guilty(struct amdgpu_ring *ring)
 static int sdma_v4_4_2_reset_queue(struct amdgpu_ring *ring, unsigned int vmid)
 {
 	struct amdgpu_device *adev = ring->adev;
-	u32 id = GET_INST(SDMA0, ring->me);
+	u32 id = ring->me;
 	int r;
 
 	if (!(adev->sdma.supported_reset & AMDGPU_RESET_TYPE_PER_QUEUE))
@@ -1686,7 +1686,7 @@ static int sdma_v4_4_2_reset_queue(struct amdgpu_ring *ring, unsigned int vmid)
 static int sdma_v4_4_2_stop_queue(struct amdgpu_ring *ring)
 {
 	struct amdgpu_device *adev = ring->adev;
-	u32 instance_id = GET_INST(SDMA0, ring->me);
+	u32 instance_id = ring->me;
 	u32 inst_mask;
 	uint64_t rptr;
 
-- 
2.34.1



More information about the amd-gfx mailing list