[PATCH] drm/amdkfd: Avoid queue reset if disabled
Zhang, Hawking
Hawking.Zhang at amd.com
Mon Jun 30 07:52:11 UTC 2025
[AMD Official Use Only - AMD Internal Distribution Only]
Reviewed-by: Hawking Zhang <Hawking.Zhang at amd.com>
Regards,
Hawking
-----Original Message-----
From: Lazar, Lijo <Lijo.Lazar at amd.com>
Sent: Monday, June 30, 2025 12:41
To: amd-gfx at lists.freedesktop.org
Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; Kim, Jonathan <Jonathan.Kim at amd.com>; Zhang, Jesse(Jie) <Jesse.Zhang at amd.com>
Subject: [PATCH] drm/amdkfd: Avoid queue reset if disabled
If ring reset is disabled, skip resetting queues. Instead, fall back to device based reset.
Signed-off-by: Lijo Lazar <lijo.lazar at amd.com>
---
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 76359c6a3f3a..500f51552038 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -2339,9 +2339,18 @@ static int reset_hung_queues_sdma(struct device_queue_manager *dqm)
static int reset_queues_on_hws_hang(struct device_queue_manager *dqm, bool is_sdma) {
+ struct amdgpu_device *adev = dqm->dev->adev;
+
while (halt_if_hws_hang)
schedule();
+ if (adev->debug_disable_gpu_ring_reset) {
+ dev_info_once(adev->dev,
+ "%s queue hung, but ring reset disabled",
+ is_sdma ? "sdma" : "compute");
+
+ return -EPERM;
+ }
if (!amdgpu_gpu_recovery)
return -ENOTRECOVERABLE;
--
2.49.0
More information about the amd-gfx
mailing list