[PATCH AUTOSEL 6.12 18/33] drm/amdkfd: debugfs hang_hws skip GPU with MES

Sasha Levin sashal at kernel.org
Thu Apr 3 19:16:41 UTC 2025


From: Philip Yang <Philip.Yang at amd.com>

[ Upstream commit fe9d0061c413f8fb8c529b18b592b04170850ded ]

debugfs hang_hws is used by GPU reset test with HWS, for MES this crash
the kernel with NULL pointer access because dqm->packet_mgr is not setup
for MES path.

Skip GPU with MES for now, MES hang_hws debugfs interface will be
supported later.

Signed-off-by: Philip Yang <Philip.Yang at amd.com>
Reviewed-by: Kent Russell <kent.russell at amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling at amd.com>
Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
Signed-off-by: Sasha Levin <sashal at kernel.org>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index d350c7ce35b3d..9186ef0bd2a32 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -1493,6 +1493,11 @@ int kfd_debugfs_hang_hws(struct kfd_node *dev)
 		return -EINVAL;
 	}
 
+	if (dev->kfd->shared_resources.enable_mes) {
+		dev_err(dev->adev->dev, "Inducing MES hang is not supported\n");
+		return -EINVAL;
+	}
+
 	return dqm_debugfs_hang_hws(dev->dqm);
 }
 
-- 
2.39.5



More information about the amd-gfx mailing list