[PATCH] drm/amdgpu: fix a GPU hang issue when remove device
Zhang, Hawking
Hawking.Zhang at amd.com
Wed Dec 30 12:31:31 UTC 2020
[AMD Public Use]
Hi Dennis,
Thanks for digging this out.
I'd like to understand where the MMIO register access comes from before the driver calls amdgpu_device_set_pg_state to disable gfxoff in the ip_fini phase. I think we already moved the gfx PG ungate to a very early stage of device_fini. The only GC register access ahead of disabling gfxoff that I can think of is disabling the gfx EOP interrupt.
The call stack shows a register write failure through KIQ, but that approach should be safe even with gfxoff enabled.
Regards,
Hawking
-----Original Message-----
From: Dennis Li <Dennis.Li at amd.com>
Sent: Wednesday, December 30, 2020 19:51
To: amd-gfx at lists.freedesktop.org; Chen, Jiansong (Simon) <Jiansong.Chen at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>
Cc: Li, Dennis <Dennis.Li at amd.com>
Subject: [PATCH] drm/amdgpu: fix a GPU hang issue when remove device
When GFXOFF is enabled and the GPU is idle, the driver will fail to access some registers. Therefore, disable GFXOFF before unloading the device.
amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.
amdgpu: cp queue pipe 4 queue 0 preemption failed
amdgpu 0000:03:00.0: amdgpu: failed to write reg 2890 wait reg 28a2
amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
amdgpu 0000:03:00.0: amdgpu: failed to write reg 2890 wait reg 28a2
amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
Signed-off-by: Dennis Li <Dennis.Li at amd.com>
Change-Id: I42431f5d0bf54909e1df888a0d72fc009d8e196c
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index e365c4fdcfe3..47d1291d5053 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -83,6 +83,8 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
 	if (adev == NULL)
 		return;
 
+	amdgpu_gfx_off_ctrl(adev, false);
+
 	amdgpu_unregister_gpu_instance(adev);
 
 	if (adev->rmmio == NULL)
--
2.17.1
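
For readers following the thread, the idea behind the patch can be illustrated with a small user-space toy model. This is only a sketch under stated assumptions: struct toy_gpu, toy_gfx_off_ctrl(), toy_write_reg() and toy_unload() are hypothetical names, not the driver's code, and the model simply takes the patch description at face value (register writes fail while GFX is powered down). Whether that is the real failure path is the question raised in the reply above. The model also powers GFX back down immediately once no request is outstanding, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for device state; NOT the real struct amdgpu_device. */
struct toy_gpu {
    int  gfx_off_req_count; /* outstanding "keep GFX powered" requests */
    bool gfx_off_state;     /* true while the GFX block is powered down */
};

/*
 * Toy analogue of amdgpu_gfx_off_ctrl(adev, enable), assumed here to be a
 * reference-counted switch: enable == false takes a "keep GFX on" request,
 * enable == true drops one.
 */
static void toy_gfx_off_ctrl(struct toy_gpu *gpu, bool enable)
{
    assert(gpu != NULL);

    if (!enable) {
        gpu->gfx_off_req_count++;
        gpu->gfx_off_state = false;        /* wake the GFX block */
    } else {
        if (gpu->gfx_off_req_count > 0)
            gpu->gfx_off_req_count--;
        if (gpu->gfx_off_req_count == 0)
            gpu->gfx_off_state = true;     /* simplification: immediate power-down */
    }
}

/* Toy register write: assumed to fail while GFX is powered down. */
static int toy_write_reg(const struct toy_gpu *gpu, unsigned int reg)
{
    assert(gpu != NULL);

    if (gpu->gfx_off_state) {
        printf("failed to write reg %x\n", reg);
        return -1;
    }
    return 0;
}

/* Toy unload path: optionally disallow GFXOFF before touching registers. */
static int toy_unload(struct toy_gpu *gpu, bool disable_gfxoff_first)
{
    if (disable_gfxoff_first)
        toy_gfx_off_ctrl(gpu, false);

    return toy_write_reg(gpu, 0x2890);
}
```

In this model, calling toy_unload() on an idle GPU (gfx_off_state set) fails without the early toy_gfx_off_ctrl(gpu, false) call and succeeds with it, which mirrors the ordering change the patch makes in amdgpu_driver_unload_kms().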