[PATCH] drm/amdgpu: fix a GPU hang issue when remove device

Zhang, Hawking Hawking.Zhang at amd.com
Wed Dec 30 12:31:31 UTC 2020


[AMD Public Use]

Hi Dennis,

Thanks for digging this out. 

I'd like to understand where the MMIO register access comes from before the driver calls amdgpu_device_set_pg_state to disable GFXOFF in the ip_fini phase. I think we already moved the gfx PG ungate to a very early stage of device_fini. The only GC register access ahead of disabling GFXOFF that I can think of is disabling the gfx EOP interrupt.

The call stack shows a register write failure through KIQ, but that approach should be safe even with GFXOFF enabled.

Regards,
Hawking

-----Original Message-----
From: Dennis Li <Dennis.Li at amd.com> 
Sent: Wednesday, December 30, 2020 19:51
To: amd-gfx at lists.freedesktop.org; Chen, Jiansong (Simon) <Jiansong.Chen at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>
Cc: Li, Dennis <Dennis.Li at amd.com>
Subject: [PATCH] drm/amdgpu: fix a GPU hang issue when remove device

When GFXOFF is enabled and the GPU is idle, the driver fails to access some registers. Therefore, disable GFXOFF before unloading the device.

amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.
amdgpu: cp queue pipe 4 queue 0 preemption failed
amdgpu 0000:03:00.0: amdgpu: failed to write reg 2890 wait reg 28a2
amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
amdgpu 0000:03:00.0: amdgpu: failed to write reg 2890 wait reg 28a2
amdgpu 0000:03:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706

Signed-off-by: Dennis Li <Dennis.Li at amd.com>
Change-Id: I42431f5d0bf54909e1df888a0d72fc009d8e196c

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index e365c4fdcfe3..47d1291d5053 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -83,6 +83,8 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
 	if (adev == NULL)
 		return;
 
+	amdgpu_gfx_off_ctrl(adev, false);
+
 	amdgpu_unregister_gpu_instance(adev);
 
 	if (adev->rmmio == NULL)
--
2.17.1
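
The ordering argument in the patch description can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the driver code: struct model_dev, model_gfx_off_ctrl, model_write_reg and model_unload are hypothetical names, the power-down is modeled as immediate rather than delayed, and the model takes the patch description's premise (register writes fail while the idle GPU is in GFXOFF) at face value. It does not reproduce the KIQ path questioned in the reply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for device state; NOT the real struct amdgpu_device. */
struct model_dev {
	bool gfx_powered;      /* false while the GFX block sits in GFXOFF */
	int  gfxoff_block_cnt; /* outstanding "keep GFX powered" requests */
};

/*
 * Model of a gfx-off control call (assumed semantics):
 * enable == false asks to keep GFX powered, enable == true drops one request.
 */
static void model_gfx_off_ctrl(struct model_dev *dev, bool enable)
{
	assert(dev != NULL);
	if (!enable) {
		dev->gfxoff_block_cnt++;
		dev->gfx_powered = true;          /* wake the block right away */
	} else {
		if (dev->gfxoff_block_cnt > 0)
			dev->gfxoff_block_cnt--;
		if (dev->gfxoff_block_cnt == 0)
			dev->gfx_powered = false; /* idle GPU may power down again */
	}
}

/* Model of a register write: fails while the block is powered down. */
static int model_write_reg(const struct model_dev *dev, unsigned int reg)
{
	assert(dev != NULL);
	if (!dev->gfx_powered) {
		printf("failed to write reg %x\n", reg);
		return -1;
	}
	return 0;
}

/* Teardown that touches two registers; returns the number of failed writes. */
static int model_unload(struct model_dev *dev, bool disable_gfxoff_first)
{
	int failures = 0;

	if (disable_gfxoff_first)
		model_gfx_off_ctrl(dev, false);

	if (model_write_reg(dev, 0x2890) != 0)
		failures++;
	if (model_write_reg(dev, 0x1a6f4) != 0)
		failures++;
	return failures;
}
```

In this model, calling model_unload on an idle device with disable_gfxoff_first set to false fails both writes, while setting it to true wakes the block first and both writes succeed. Whether the real failure follows this pattern is the open question raised in the reply.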

