[PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device is unplugged to prevent crash in GPU initialization failure

Shi, Leslie Yuliang.Shi at amd.com
Thu Dec 16 07:00:24 UTC 2021


[Public]

Hi Guchun,

As Andrey says, "we should not call amdgpu_device_unmap_mmio unless device is unplugged", I think we should call amdgpu_device_unmap_mmio() only if device is unplugged (drm_dev_enter() return false) . 

+if (!drm_dev_enter(adev_to_drm(adev), &idx))
+	amdgpu_device_unmap_mmio(adev);
+ else 
# 	drm_dev_exit(idx);


Regards,
Leslie

-----Original Message-----
From: Chen, Guchun <Guchun.Chen at amd.com> 
Sent: Thursday, December 16, 2021 2:46 PM
To: Shi, Leslie <Yuliang.Shi at amd.com>; Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Pan, Xinhui <Xinhui.Pan at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
Subject: RE: [PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device is unplugged to prevent crash in GPU initialization failure

[Public]

Hi Leslie,

I think we need to modify it like:

+if (drm_dev_enter(adev_to_drm(adev), &idx)) {
+	amdgpu_device_unmap_mmio(adev);
+	drm_dev_exit(idx);
+}

Also you need to credit Andrey a 'suggested-by' in your patch.

Regards,
Guchun

-----Original Message-----
From: Shi, Leslie <Yuliang.Shi at amd.com> 
Sent: Thursday, December 16, 2021 2:14 PM
To: Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Pan, Xinhui <Xinhui.Pan at amd.com>; Deucher, Alexander <Alexander.Deucher at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Chen, Guchun <Guchun.Chen at amd.com>; Shi, Leslie <Yuliang.Shi at amd.com>
Subject: [PATCH v2] drm/amdgpu: Call amdgpu_device_unmap_mmio() iff device is unplugged to prevent crash in GPU initialization failure

[Why]
In amdgpu_driver_load_kms, when amdgpu_device_init returns error during driver modprobe, it will start the error handle path immediately and call into amdgpu_device_unmap_mmio as well to release mapped VRAM. However, in the following release callback, driver stills visits the unmapped memory like vcn.inst[i].fw_shared_cpu_addr in vcn_v3_0_sw_fini. So a kernel crash occurs.

[How]
call amdgpu_device_unmap_mmio() iff device is unplugged to prevent invalid memory address in
vcn_v3_0_sw_fini() when GPU initialization failure.

Signed-off-by: Leslie Shi <Yuliang.Shi at amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index fb03d75880ec..d3656e7b60c2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3845,6 +3845,8 @@ static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev)
  */
 void amdgpu_device_fini_hw(struct amdgpu_device *adev)  {
+	int idx;
+
 	dev_info(adev->dev, "amdgpu: finishing device.\n");
 	flush_delayed_work(&adev->delayed_init_work);
 	if (adev->mman.initialized) {
@@ -3888,7 +3890,11 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 
 	amdgpu_gart_dummy_page_fini(adev);
 
-	amdgpu_device_unmap_mmio(adev);
+	if (!drm_dev_enter(adev_to_drm(adev), &idx))
+		amdgpu_device_unmap_mmio(adev);
+	else
+		drm_dev_exit(idx);
+
 }
 
 void amdgpu_device_fini_sw(struct amdgpu_device *adev)
--
2.25.1


More information about the amd-gfx mailing list