[PATCH Review 1/1] drm/amdgpu: fix disable ras feature failed when unload driver

Zhang, Hawking Hawking.Zhang at amd.com
Fri Nov 26 12:57:03 UTC 2021



Good catch. We still need to release the RAS objects at the end, though. Is there a reason this sequence was removed?

@@ -2564,9 +2563,6 @@ int amdgpu_ras_fini(struct amdgpu_device *adev)
 
 	WARN(con->features, "Feature mask is not cleared");
 
-	if (con->features)
-		amdgpu_ras_disable_all_features(adev, 1);
-
 	cancel_delayed_work_sync(&con->ras_counte_delay_work);

Regards,
Hawking

-----Original Message-----
From: Stanley.Yang <Stanley.Yang at amd.com> 
Sent: Friday, November 26, 2021 17:48
To: amd-gfx at lists.freedesktop.org; Zhang, Hawking <Hawking.Zhang at amd.com>; Clements, John <John.Clements at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>; Li, Candice <Candice.Li at amd.com>; Chai, Thomas <YiPeng.Chai at amd.com>
Cc: Yang, Stanley <Stanley.Yang at amd.com>
Subject: [PATCH Review 1/1] drm/amdgpu: fix disable ras feature failed when unload driver

amdgpu_device_fini_hw() is called before amdgpu_device_fini_sw(), so the RAS TA is unloaded before the RAS disable command is sent. The RAS disable operation must therefore happen before hw fini.

Signed-off-by: Stanley.Yang <Stanley.Yang at amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    | 4 ----
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 73ec46140d68..d5e642e90010 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2838,8 +2838,6 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
 	if (amdgpu_sriov_vf(adev) && adev->virt.ras_init_done)
 		amdgpu_virt_release_ras_err_handler_data(adev);
 
-	amdgpu_ras_pre_fini(adev);
-
 	if (adev->gmc.xgmi.num_physical_nodes > 1)
 		amdgpu_xgmi_remove_device(adev);
 
@@ -3959,6 +3957,9 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 
 	amdgpu_fbdev_fini(adev);
 
+	/* RAS features must be disabled before hw fini */
+	amdgpu_ras_pre_fini(adev);
+
 	amdgpu_device_ip_fini_early(adev);
 
 	amdgpu_irq_fini_hw(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 39dfd4d59881..65102d2a0a98 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2484,7 +2484,6 @@ void amdgpu_ras_late_fini(struct amdgpu_device *adev,
 	amdgpu_ras_sysfs_remove(adev, ras_block);
 	if (ih_info->cb)
 		amdgpu_ras_interrupt_remove_handler(adev, ih_info);
-	amdgpu_ras_feature_enable(adev, ras_block, 0);
 }
 
 /* do some init work after IP late init as dependence.
@@ -2564,9 +2563,6 @@ int amdgpu_ras_fini(struct amdgpu_device *adev)
 
 	WARN(con->features, "Feature mask is not cleared");
 
-	if (con->features)
-		amdgpu_ras_disable_all_features(adev, 1);
-
 	cancel_delayed_work_sync(&con->ras_counte_delay_work);
 
 	amdgpu_ras_set_context(adev, NULL);
--
2.17.1

