[PATCH] drm/amdgpu: fix amdgpu_irq_enabled warning in gfx and gmc hw_fini

Zhang, Hawking Hawking.Zhang at amd.com
Mon Apr 24 11:28:26 UTC 2023


[AMD Official Use Only - General]

Let's handle umc and gfx ras cases in separated patch.

gmc.ecc_irq maps to df mca interrupt, which is enabled by firmware per IFWI setting. Check GFX RAS mask is not correct in such case. host driver is not privileged to enable/disable the interrupt. As you noticed already the set function is dummy one. then just drop amdgpu_irq_put call for gmc.ecc_irq should be fine.

gfx.cp_ecc_error_irq is not used in gfx11. Feel free to retire the interrupt for gfx11 in another patch (i.e., gfx_v11_0_cp_ecc_error_irq_funcs is not needed any more)

Regards,
Hawking

-----Original Message-----
From: Horatio Zhang <Hongkun.Zhang at amd.com>
Sent: Monday, April 24, 2023 18:37
To: amd-gfx at lists.freedesktop.org
Cc: Chen, Guchun <Guchun.Chen at amd.com>; Xu, Feifei <Feifei.Xu at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>; Yao, Longlong <Longlong.Yao at amd.com>; Zhang, Horatio <Hongkun.Zhang at amd.com>
Subject: [PATCH] drm/amdgpu: fix amdgpu_irq_enabled warning in gfx and gmc hw_fini

The call trace occurred when the amdgpu is suspended before the mode1 reset. For the IP block that do not support ras features, the relevant interrupt is not opened during initialization, but hw_fini forced the close of this interrupt, which resulted in amdgpu_irq_enabled check warning.

[  102.873958] Call Trace:
[  102.873959]  <TASK>
[  102.873961]  gfx_v11_0_hw_fini+0x23/0x1e0 [amdgpu] [  102.874019]  gfx_v11_0_suspend+0xe/0x20 [amdgpu] [  102.874072]  amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [  102.874122]  amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [  102.874172]  amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [  102.874223]  amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [  102.874321]  amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [  102.874375]  process_one_work+0x21f/0x3f0 [  102.874377]  worker_thread+0x200/0x3e0 [  102.874378]  ? process_one_work+0x3f0/0x3f0 [  102.874379]  kthread+0xfd/0x130 [  102.874380]  ? kthread_complete_and_exit+0x20/0x20
[  102.874381]  ret_from_fork+0x22/0x30

[  102.980303] Call Trace:
[  102.980303]  <TASK>
[  102.980304]  gmc_v11_0_hw_fini+0x54/0x90 [amdgpu] [  102.980357]  gmc_v11_0_suspend+0xe/0x20 [amdgpu] [  102.980409]  amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [  102.980459]  amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [  102.980520]  amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [  102.980573]  amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [  102.980687]  amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [  102.980740]  process_one_work+0x21f/0x3f0 [  102.980741]  worker_thread+0x200/0x3e0 [  102.980742]  ? process_one_work+0x3f0/0x3f0 [  102.980743]  kthread+0xfd/0x130 [  102.980743]  ? kthread_complete_and_exit+0x20/0x20
[  102.980744]  ret_from_fork+0x22/0x30

Signed-off-by: Horatio Zhang <Hongkun.Zhang at amd.com>
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 3 ++-  drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 543af07ff102..0f6b037558bc 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -4483,7 +4483,8 @@ static int gfx_v11_0_hw_fini(void *handle)
        struct amdgpu_device *adev = (struct amdgpu_device *)handle;
        int r;

-       amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
+       if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
+               amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
        amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
        amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
index 3828ca95899f..25f466c26d18 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
@@ -951,7 +951,8 @@ static int gmc_v11_0_hw_fini(void *handle)
                return 0;
        }

-       amdgpu_irq_put(adev, &adev->gmc.ecc_irq, 0);
+       if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
+               amdgpu_irq_put(adev, &adev->gmc.ecc_irq, 0);
        amdgpu_irq_put(adev, &adev->gmc.vm_fault, 0);
        gmc_v11_0_gart_disable(adev);

--
2.34.1



More information about the amd-gfx mailing list