[PATCH] drm/amdgpu: fix amdgpu_irq_enabled warning in gfx and gmc hw_fini
Zhang, Horatio
Hongkun.Zhang at amd.com
Tue Apr 25 06:32:18 UTC 2023
[AMD Official Use Only - General]
Hi Christian,
Thank you for such a quick review.
It seems that the gfx_v11_0_cp_ecc_error_irq_funcs and amdgpu_irq_put(adev, &adev->gmc.ecc_irq, 0) in gmc_v11_0_hw_fini are not needed anymore.
I will update this fix in the next version of the patch.
Thanks,
Horatio
-----Original Message-----
From: Christian König <ckoenig.leichtzumerken at gmail.com>
Sent: Monday, April 24, 2023 6:50 PM
To: Zhang, Horatio <Hongkun.Zhang at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Yao, Longlong <Longlong.Yao at amd.com>; Xu, Feifei <Feifei.Xu at amd.com>; Chen, Guchun <Guchun.Chen at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>
Subject: Re: [PATCH] drm/amdgpu: fix amdgpu_irq_enabled warning in gfx and gmc hw_fini
On 24.04.23 at 12:37, Horatio Zhang wrote:
> The call trace occurred when amdgpu was suspended before the mode1
> reset. For IP blocks that do not support RAS features, the relevant
> interrupt is not enabled during initialization, but hw_fini
> unconditionally disabled it, which triggered the amdgpu_irq_enabled
> check warning.
Sounds like it was a good idea to add this check.
>
> [ 102.873958] Call Trace:
> [ 102.873959] <TASK>
> [ 102.873961] gfx_v11_0_hw_fini+0x23/0x1e0 [amdgpu]
> [ 102.874019] gfx_v11_0_suspend+0xe/0x20 [amdgpu]
> [ 102.874072] amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu]
> [ 102.874122] amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu]
> [ 102.874172] amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu]
> [ 102.874223] amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu]
> [ 102.874321] amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu]
> [ 102.874375] process_one_work+0x21f/0x3f0
> [ 102.874377] worker_thread+0x200/0x3e0
> [ 102.874378] ? process_one_work+0x3f0/0x3f0
> [ 102.874379] kthread+0xfd/0x130
> [ 102.874380] ? kthread_complete_and_exit+0x20/0x20
> [ 102.874381] ret_from_fork+0x22/0x30
>
> [ 102.980303] Call Trace:
> [ 102.980303] <TASK>
> [ 102.980304] gmc_v11_0_hw_fini+0x54/0x90 [amdgpu]
> [ 102.980357] gmc_v11_0_suspend+0xe/0x20 [amdgpu]
> [ 102.980409] amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu]
> [ 102.980459] amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu]
> [ 102.980520] amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu]
> [ 102.980573] amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu]
> [ 102.980687] amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu]
> [ 102.980740] process_one_work+0x21f/0x3f0
> [ 102.980741] worker_thread+0x200/0x3e0
> [ 102.980742] ? process_one_work+0x3f0/0x3f0
> [ 102.980743] kthread+0xfd/0x130
> [ 102.980743] ? kthread_complete_and_exit+0x20/0x20
> [ 102.980744] ret_from_fork+0x22/0x30
>
> Signed-off-by: Horatio Zhang <Hongkun.Zhang at amd.com>
Assuming the corresponding _get() calls are already protected by the same check:
Reviewed-by: Christian König <christian.koenig at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 3 ++-
> drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 3 ++-
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> index 543af07ff102..0f6b037558bc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> @@ -4483,7 +4483,8 @@ static int gfx_v11_0_hw_fini(void *handle)
>  	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>  	int r;
>
> -	amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
> +	if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
> +		amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
>  	amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
>  	amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> index 3828ca95899f..25f466c26d18 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> @@ -951,7 +951,8 @@ static int gmc_v11_0_hw_fini(void *handle)
>  		return 0;
>  	}
>
> -	amdgpu_irq_put(adev, &adev->gmc.ecc_irq, 0);
> +	if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
> +		amdgpu_irq_put(adev, &adev->gmc.ecc_irq, 0);
>  	amdgpu_irq_put(adev, &adev->gmc.vm_fault, 0);
>  	gmc_v11_0_gart_disable(adev);
>
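The warning discussed in this thread comes from a put that has no matching get. The following is a minimal, self-contained sketch of that pattern in plain C, not the amdgpu code itself: every name in it (`fake_irq_src`, `fake_irq_get`, `fake_irq_put`, `fake_late_init`, `fake_hw_fini`) is hypothetical, and the `ras_supported` flag only stands in for whatever condition guards the real get. It assumes, as the review comment does, that the init path takes the reference under the same condition.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted interrupt source. */
struct fake_irq_src {
	int enabled_refs;
};

/* Take a reference; in a real driver the first one enables the interrupt. */
static void fake_irq_get(struct fake_irq_src *src)
{
	src->enabled_refs++;
}

/* Drop a reference; returns -1 when none is held (the unbalanced case). */
static int fake_irq_put(struct fake_irq_src *src)
{
	if (src->enabled_refs == 0)
		return -1; /* a real driver would warn here */
	src->enabled_refs--;
	return 0;
}

/* Init path: take the reference only when the feature is supported. */
static void fake_late_init(struct fake_irq_src *src, bool ras_supported)
{
	if (ras_supported)
		fake_irq_get(src);
}

/* Teardown path: guard the put with the same condition as the get. */
static int fake_hw_fini(struct fake_irq_src *src, bool ras_supported)
{
	if (ras_supported)
		return fake_irq_put(src);
	return 0;
}
```

With the feature unsupported, `fake_late_init()` leaves the count at zero, so a bare `fake_irq_put()` fails, while the guarded `fake_hw_fini()` skips the put and succeeds. The guard is only correct if both paths test the same condition, which is the caveat attached to the Reviewed-by above.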