[PATCH] drm/amdgpu: fix amdgpu_irq_enabled warning in gfx and gmc hw_fini

Tue Apr 25 06:32:18 UTC 2023

[AMD Official Use Only - General]

Hi Christian,

Thank you for such a quick review. 
It seems that the gfx_v11_0_cp_ecc_error_irq_funcs and amdgpu_irq_put(adev, &adev->gmc.ecc_irq, 0) in gmc_v11_0_hw_fini are not needed anymore.
I will update this fix in the next version of the patch.

Thanks,
Horatio

-----Original Message-----
From: Christian König <ckoenig.leichtzumerken at gmail.com> 
Sent: Monday, April 24, 2023 6:50 PM
To: Zhang, Horatio <Hongkun.Zhang at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Yao, Longlong <Longlong.Yao at amd.com>; Xu, Feifei <Feifei.Xu at amd.com>; Chen, Guchun <Guchun.Chen at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>
Subject: Re: [PATCH] drm/amdgpu: fix amdgpu_irq_enabled warning in gfx and gmc hw_fini

Am 24.04.23 um 12:37 schrieb Horatio Zhang:
> The call trace occurred when the amdgpu is suspended before the mode1 
> reset. For the IP block that do not support ras features, the relevant 
> interrupt is not opened during initialization, but hw_fini forced the 
> close of this interrupt, which resulted in amdgpu_irq_enabled check 
> warning.2

Sounds like it was a good idea to add this check.

>
> [  102.873958] Call Trace:
> [  102.873959]  <TASK>
> [  102.873961]  gfx_v11_0_hw_fini+0x23/0x1e0 [amdgpu] [  102.874019]  
> gfx_v11_0_suspend+0xe/0x20 [amdgpu] [  102.874072]  
> amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [  102.874122]  
> amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [  102.874172]  
> amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [  102.874223]  
> amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [  102.874321]  
> amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [  102.874375]  
> process_one_work+0x21f/0x3f0 [  102.874377]  worker_thread+0x200/0x3e0 
> [  102.874378]  ? process_one_work+0x3f0/0x3f0 [  102.874379]  
> kthread+0xfd/0x130 [  102.874380]  ? 
> kthread_complete_and_exit+0x20/0x20
> [  102.874381]  ret_from_fork+0x22/0x30
>
> [  102.980303] Call Trace:
> [  102.980303]  <TASK>
> [  102.980304]  gmc_v11_0_hw_fini+0x54/0x90 [amdgpu] [  102.980357]  
> gmc_v11_0_suspend+0xe/0x20 [amdgpu] [  102.980409]  
> amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [  102.980459]  
> amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [  102.980520]  
> amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [  102.980573]  
> amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [  102.980687]  
> amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [  102.980740]  
> process_one_work+0x21f/0x3f0 [  102.980741]  worker_thread+0x200/0x3e0 
> [  102.980742]  ? process_one_work+0x3f0/0x3f0 [  102.980743]  
> kthread+0xfd/0x130 [  102.980743]  ? 
> kthread_complete_and_exit+0x20/0x20
> [  102.980744]  ret_from_fork+0x22/0x30
>
> Signed-off-by: Horatio Zhang <Hongkun.Zhang at amd.com>

Assuming the corresponding _get() calls are already protected by the same check: Reviewed-by: Christian König <christian.koenig at amd.com>

> ---
>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 3 ++-
>   drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 3 ++-
>   2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> index 543af07ff102..0f6b037558bc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> @@ -4483,7 +4483,8 @@ static int gfx_v11_0_hw_fini(void *handle)
>   	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>   	int r;
>   
> -	amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
> +	if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
> +		amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
>   	amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
>   	amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> index 3828ca95899f..25f466c26d18 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> @@ -951,7 +951,8 @@ static int gmc_v11_0_hw_fini(void *handle)
>   		return 0;
>   	}
>   
> -	amdgpu_irq_put(adev, &adev->gmc.ecc_irq, 0);
> +	if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
> +		amdgpu_irq_put(adev, &adev->gmc.ecc_irq, 0);
>   	amdgpu_irq_put(adev, &adev->gmc.vm_fault, 0);
>   	gmc_v11_0_gart_disable(adev);
>