[PATCH] drm/amdgpu: fix amdgpu_irq_enabled warning in gfx and gmc hw_fini

Mon Apr 24 10:50:01 UTC 2023

On 24.04.23 at 12:37, Horatio Zhang wrote:
> The call trace occurs when amdgpu is suspended
> before the mode1 reset. For IP blocks that do not
> support RAS features, the relevant interrupt is not
> enabled during initialization, but hw_fini unconditionally
> disables it, which triggers the amdgpu_irq_enabled
> check warning.

Sounds like it was a good idea to add this check.

>
> [  102.873958] Call Trace:
> [  102.873959]  <TASK>
> [  102.873961]  gfx_v11_0_hw_fini+0x23/0x1e0 [amdgpu]
> [  102.874019]  gfx_v11_0_suspend+0xe/0x20 [amdgpu]
> [  102.874072]  amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu]
> [  102.874122]  amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu]
> [  102.874172]  amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu]
> [  102.874223]  amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu]
> [  102.874321]  amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu]
> [  102.874375]  process_one_work+0x21f/0x3f0
> [  102.874377]  worker_thread+0x200/0x3e0
> [  102.874378]  ? process_one_work+0x3f0/0x3f0
> [  102.874379]  kthread+0xfd/0x130
> [  102.874380]  ? kthread_complete_and_exit+0x20/0x20
> [  102.874381]  ret_from_fork+0x22/0x30
>
> [  102.980303] Call Trace:
> [  102.980303]  <TASK>
> [  102.980304]  gmc_v11_0_hw_fini+0x54/0x90 [amdgpu]
> [  102.980357]  gmc_v11_0_suspend+0xe/0x20 [amdgpu]
> [  102.980409]  amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu]
> [  102.980459]  amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu]
> [  102.980520]  amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu]
> [  102.980573]  amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu]
> [  102.980687]  amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu]
> [  102.980740]  process_one_work+0x21f/0x3f0
> [  102.980741]  worker_thread+0x200/0x3e0
> [  102.980742]  ? process_one_work+0x3f0/0x3f0
> [  102.980743]  kthread+0xfd/0x130
> [  102.980743]  ? kthread_complete_and_exit+0x20/0x20
> [  102.980744]  ret_from_fork+0x22/0x30
>
> Signed-off-by: Horatio Zhang <Hongkun.Zhang at amd.com>

Assuming the corresponding _get() calls are already protected by the 
same check: Reviewed-by: Christian König <christian.koenig at amd.com>
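
To illustrate the condition above: the put in hw_fini only balances if the
matching get was guarded by the same capability check. The following is a
minimal userspace sketch of that pairing, not the driver code. Every name
prefixed with demo_ is hypothetical and only models a per-source enable count;
the real amdgpu helpers and their call sites may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an interrupt source with an enable refcount. */
struct demo_irq_src {
	unsigned int enabled; /* number of outstanding get() calls */
};

/* Hypothetical stand-in for the device; ras_supported models the RAS check. */
struct demo_dev {
	bool ras_supported;
	struct demo_irq_src ecc_irq;
};

static int demo_irq_get(struct demo_irq_src *src)
{
	src->enabled++;
	return 0;
}

/* Warns and returns -1 when called without a matching get(). */
static int demo_irq_put(struct demo_irq_src *src)
{
	if (src->enabled == 0) {
		fprintf(stderr, "WARN: put on an interrupt that was never enabled\n");
		return -1;
	}
	src->enabled--;
	return 0;
}

/* Enable the ECC interrupt only when the block supports RAS. */
static int demo_late_init(struct demo_dev *dev)
{
	if (dev->ras_supported)
		return demo_irq_get(&dev->ecc_irq);
	return 0;
}

/* Disable it under the same check, so get() and put() stay paired. */
static int demo_hw_fini(struct demo_dev *dev)
{
	if (dev->ras_supported)
		return demo_irq_put(&dev->ecc_irq);
	return 0;
}

/* Run one init/fini cycle and check the refcount ends where it started. */
static void demo_cycle(struct demo_dev *dev)
{
	unsigned int before = dev->ecc_irq.enabled;

	demo_late_init(dev);
	demo_hw_fini(dev);
	assert(dev->ecc_irq.enabled == before);
}
```

With both sides guarded, demo_cycle() leaves the count untouched whether or not
RAS is supported. Calling demo_irq_put() directly on a source that was never
enabled takes the warning path, which is the situation the call traces show.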

> ---
>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 3 ++-
>   drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 3 ++-
>   2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> index 543af07ff102..0f6b037558bc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> @@ -4483,7 +4483,8 @@ static int gfx_v11_0_hw_fini(void *handle)
>   	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>   	int r;
>   
> -	amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
> +	if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
> +		amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
>   	amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
>   	amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> index 3828ca95899f..25f466c26d18 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> @@ -951,7 +951,8 @@ static int gmc_v11_0_hw_fini(void *handle)
>   		return 0;
>   	}
>   
> -	amdgpu_irq_put(adev, &adev->gmc.ecc_irq, 0);
> +	if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
> +		amdgpu_irq_put(adev, &adev->gmc.ecc_irq, 0);
>   	amdgpu_irq_put(adev, &adev->gmc.vm_fault, 0);
>   	gmc_v11_0_gart_disable(adev);
>