[PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in jpeg_v4_0_hw_fini

Zhang, Horatio Hongkun.Zhang at amd.com
Wed May 10 10:54:32 UTC 2023


[AMD Official Use Only - General]

Hi Hawking,

At modprobe time, the jpeg/vcn interrupt is already enabled in amdgpu_fence_driver_hw_init(). If an amdgpu_irq_get call is added in amdgpu_xxx_ras_late_init/xxx_v4_0_late_init, the instance interrupt will be enabled twice.
My previous modification plan had the same issue. Perhaps we should instead remove the amdgpu_irq_put call from jpeg/vcn_v4_0_hw_fini.
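
The counting argument can be illustrated with a small stand-alone model. This is only a sketch: the names model_irq_src, model_irq_get, model_irq_put and model_cycle are hypothetical, the -EINVAL return is an assumption of the model, and none of this is the real amdgpu interrupt code. It shows only that the number of get and put calls on one interrupt source has to match.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for one interrupt source (not struct amdgpu_irq_src). */
struct model_irq_src {
	int refs;	/* number of outstanding "get" calls */
};

/* Model of a get: take one reference on the interrupt. */
static int model_irq_get(struct model_irq_src *src)
{
	src->refs++;
	return 0;
}

/* Model of a put: refuse to drop a reference that nobody holds. */
static int model_irq_put(struct model_irq_src *src)
{
	if (src->refs == 0) {
		fprintf(stderr, "model: unbalanced put\n");
		return -EINVAL;	/* assumption of this model only */
	}
	src->refs--;
	return 0;
}

/*
 * Issue `gets` get calls followed by `puts` put calls on a fresh source.
 * Returns the remaining reference count, or a negative errno as soon as
 * a put is unbalanced.
 */
static int model_cycle(int gets, int puts)
{
	struct model_irq_src src = { 0 };
	int i, r;

	assert(gets >= 0 && puts >= 0);

	for (i = 0; i < gets; i++)
		model_irq_get(&src);

	for (i = 0; i < puts; i++) {
		r = model_irq_put(&src);
		if (r)
			return r;
	}
	return src.refs;
}
```

In this model, model_cycle(1, 2) hits the unbalanced put (one enable, two disables), while model_cycle(2, 1) leaves one reference outstanding (two enables, one disable). Only equal counts return 0.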

Regards,
Horatio

-----Original Message-----
From: Zhang, Hawking <Hawking.Zhang at amd.com> 
Sent: Monday, May 8, 2023 8:32 PM
To: Zhou1, Tao <Tao.Zhou1 at amd.com>; Zhang, Horatio <Hongkun.Zhang at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Xu, Feifei <Feifei.Xu at amd.com>; Liu, Leo <Leo.Liu at amd.com>; Jiang, Sonny <Sonny.Jiang at amd.com>; Limonciello, Mario <Mario.Limonciello at amd.com>; Liu, HaoPing (Alan) <HaoPing.Liu at amd.com>; Zhang, Horatio <Hongkun.Zhang at amd.com>
Subject: RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in jpeg_v4_0_hw_fini

Shall we consider creating amdgpu_vcn_ras_late_init as a common helper for interrupt enablement, as is done for other IP blocks? This would also reduce the effort needed when the RAS feature is introduced in new versions of vcn/jpeg.
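
As a rough illustration of what such a shared helper could look like, here is a stand-alone sketch. Everything in it is hypothetical (model_dev, model_inst, model_common_late_init, the fail_get hook), and it does not reflect the actual amdgpu data structures or any existing helper. It assumes one interrupt per instance and that a failed enable should undo the enables already done.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_MAX_INST 4

/* Hypothetical per-instance state; not the real amdgpu structures. */
struct model_inst {
	int irq_refs;	/* references held on this instance's interrupt */
	int fail_get;	/* hook to force the enable to fail in this model */
};

struct model_dev {
	int num_inst;
	struct model_inst inst[MODEL_MAX_INST];
};

/* Model of enabling one instance's interrupt. */
static int model_inst_irq_get(struct model_inst *inst)
{
	if (inst->fail_get)
		return -EINVAL;
	inst->irq_refs++;
	return 0;
}

/* Model of disabling one instance's interrupt. */
static void model_inst_irq_put(struct model_inst *inst)
{
	if (inst->irq_refs > 0)
		inst->irq_refs--;
}

/*
 * Shared late-init step: enable the interrupt of every instance,
 * or none of them if any single enable fails.
 */
static int model_common_late_init(struct model_dev *dev)
{
	int i, r;

	assert(dev->num_inst >= 0 && dev->num_inst <= MODEL_MAX_INST);

	for (i = 0; i < dev->num_inst; i++) {
		r = model_inst_irq_get(&dev->inst[i]);
		if (r)
			goto unwind;
	}
	return 0;

unwind:
	/* Drop the references taken before the failing instance. */
	while (--i >= 0)
		model_inst_irq_put(&dev->inst[i]);
	return r;
}
```

Putting the loop and the unwind path in one place means each new vcn/jpeg version would only need to call the helper instead of repeating the per-instance enable logic.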

Regards,
Hawking

-----Original Message-----
From: Zhou1, Tao <Tao.Zhou1 at amd.com>
Sent: Monday, May 8, 2023 19:06
To: Zhang, Horatio <Hongkun.Zhang at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Xu, Feifei <Feifei.Xu at amd.com>; Liu, Leo <Leo.Liu at amd.com>; Jiang, Sonny <Sonny.Jiang at amd.com>; Limonciello, Mario <Mario.Limonciello at amd.com>; Liu, HaoPing (Alan) <HaoPing.Liu at amd.com>; Zhang, Horatio <Hongkun.Zhang at amd.com>
Subject: RE: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in jpeg_v4_0_hw_fini

The series is:

Reviewed-by: Tao Zhou <tao.zhou1 at amd.com>

> -----Original Message-----
> From: Horatio Zhang <Hongkun.Zhang at amd.com>
> Sent: Monday, May 8, 2023 6:20 PM
> To: amd-gfx at lists.freedesktop.org
> Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Zhou1, Tao 
> <Tao.Zhou1 at amd.com>; Xu, Feifei <Feifei.Xu at amd.com>; Liu, Leo 
> <Leo.Liu at amd.com>; Jiang, Sonny <Sonny.Jiang at amd.com>; Limonciello, 
> Mario <Mario.Limonciello at amd.com>; Liu, HaoPing (Alan) 
> <HaoPing.Liu at amd.com>; Zhang, Horatio <Hongkun.Zhang at amd.com>
> Subject: [PATCH 1/2] drm/amdgpu: fix amdgpu_irq_put call trace in 
> jpeg_v4_0_hw_fini
> 
> During suspend, the jpeg_v4_0_hw_fini function uses amdgpu_irq_put to
> disable the irq of jpeg.inst, but that irq was not enabled during the
> resume process, which results in a call trace during GPU reset.
> 
> [   50.497562] RIP: 0010:amdgpu_irq_put+0xa4/0xc0 [amdgpu]
> [   50.497619] RSP: 0018:ffffaa2400fcfcb0 EFLAGS: 00010246
> [   50.497620] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
> [   50.497621] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [   50.497621] RBP: ffffaa2400fcfcd0 R08: 0000000000000000 R09: 0000000000000000
> [   50.497622] R10: 0000000000000000 R11: 0000000000000000 R12: ffff99b2105242d8
> [   50.497622] R13: 0000000000000000 R14: ffff99b210500000 R15: ffff99b210500000
> [   50.497623] FS:  0000000000000000(0000) GS:ffff99b518480000(0000) knlGS:0000000000000000
> [   50.497623] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   50.497624] CR2: 00007f9d32aa91e8 CR3: 00000001ba210000 CR4: 0000000000750ee0
> [   50.497624] PKRU: 55555554
> [   50.497625] Call Trace:
> [   50.497625]  <TASK>
> [   50.497627]  jpeg_v4_0_hw_fini+0x43/0xc0 [amdgpu]
> [   50.497693]  jpeg_v4_0_suspend+0x13/0x30 [amdgpu]
> [   50.497751]  amdgpu_device_ip_suspend_phase2+0x240/0x470 [amdgpu]
> [   50.497802]  amdgpu_device_ip_suspend+0x41/0x80 [amdgpu]
> [   50.497854]  amdgpu_device_pre_asic_reset+0xd9/0x4a0 [amdgpu]
> [   50.497905]  amdgpu_device_gpu_recover.cold+0x548/0xcf1 [amdgpu]
> [   50.498005]  amdgpu_debugfs_reset_work+0x4c/0x80 [amdgpu]
> [   50.498060]  process_one_work+0x21f/0x400
> [   50.498063]  worker_thread+0x200/0x3f0
> [   50.498064]  ? process_one_work+0x400/0x400
> [   50.498065]  kthread+0xee/0x120
> [   50.498067]  ? kthread_complete_and_exit+0x20/0x20
> [   50.498068]  ret_from_fork+0x22/0x30
> 
> Fixes: 86e8255f941e ("drm/amdgpu: add JPEG 4.0 RAS poison consumption handling")
> Signed-off-by: Horatio Zhang <Hongkun.Zhang at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
> index 77e1e64aa1d1..b5c14a166063 100644
> --- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
> @@ -66,6 +66,13 @@ static int jpeg_v4_0_early_init(void *handle)
>  	return 0;
>  }
> 
> +static int jpeg_v4_0_late_init(void *handle)
> +{
> +	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> +
> +	return amdgpu_irq_get(adev, &adev->jpeg.inst->irq, 0);
> +}
> +
>  /**
>   * jpeg_v4_0_sw_init - sw init for JPEG block
>   *
> @@ -696,7 +703,7 @@ static int jpeg_v4_0_process_interrupt(struct amdgpu_device *adev,
>  static const struct amd_ip_funcs jpeg_v4_0_ip_funcs = {
>  	.name = "jpeg_v4_0",
>  	.early_init = jpeg_v4_0_early_init,
> -	.late_init = NULL,
> +	.late_init = jpeg_v4_0_late_init,
>  	.sw_init = jpeg_v4_0_sw_init,
>  	.sw_fini = jpeg_v4_0_sw_fini,
>  	.hw_init = jpeg_v4_0_hw_init,
> --
> 2.34.1

