[PATCH 2/2] drm/amdgpu: fix amdgpu_irq_put call trace in vcn_v4_0_hw_fini
Zhang, Horatio
Hongkun.Zhang at amd.com
Tue May 9 02:18:37 UTC 2023
[AMD Official Use Only - General]
Hi Tao,
Sorry, I forgot to check the return value. Thank you for the suggestion; I will include this change in the next version.
Thanks,
Horatio
-----Original Message-----
From: Zhou1, Tao <Tao.Zhou1 at amd.com>
Sent: Monday, May 8, 2023 7:05 PM
To: Zhang, Horatio <Hongkun.Zhang at amd.com>; amd-gfx at lists.freedesktop.org
Cc: Liu, HaoPing (Alan) <HaoPing.Liu at amd.com>; Zhang, Horatio <Hongkun.Zhang at amd.com>; Xu, Feifei <Feifei.Xu at amd.com>; Jiang, Sonny <Sonny.Jiang at amd.com>; Limonciello, Mario <Mario.Limonciello at amd.com>; Liu, Leo <Leo.Liu at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>
Subject: RE: [PATCH 2/2] drm/amdgpu: fix amdgpu_irq_put call trace in vcn_v4_0_hw_fini
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
> Horatio Zhang
> Sent: Monday, May 8, 2023 6:20 PM
> To: amd-gfx at lists.freedesktop.org
> Cc: Liu, HaoPing (Alan) <HaoPing.Liu at amd.com>; Zhang, Horatio
> <Hongkun.Zhang at amd.com>; Xu, Feifei <Feifei.Xu at amd.com>; Zhou1, Tao
> <Tao.Zhou1 at amd.com>; Jiang, Sonny <Sonny.Jiang at amd.com>; Limonciello,
> Mario <Mario.Limonciello at amd.com>; Liu, Leo <Leo.Liu at amd.com>; Zhang,
> Hawking <Hawking.Zhang at amd.com>
> Subject: [PATCH 2/2] drm/amdgpu: fix amdgpu_irq_put call trace in
> vcn_v4_0_hw_fini
>
> During suspend, the vcn_v4_0_hw_fini function uses amdgpu_irq_put to
> disable the irq of vcn.inst, but the irq was not enabled during the
> resume process, which results in a call trace during the GPU reset process.
>
> [ 44.563572] RIP: 0010:amdgpu_irq_put+0xa4/0xc0 [amdgpu]
> [ 44.563629] RSP: 0018:ffffb36740edfc90 EFLAGS: 00010246
> [ 44.563630] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
> [ 44.563630] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [ 44.563631] RBP: ffffb36740edfcb0 R08: 0000000000000000 R09: 0000000000000000
> [ 44.563631] R10: 0000000000000000 R11: 0000000000000000 R12: ffff954c568e2ea8
> [ 44.563631] R13: 0000000000000000 R14: ffff954c568c0000 R15: ffff954c568e2ea8
> [ 44.563632] FS: 0000000000000000(0000) GS:ffff954f584c0000(0000) knlGS:0000000000000000
> [ 44.563632] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 44.563633] CR2: 00007f028741ba70 CR3: 000000026ca10000 CR4: 0000000000750ee0
> [ 44.563633] PKRU: 55555554
> [ 44.563633] Call Trace:
> [ 44.563634] <TASK>
> [ 44.563634] vcn_v4_0_hw_fini+0x62/0x160 [amdgpu]
> [ 44.563700] vcn_v4_0_suspend+0x13/0x30 [amdgpu]
> [ 44.563755] amdgpu_device_ip_suspend_phase2+0x240/0x470 [amdgpu]
> [ 44.563806] amdgpu_device_ip_suspend+0x41/0x80 [amdgpu]
> [ 44.563858] amdgpu_device_pre_asic_reset+0xd9/0x4a0 [amdgpu]
> [ 44.563909] amdgpu_device_gpu_recover.cold+0x548/0xcf1 [amdgpu]
> [ 44.564006] amdgpu_debugfs_reset_work+0x4c/0x80 [amdgpu]
> [ 44.564061] process_one_work+0x21f/0x400
> [ 44.564062] worker_thread+0x200/0x3f0
> [ 44.564063] ? process_one_work+0x400/0x400
> [ 44.564064] kthread+0xee/0x120
> [ 44.564065] ? kthread_complete_and_exit+0x20/0x20
> [ 44.564066] ret_from_fork+0x22/0x30
>
> Fixes: ea5309de7388 ("drm/amdgpu: add VCN 4.0 RAS poison consumption handling")
> Signed-off-by: Horatio Zhang <Hongkun.Zhang at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
> index bf0674039598..b55eb1bf3e30 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
> @@ -281,6 +281,21 @@ static int vcn_v4_0_hw_init(void *handle)
>  	return r;
>  }
>
> +static int vcn_v4_0_late_init(void *handle)
> +{
> +	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> +	int i;
> +
> +	for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {
> +		if (adev->vcn.harvest_config & (1 << i))
> +			continue;
> +
> +		amdgpu_irq_get(adev, &adev->vcn.inst[i].irq, 0);
[Tao] We could also check its return value and exit if r is non-zero. But either way is fine with me.
> +	}
> +
> +	return 0;
> +}
> +
> /**
> * vcn_v4_0_hw_fini - stop the hardware block
> *
> @@ -2047,7 +2062,7 @@ static void vcn_v4_0_set_irq_funcs(struct amdgpu_device *adev)
>  static const struct amd_ip_funcs vcn_v4_0_ip_funcs = {
>  	.name = "vcn_v4_0",
>  	.early_init = vcn_v4_0_early_init,
> -	.late_init = NULL,
> +	.late_init = vcn_v4_0_late_init,
>  	.sw_init = vcn_v4_0_sw_init,
>  	.sw_fini = vcn_v4_0_sw_fini,
>  	.hw_init = vcn_v4_0_hw_init,
> --
> 2.34.1
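
For illustration only, the return-value check Tao suggests for the late_init loop might look roughly like the sketch below. It is a user-space mock, not driver code: the mock_* types and functions are hypothetical stand-ins for amdgpu_device, amdgpu_irq_get and amdgpu_irq_put, and the assumption that an unbalanced put is reported as -EINVAL is made only so the get/put imbalance described in the commit message can be observed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real amdgpu definitions. */
#define MOCK_MAX_VCN_INST 4

struct mock_irq_src {
	int enabled_refs;	/* per-source enable count */
	int registered;		/* 0 simulates a source that cannot be enabled */
};

struct mock_device {
	unsigned int num_vcn_inst;
	unsigned int harvest_config;	/* bit i set: instance i is harvested */
	struct mock_irq_src irq[MOCK_MAX_VCN_INST];
};

/* Mock of the "get" side: takes one reference or fails. */
static int mock_irq_get(struct mock_irq_src *src)
{
	if (!src->registered)
		return -EINVAL;
	src->enabled_refs++;
	return 0;
}

/* Mock of the "put" side: an unbalanced put is reported as an error (assumed). */
static int mock_irq_put(struct mock_irq_src *src)
{
	if (src->enabled_refs <= 0)
		return -EINVAL;
	src->enabled_refs--;
	return 0;
}

/* late_init with the suggested check: stop at the first failing get. */
static int mock_late_init(struct mock_device *dev)
{
	unsigned int i;
	int r;

	assert(dev != NULL);
	for (i = 0; i < dev->num_vcn_inst; ++i) {
		if (dev->harvest_config & (1u << i))
			continue;

		r = mock_irq_get(&dev->irq[i]);
		if (r)
			return r;
	}

	return 0;
}

/* hw_fini side: drops one reference per active instance, returns the first error. */
static int mock_hw_fini(struct mock_device *dev)
{
	unsigned int i;
	int r, ret = 0;

	assert(dev != NULL);
	for (i = 0; i < dev->num_vcn_inst; ++i) {
		if (dev->harvest_config & (1u << i))
			continue;

		r = mock_irq_put(&dev->irq[i]);
		if (r && !ret)
			ret = r;
	}

	return ret;
}
```

In this mock, calling mock_hw_fini() before mock_late_init() fails because nothing took the reference first, which mirrors the imbalance the patch addresses; after mock_late_init() succeeds, the put is balanced.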