[PATCH] drm/amdgpu/gfx: disable cp_ecc_error_irq only when gfx ras is enabled in suspend

Chen, Guchun Guchun.Chen at amd.com
Mon May 8 02:29:49 UTC 2023


[AMD Official Use Only - General]

Thanks for the info, Hawking. Yes, I just recalled Horatio had a solution in gfx11 to fix such warnings. I will provide patch set v2 to handle gfx v9 only.

Regards,
Guchun

> -----Original Message-----
> From: Zhang, Hawking <Hawking.Zhang at amd.com>
> Sent: Monday, May 8, 2023 10:23 AM
> To: Zhou1, Tao <Tao.Zhou1 at amd.com>; Chen, Guchun
> <Guchun.Chen at amd.com>; amd-gfx at lists.freedesktop.org; Deucher,
> Alexander <Alexander.Deucher at amd.com>; Lazar, Lijo
> <Lijo.Lazar at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>;
> Zhang, Horatio <Hongkun.Zhang at amd.com>
> Subject: RE: [PATCH] drm/amdgpu/gfx: disable cp_ecc_error_irq only when
> gfx ras is enabled in suspend
>
> Add @Zhang, Horatio
>
> Gfx11 should be addressed by Horatio's patch; I'm not sure whether he has
> committed it yet. The solution is to retire the cp_ecc_irq funcs, since gfx11
> doesn't rely on the irq for any software ras feature.
>
> Gfx9 could still add the RAS block check, since we have a legacy ras feature
> that needs the interrupt.
>
> Hi Horatio,
>
> Did you commit your fix yet?
>
> Regards,
> Hawking
>
> -----Original Message-----
> From: Zhou1, Tao <Tao.Zhou1 at amd.com>
> Sent: Monday, May 8, 2023 10:16
> To: Chen, Guchun <Guchun.Chen at amd.com>;
> amd-gfx at lists.freedesktop.org; Deucher, Alexander
> <Alexander.Deucher at amd.com>; Zhang, Hawking
> <Hawking.Zhang at amd.com>; Lazar, Lijo <Lijo.Lazar at amd.com>; Koenig,
> Christian <Christian.Koenig at amd.com>
> Subject: RE: [PATCH] drm/amdgpu/gfx: disable cp_ecc_error_irq only when
> gfx ras is enabled in suspend
>
> Reviewed-by: Tao Zhou <tao.zhou1 at amd.com>
>
> > -----Original Message-----
> > From: Chen, Guchun <Guchun.Chen at amd.com>
> > Sent: Saturday, May 6, 2023 8:16 PM
> > To: amd-gfx at lists.freedesktop.org; Deucher, Alexander
> > <Alexander.Deucher at amd.com>; Zhang, Hawking <Hawking.Zhang at amd.com>;
> > Lazar, Lijo <Lijo.Lazar at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>;
> > Koenig, Christian <Christian.Koenig at amd.com>
> > Cc: Chen, Guchun <Guchun.Chen at amd.com>
> > Subject: [PATCH] drm/amdgpu/gfx: disable cp_ecc_error_irq only when
> > gfx ras is enabled in suspend
> >
> > cp_ecc_error_irq is only enabled when gfx ras is asserted.
> > So in gfx_v9_0_hw_fini, the interrupt for cp_ecc_error_irq should be
> > disabled only under that condition; otherwise, an amdgpu_irq_put
> > call trace will occur.
> >
> > [ 7283.170322] RIP: 0010:amdgpu_irq_put+0x45/0x70 [amdgpu]
> > [ 7283.170964] RSP: 0018:ffff9a5fc3967d00 EFLAGS: 00010246
> > [ 7283.170967] RAX: ffff98d88afd3040 RBX: ffff98d89da20000 RCX: 0000000000000000
> > [ 7283.170969] RDX: 0000000000000000 RSI: ffff98d89da2bef8 RDI: ffff98d89da20000
> > [ 7283.170971] RBP: ffff98d89da20000 R08: ffff98d89da2ca18 R09: 0000000000000006
> > [ 7283.170973] R10: ffffd5764243c008 R11: 0000000000000000 R12: 0000000000001050
> > [ 7283.170975] R13: ffff98d89da38978 R14: ffffffff999ae15a R15: ffff98d880130105
> > [ 7283.170978] FS:  0000000000000000(0000) GS:ffff98d996f00000(0000) knlGS:0000000000000000
> > [ 7283.170981] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 7283.170983] CR2: 00000000f7a9d178 CR3: 00000001c42ea000 CR4: 00000000003506e0
> > [ 7283.170986] Call Trace:
> > [ 7283.170988]  <TASK>
> > [ 7283.170989]  gfx_v9_0_hw_fini+0x1c/0x6d0 [amdgpu]
> > [ 7283.171655]  amdgpu_device_ip_suspend_phase2+0x101/0x1a0 [amdgpu]
> > [ 7283.172245]  amdgpu_device_suspend+0x103/0x180 [amdgpu]
> > [ 7283.172823]  amdgpu_pmops_freeze+0x21/0x60 [amdgpu]
> > [ 7283.173412]  pci_pm_freeze+0x54/0xc0
> > [ 7283.173419]  ? __pfx_pci_pm_freeze+0x10/0x10
> > [ 7283.173425]  dpm_run_callback+0x98/0x200
> > [ 7283.173430]  __device_suspend+0x164/0x5f0
> >
> > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522
> >
> > Signed-off-by: Guchun Chen <guchun.chen at amd.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 3 ++-
> >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 3 ++-
> >  2 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> > index ecf8ceb53311..f6bc62a94099 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
> > @@ -4442,7 +4442,8 @@ static int gfx_v11_0_hw_fini(void *handle)
> >     struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> >     int r;
> >
> > -   amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
> > +   if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
> > +           amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
> >     amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
> >     amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > index ae09fc1cfe6b..c54d05bdc2d8 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > @@ -3751,7 +3751,8 @@ static int gfx_v9_0_hw_fini(void *handle)
> >  {
> >     struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> >
> > -   amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
> > +   if (amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX))
> > +           amdgpu_irq_put(adev, &adev->gfx.cp_ecc_error_irq, 0);
> >     amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
> >     amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
> >
> > --
> > 2.25.1
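
The imbalance described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not amdgpu code: every mock_* name below is hypothetical, the interrupt source is reduced to a plain reference count, and the ras_gfx_supported flag stands in for the result of amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX). It shows that a "put" without a matching "get" is flagged, and that guarding the put with the same condition used for the get avoids it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for an interrupt source: just a reference count. */
struct mock_irq_src {
	int refcount;	/* number of outstanding "get" calls */
};

/*
 * Hypothetical stand-in for the device. ras_gfx_supported models the
 * result of amdgpu_ras_is_supported(adev, AMDGPU_RAS_BLOCK__GFX).
 */
struct mock_device {
	bool ras_gfx_supported;
	struct mock_irq_src cp_ecc_error_irq;
	int warnings;	/* number of unbalanced puts detected */
};

/* Take a reference on the interrupt source. */
void mock_irq_get(struct mock_irq_src *src)
{
	src->refcount++;
}

/* Drop a reference; returns -1 and records a warning if none is held. */
int mock_irq_put(struct mock_device *dev, struct mock_irq_src *src)
{
	if (src->refcount == 0) {
		dev->warnings++;
		fprintf(stderr, "warning: put on an interrupt that was never enabled\n");
		return -1;
	}
	src->refcount--;
	return 0;
}

/* Init path: the ECC interrupt is only taken when GFX RAS is supported. */
void mock_late_init(struct mock_device *dev)
{
	if (dev->ras_gfx_supported)
		mock_irq_get(&dev->cp_ecc_error_irq);
}

/* Teardown without the check: the put is unconditional. */
void mock_hw_fini_unguarded(struct mock_device *dev)
{
	mock_irq_put(dev, &dev->cp_ecc_error_irq);
}

/* Teardown with the check: the put mirrors the condition used for the get. */
void mock_hw_fini_guarded(struct mock_device *dev)
{
	if (dev->ras_gfx_supported)
		mock_irq_put(dev, &dev->cp_ecc_error_irq);
}
```

In this model, a device with ras_gfx_supported set to false records one warning after mock_late_init followed by mock_hw_fini_unguarded, and none after mock_hw_fini_guarded. A device with the flag set to true ends with a reference count of zero and no warnings on the guarded path.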

