[PATCH 1/2] drm/amdgpu: skip to program GFXDEC registers for PM abort case

Fri Jan 26 01:43:20 UTC 2024

[AMD Official Use Only - General]

> From: Alex Deucher <alexdeucher at gmail.com>
> Sent: Thursday, January 25, 2024 11:24 PM
> To: Liang, Prike <Prike.Liang at amd.com>
> Cc: amd-gfx at lists.freedesktop.org; Deucher, Alexander
> <Alexander.Deucher at amd.com>; Sharma, Deepak
> <Deepak.Sharma at amd.com>
> Subject: Re: [PATCH 1/2] drm/amdgpu: skip to program GFXDEC registers for
> PM abort case
>
> On Thu, Jan 25, 2024 at 10:22 AM Alex Deucher <alexdeucher at gmail.com>
> wrote:
> >
> > On Wed, Jan 24, 2024 at 9:39 PM Liang, Prike <Prike.Liang at amd.com>
> wrote:
> > >
> > > [AMD Official Use Only - General]
> > >
> > > Hi, Alex
> > > > -----Original Message-----
> > > > From: Alex Deucher <alexdeucher at gmail.com>
> > > > Sent: Wednesday, January 24, 2024 11:59 PM
> > > > To: Liang, Prike <Prike.Liang at amd.com>
> > > > Cc: amd-gfx at lists.freedesktop.org; Deucher, Alexander
> > > > <Alexander.Deucher at amd.com>; Sharma, Deepak
> > > > <Deepak.Sharma at amd.com>
> > > > Subject: Re: [PATCH 1/2] drm/amdgpu: skip to program GFXDEC
> > > > registers for PM abort case
> > > >
> > > > On Wed, Jan 24, 2024 at 2:12 AM Prike Liang <Prike.Liang at amd.com>
> wrote:
> > > > >
> > > > > In the PM abort cases, the gfx power rail doesn't turn off so
> > > > > some GFXDEC registers/CSB can't reset to default vaule. In order
> > > > > to avoid unexpected problem now need skip to program GFXDEC
> > > > > registers and bypass issue CSB packet for PM abort case.
> > > > >
> > > > > Signed-off-by: Prike Liang <Prike.Liang at amd.com>
> > > > > ---
> > > > >  drivers/gpu/drm/amd/amdgpu/amdgpu.h     | 1 +
> > > > >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
> > > > >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 4 ++++
> > > > >  3 files changed, 6 insertions(+)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > > > index c5f3859fd682..26d983eb831b 100644
> > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > > > @@ -1079,6 +1079,7 @@ struct amdgpu_device {
> > > > >         bool                            in_s3;
> > > > >         bool                            in_s4;
> > > > >         bool                            in_s0ix;
> > > > > +       bool                            pm_complete;
> > > > >
> > > > >         enum pp_mp1_state               mp1_state;
> > > > >         struct amdgpu_doorbell_index doorbell_index; diff --git
> > > > > a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > index 475bd59c9ac2..a01f9b0c2f30 100644
> > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > @@ -2486,6 +2486,7 @@ static int
> > > > > amdgpu_pmops_suspend_noirq(struct
> > > > device *dev)
> > > > >         struct drm_device *drm_dev = dev_get_drvdata(dev);
> > > > >         struct amdgpu_device *adev = drm_to_adev(drm_dev);
> > > > >
> > > > > +       adev->pm_complete = true;
> > > >
> > > > This needs to be cleared somewhere on resume.
> > > [Liang, Prike]  This flag is designed to indicate the amdgpu device
> suspension process status and will update the patch and clear it at the
> amdgpu suspension beginning point.
> > > >
> > > > >         if (amdgpu_acpi_should_gpu_reset(adev))
> > > > >                 return amdgpu_asic_reset(adev);
> > > > >
> > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > > index 57808be6e3ec..3bf51f18e13c 100644
> > > > > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > > > > @@ -3034,6 +3034,10 @@ static int gfx_v9_0_cp_gfx_start(struct
> > > > > amdgpu_device *adev)
> > > > >
> > > > >         gfx_v9_0_cp_gfx_enable(adev, true);
> > > > >
> > > > > +       if (adev->in_suspend && !adev->pm_complete) {
> > > > > +               DRM_INFO(" will skip the csb ring write\n");
> > > > > +               return 0;
> > > > > +       }
> > > >
> > > > We probably want a similar fix for other gfx generations as well.
> > > >
> > > > Alex
> > > >
> > > [Liang, Prike] IIRC, there's no issue on the Mendocino side even without
> the fix. How about keep the other gfx generations unchanged firstly and after
> sort out the failed case will add the quirk for each specific gfx respectively?
> >
> > Mendocino only supports S0i3 so we don't touch gfx on suspend/resume.
> > This would only happen on platforms that support S3.
>
> E.g., try an aborted suspend on Raphael or PHX2.
>
> Alex
>
[Liang, Prike]  Thanks for the reminder, but the Mendocino also was verified on the system with S3 enabled from BIOS. I will double confirm if there need the quirk on the RPL or PHX2.

> >
> > Alex
> >
> > >
> > > > >         r = amdgpu_ring_alloc(ring, gfx_v9_0_get_csb_size(adev) + 4 + 3);
> > > > >         if (r) {
> > > > >                 DRM_ERROR("amdgpu: cp failed to lock ring
> > > > > (%d).\n", r);
> > > > > --
> > > > > 2.34.1
> > > > >