[PATCH 1/2] drm/amdgpu: skip to program GFXDEC registers for PM abort case

Thu Jan 25 02:39:49 UTC 2024

[AMD Official Use Only - General]

Hi, Alex
> -----Original Message-----
> From: Alex Deucher <alexdeucher at gmail.com>
> Sent: Wednesday, January 24, 2024 11:59 PM
> To: Liang, Prike <Prike.Liang at amd.com>
> Cc: amd-gfx at lists.freedesktop.org; Deucher, Alexander
> <Alexander.Deucher at amd.com>; Sharma, Deepak
> <Deepak.Sharma at amd.com>
> Subject: Re: [PATCH 1/2] drm/amdgpu: skip to program GFXDEC registers for
> PM abort case
>
> On Wed, Jan 24, 2024 at 2:12 AM Prike Liang <Prike.Liang at amd.com> wrote:
> >
> > In the PM abort cases, the gfx power rail doesn't turn off so some
> > GFXDEC registers/CSB can't reset to default vaule. In order to avoid
> > unexpected problem now need skip to program GFXDEC registers and
> > bypass issue CSB packet for PM abort case.
> >
> > Signed-off-by: Prike Liang <Prike.Liang at amd.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu.h     | 1 +
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
> >  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 4 ++++
> >  3 files changed, 6 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > index c5f3859fd682..26d983eb831b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > @@ -1079,6 +1079,7 @@ struct amdgpu_device {
> >         bool                            in_s3;
> >         bool                            in_s4;
> >         bool                            in_s0ix;
> > +       bool                            pm_complete;
> >
> >         enum pp_mp1_state               mp1_state;
> >         struct amdgpu_doorbell_index doorbell_index; diff --git
> > a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index 475bd59c9ac2..a01f9b0c2f30 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -2486,6 +2486,7 @@ static int amdgpu_pmops_suspend_noirq(struct
> device *dev)
> >         struct drm_device *drm_dev = dev_get_drvdata(dev);
> >         struct amdgpu_device *adev = drm_to_adev(drm_dev);
> >
> > +       adev->pm_complete = true;
>
> This needs to be cleared somewhere on resume.
[Liang, Prike]  This flag is designed to indicate the amdgpu device suspension process status and will update the patch and clear it at the amdgpu suspension beginning point.
>
> >         if (amdgpu_acpi_should_gpu_reset(adev))
> >                 return amdgpu_asic_reset(adev);
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > index 57808be6e3ec..3bf51f18e13c 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> > @@ -3034,6 +3034,10 @@ static int gfx_v9_0_cp_gfx_start(struct
> > amdgpu_device *adev)
> >
> >         gfx_v9_0_cp_gfx_enable(adev, true);
> >
> > +       if (adev->in_suspend && !adev->pm_complete) {
> > +               DRM_INFO(" will skip the csb ring write\n");
> > +               return 0;
> > +       }
>
> We probably want a similar fix for other gfx generations as well.
>
> Alex
>
[Liang, Prike] IIRC, there's no issue on the Mendocino side even without the fix. How about keep the other gfx generations unchanged firstly and after sort out the failed case will add the quirk for each specific gfx respectively?

> >         r = amdgpu_ring_alloc(ring, gfx_v9_0_get_csb_size(adev) + 4 + 3);
> >         if (r) {
> >                 DRM_ERROR("amdgpu: cp failed to lock ring (%d).\n",
> > r);
> > --
> > 2.34.1
> >