[PATCH 2/2] drm/amdgpu: reset gpu for pm abort case

Liang, Prike Prike.Liang at amd.com
Thu Jan 25 02:43:59 UTC 2024


[AMD Official Use Only - General]

Hi, Alex

> -----Original Message-----
> From: Alex Deucher <alexdeucher at gmail.com>
> Sent: Thursday, January 25, 2024 12:07 AM
> To: Liang, Prike <Prike.Liang at amd.com>
> Cc: amd-gfx at lists.freedesktop.org; Deucher, Alexander
> <Alexander.Deucher at amd.com>; Sharma, Deepak
> <Deepak.Sharma at amd.com>
> Subject: Re: [PATCH 2/2] drm/amdgpu: reset gpu for pm abort case
>
> On Wed, Jan 24, 2024 at 2:17 AM Prike Liang <Prike.Liang at amd.com> wrote:
> >
> > In the pm abort case the gfx power rail not turn off from FCH side and
> > this will lead to the gfx reinitialized failed base on the unknown gfx
> > HW status, so let's reset the gpu to a known good power state.
> >
> > Signed-off-by: Prike Liang <Prike.Liang at amd.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 5 +++++
> >  drivers/gpu/drm/amd/amdgpu/soc15.c      | 7 ++++++-
> >  2 files changed, 11 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > index 9153f69bad7f..14b66c49a536 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > @@ -2935,6 +2935,11 @@ static int psp_resume(void *handle)
> >
> >         dev_info(adev->dev, "PSP is resuming...\n");
> >
> > +       if(amdgpu_asic_need_reset_on_init(adev)) {
> > +               DRM_INFO("PM abort case and let's reset asic \n");
> > +               amdgpu_asic_reset(adev);
> > +       }
> > +
>
> Seems like it would be better to put this in the resume callback.
[Liang, Prike]  Yes, it makes sense to put the device level function call in device resume callback.
>
> >         if (psp->mem_train_ctx.enable_mem_training) {
> >                 ret = psp_mem_training(psp, PSP_MEM_TRAIN_RESUME);
> >                 if (ret) {
> > diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > index 15033efec2ba..6ec4f6958c4c 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > @@ -804,9 +804,15 @@ static bool soc15_need_reset_on_init(struct
> amdgpu_device *adev)
> >         if (adev->asic_type == CHIP_RENOIR)
> >                 return true;
> >
> > +       sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
>
> Is this register consistent for all soc15 chips?
>
> Alex
>
Yes, confirmed from the PSP firmware guys this register can be used for indicating the TOS/SOS live status on the SOC15 series.

> > +
> >         /* Just return false for soc15 GPUs.  Reset does not seem to
> >          * be necessary.
> >          */
> > +       if (adev->in_suspend && !added->microplate &&
> > +                       sol_reg)
> > +               return true;
> > +
> >         if (!amdgpu_passthrough(adev))
> >                 return false;
> >
> > @@ -816,7 +822,6 @@ static bool soc15_need_reset_on_init(struct
> amdgpu_device *adev)
> >         /* Check sOS sign of life register to confirm sys driver and sOS
> >          * are already been loaded.
> >          */
> > -       sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> >         if (sol_reg)
> >                 return true;
> >> --
> > 2.34.1
> >


More information about the amd-gfx mailing list