[PATCH 2/2] drm/amdgpu: reset gpu for pm abort case

Liang, Prike Prike.Liang at amd.com
Fri Jan 26 09:00:53 UTC 2024


>
> On 1/25/2024 8:52 AM, Prike Liang wrote:
> > In the PM abort case the gfx power rail is not turned off from the FCH side,
> > which leaves the gfx HW in an unknown state and causes gfx reinitialization
> > to fail, so let's reset the GPU to a known good power state.
> >
>
> From the description, this is an APU-only problem (or this patch can only
> resolve the APU abort sequence). However, there is no check for APU in the patch
> below.
>
[Prike]  IIRC, there is a similar problem on the dGPU side when suspend is aborted, and
for now this patch is only drafted for a hot issue on the RV series. If needed, we can add a TODO
item for drafting a more generic solution.

>
> > Signed-off-by: Prike Liang <Prike.Liang at amd.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++++
> >  drivers/gpu/drm/amd/amdgpu/soc15.c         | 8 +++++++-
> >  2 files changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 56d9dfa61290..4c40ffaaa5c2 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -4627,6 +4627,11 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
> >                     return r;
> >     }
> >
> > +   if (amdgpu_asic_need_reset_on_init(adev)) {
> > +           DRM_INFO("PM abort case and let's reset asic\n");
> > +           amdgpu_asic_reset(adev);
> > +   }
> > +
>
> suspend_noirq is specific for suspend scenarios and not valid for freeze/thaw.
> I guess this could trigger reset for successful restore on APUs.
>
[Prike] If it doesn't run into noirq_suspend, then we still need a further check on whether the PSP TOS is still alive before the GPU reset.

> >     if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
> >             return 0;
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > index 15033efec2ba..9329a00b6abc 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> > @@ -804,9 +804,16 @@ static bool soc15_need_reset_on_init(struct amdgpu_device *adev)
> >     if (adev->asic_type == CHIP_RENOIR)
> >             return true;
> >
> > +   sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> > +
> >     /* Just return false for soc15 GPUs.  Reset does not seem to
> >      * be necessary.
> >      */
>
> The comment now doesn't make sense.
>
> Thanks,
> Lijo
>
> > +   if (adev->in_suspend && !adev->in_s0ix &&
> > +                   !adev->pm_complete &&
> > +                   sol_reg)
> > +           return true;
> > +
> >     if (!amdgpu_passthrough(adev))
> >             return false;
> >
> > @@ -816,7 +823,6 @@ static bool soc15_need_reset_on_init(struct amdgpu_device *adev)
> >     /* Check sOS sign of life register to confirm sys driver and sOS
> >      * are already been loaded.
> >      */
> > -   sol_reg = RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81);
> >     if (sol_reg)
> >             return true;
> >
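
For reference, the reset condition discussed in this thread can be sketched as a
standalone predicate. This is only an illustration and not the proposed driver
code: struct mock_adev and need_reset_after_pm_abort() are hypothetical names,
is_apu stands in for an APU check such as (adev->flags & AMD_IS_APU),
pm_complete is assumed to be set once suspend_noirq has run, and sol_reg stands
in for the value read from MP0_SMN_C2PMSG_81.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few amdgpu_device fields the patch reads. */
struct mock_adev {
	bool is_apu;       /* stands in for (adev->flags & AMD_IS_APU) */
	bool in_suspend;   /* a suspend/resume cycle is in progress */
	bool in_s0ix;      /* s0ix path, excluded by the patch */
	bool pm_complete;  /* assumption: set once suspend_noirq has run */
	uint32_t sol_reg;  /* stands in for the MP0_SMN_C2PMSG_81 value */
};

/*
 * Return true only for an aborted (non-s0ix) suspend on an APU where the
 * sign-of-life register is non-zero, i.e. the PSP side still looks alive.
 */
static bool need_reset_after_pm_abort(const struct mock_adev *adev)
{
	/* APU gate raised in review; the posted patch does not have it. */
	if (!adev->is_apu)
		return false;

	/* Only relevant while resuming from a non-s0ix suspend. */
	if (!adev->in_suspend || adev->in_s0ix)
		return false;

	/* Suspend ran to completion, so this is not the abort case. */
	if (adev->pm_complete)
		return false;

	/* Non-zero sign of life: firmware was not powered down. */
	return adev->sol_reg != 0;
}
```

With this shape, a completed suspend or a dGPU never triggers the reset. A
freeze/thaw cycle that never reaches suspend_noirq would still need separate
handling, as noted above.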

