[v3] drm/amdgpu: reset asic after system-wide suspend aborted (v3)

Tue Dec 14 01:40:18 UTC 2021

[Public]

> -----Original Message-----
> From: Limonciello, Mario <Mario.Limonciello at amd.com>
> Sent: Tuesday, December 14, 2021 5:48 AM
> To: Liang, Prike <Prike.Liang at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Lazar, Lijo
> <Lijo.Lazar at amd.com>; Huang, Ray <Ray.Huang at amd.com>
> Subject: Re: [v3] drm/amdgpu: reset asic after system-wide suspend aborted
> (v3)
>
> On 11/24/2021 23:48, Prike Liang wrote:
> > Do an ASIC reset when an Sx suspend is aborted after amdgpu has already
> > suspended, so that AMDGPU is kept in a clean reset state and improper
> > device re-initialization errors are avoided. For now, we always do the
> > ASIC reset in amdgpu resume until the PM abort case is sorted out.
> >
> > v2: Remove incomplete PM abort flag and add GPU hive case check for
> > GPU reset.
> >
> > v3: Some dGPU reset methods are not supported at early resume time, so
> > temporarily skip the dGPU case.
>
> FYI: this was tested against an S3 exit issue and shown to fix it. You will
> want to reference that issue in a Fixes tag for v4, when you change the patch
> to run only for the S3 path, not the S0i3 path.
>
> https://gitlab.freedesktop.org/drm/amd/-/issues/1822
>
Yes, this patch needs to exclude the S0i3 case. On S0i3 entry we skip suspending some blocks, so we may not resume successfully after doing a GPU reset in that case. Furthermore, the S3 abort resume issue can also be found on the dGPU series, and Alex's following patch may handle this generic issue better when we ignore the PM abort check.

https://lore.kernel.org/all/DM6PR12MB26195F8E099407B4B6966FEBE4999@DM6PR12MB2619.namprd12.prod.outlook.com/T/
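
To illustrate the S0i3 exclusion described above, here is a minimal standalone sketch of the gating condition. It is not driver code: `SKETCH_IS_APU`, `struct sketch_dev`, and `sketch_need_resume_reset()` are hypothetical stand-ins for `AMD_IS_APU`, the relevant `adev` fields, and the check in front of `amdgpu_asic_reset()`, and the flag value is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's AMD_IS_APU flag; value is illustrative. */
#define SKETCH_IS_APU (1u << 0)

/* Hypothetical, reduced model of the adev fields the check would read. */
struct sketch_dev {
	uint32_t flags;  /* device feature flags */
	bool in_s0ix;    /* true when the resume comes from S0i3 */
};

/*
 * Decide whether resume should reset the ASIC.
 * Assumption: the reset is wanted only for APUs on the S3 path. On the
 * S0i3 path some blocks were never suspended, so a reset is skipped.
 */
static bool sketch_need_resume_reset(const struct sketch_dev *dev)
{
	if (dev->in_s0ix)
		return false;

	return (dev->flags & SKETCH_IS_APU) != 0;
}
```

In the real resume path this would presumably amount to adding an in_s0ix test next to the existing AMD_IS_APU test, but that is a guess at the shape of v4, not the final patch.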

Thanks,
Prike
> >
> > Signed-off-by: Prike Liang <Prike.Liang at amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 ++++++++
> >   1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 7d4115d..f6e1a6a 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -3983,6 +3983,14 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
> >     if (adev->in_s0ix)
> >             amdgpu_gfx_state_change_set(adev, sGpuChangeState_D0Entry);
> >
> > +   /* TODO: To keep an unconditional ASIC reset from affecting resume latency,
> > +    * sort out which cases really need an ASIC reset in the resume process.
> > +    * For the known issue where system suspend aborts after AMDGPU has suspended,
> > +    * this case could be detected by checking struct suspend_stats, which would
> > +    * need to be exported first.
> > +    */
> > +   if (adev->flags & AMD_IS_APU)
> > +           amdgpu_asic_reset(adev);
> >     /* post card */
> >     if (amdgpu_device_need_post(adev)) {
> >             r = amdgpu_device_asic_init(adev);
> >