[PATCH] drm/amdgpu: reset asic after system-wide suspend aborted
Lazar, Lijo
lijo.lazar at amd.com
Thu Nov 18 14:09:54 UTC 2021
On 11/18/2021 7:36 PM, Alex Deucher wrote:
> On Thu, Nov 18, 2021 at 8:11 AM Liang, Prike <Prike.Liang at amd.com> wrote:
>>
>> [Public]
>>
>>> -----Original Message-----
>>> From: Lazar, Lijo <Lijo.Lazar at amd.com>
>>> Sent: Thursday, November 18, 2021 4:01 PM
>>> To: Liang, Prike <Prike.Liang at amd.com>; amd-gfx at lists.freedesktop.org
>>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Huang, Ray
>>> <Ray.Huang at amd.com>
>>> Subject: Re: [PATCH] drm/amdgpu: reset asic after system-wide suspend
>>> aborted
>>>
>>>
>>>
>>> On 11/18/2021 12:32 PM, Prike Liang wrote:
>>>> Do ASIC reset at the moment Sx suspend aborted behind of amdgpu
>>>> suspend to keep AMDGPU in a clean reset state and that can avoid
>>>> re-initialize device improperly error.
>>>>
>>>> Signed-off-by: Prike Liang <Prike.Liang at amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 19
>>> +++++++++++++++++++
>>>> 3 files changed, 24 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> index b85b67a..8bd9833 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> @@ -1053,6 +1053,7 @@ struct amdgpu_device {
>>>> bool in_s3;
>>>> bool in_s4;
>>>> bool in_s0ix;
>>>> + bool pm_completed;
>>>
>>> PM framework maintains separate flags, why not use the same?
>>>
>>> dev->power.is_suspended = false;
>>> dev->power.is_noirq_suspended = false;
>>> dev->power.is_late_suspended = false;
>>>
>>
>> Thanks for pointing it out and I miss that flag. In this case we can use the PM flag is_noirq_suspended for checking AMDGPU device whether is finished suspend.
>>
>>> BTW, Alex posted a patch which does similar thing, though it tries reset if
>>> suspend fails.
>>>
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fall%2FDM6PR12MB26195F8E099407B4B6966FEBE4999%40&data=04%7C01%7CLijo.Lazar%40amd.com%7C6401dce9411b4c134b0208d9aa9ca644%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637728412055353107%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=Z4dgNvHISHVlR4grHm1RU3%2FJMJVdRe7Ukq1DnjCgG0o%3D&reserved=0
>>> DM6PR12MB2619.namprd12.prod.outlook.com/
>>>
>>> If that reset also failed, then no point in another reset, or keep it as part of
>>> resume.
>>>
>>
>> Alex's patch seems always do the ASIC reset at the end of AMDGPU device, but that may needn't on the normal AMDGPU device suspend. However, this patch shows that can handle the system suspend aborted after AMDGPU suspend case well, so now seems we only need take care suspend abort case here.
>>
>
> Yeah, I was thinking we'd take this patch rather than mine to minimize
> the number of resets.
>
Wondering if suspend fails whether there is a need to call resume. It
may not get to resume() to do a reset, that needs to be checked.
Thanks,
Lijo
> Alex
>
>
>>> Thanks,
>>> Lijo
>>>
>>>>
>>>> atomic_t in_gpu_reset;
>>>> enum pp_mp1_state mp1_state;
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> index ec42a6f..a12ed54 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> @@ -3983,6 +3983,10 @@ int amdgpu_device_resume(struct drm_device
>>> *dev, bool fbcon)
>>>> if (adev->in_s0ix)
>>>> amdgpu_gfx_state_change_set(adev,
>>> sGpuChangeState_D0Entry);
>>>>
>>>> + if (!adev->pm_completed) {
>>>> + dev_warn(adev->dev, "suspend aborted will do asic reset\n");
>>>> + amdgpu_asic_reset(adev);
>>>> + }
>>>> /* post card */
>>>> if (amdgpu_device_need_post(adev)) {
>>>> r = amdgpu_device_asic_init(adev); diff --git
>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>> index eee3cf8..9f90017 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>> @@ -2168,6 +2168,23 @@ static int amdgpu_pmops_suspend(struct
>>> device *dev)
>>>> return r;
>>>> }
>>>>
>>>> +/*
>>>> + * Actually the PM suspend whether is completed should be confirmed
>>>> + * by checking the sysfs
>>>> +sys/power/suspend_stats/failed_suspend.However,
>>>> + * in this function only check the AMDGPU device whether is suspended
>>>> + * completely in the system-wide suspend process.
>>>> + */
>>>> +static int amdgpu_pmops_noirq_suspend(struct device *dev) {
>>>> + struct drm_device *drm_dev = dev_get_drvdata(dev);
>>>> + struct amdgpu_device *adev = drm_to_adev(drm_dev);
>>>> +
>>>> + dev_dbg(dev, "amdgpu suspend completely.\n");
>>>> + adev->pm_completed = true;
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> static int amdgpu_pmops_resume(struct device *dev)
>>>> {
>>>> struct drm_device *drm_dev = dev_get_drvdata(dev); @@ -2181,6
>>>> +2198,7 @@ static int amdgpu_pmops_resume(struct device *dev)
>>>> r = amdgpu_device_resume(drm_dev, true);
>>>> if (amdgpu_acpi_is_s0ix_active(adev))
>>>> adev->in_s0ix = false;
>>>> + adev->pm_completed = false;
>>>> return r;
>>>> }
>>>>
>>>> @@ -2397,6 +2415,7 @@ static const struct dev_pm_ops amdgpu_pm_ops
>>> = {
>>>> .runtime_suspend = amdgpu_pmops_runtime_suspend,
>>>> .runtime_resume = amdgpu_pmops_runtime_resume,
>>>> .runtime_idle = amdgpu_pmops_runtime_idle,
>>>> + .suspend_noirq = amdgpu_pmops_noirq_suspend,
>>>> };
>>>>
>>>> static int amdgpu_flush(struct file *f, fl_owner_t id)
>>>>
More information about the amd-gfx
mailing list