[PATCH] drm/amdgpu: reset asic after system-wide suspend aborted
Christian König
ckoenig.leichtzumerken at gmail.com
Thu Nov 18 14:11:54 UTC 2021
Am 18.11.21 um 15:09 schrieb Lazar, Lijo:
> On 11/18/2021 7:36 PM, Alex Deucher wrote:
>> On Thu, Nov 18, 2021 at 8:11 AM Liang, Prike <Prike.Liang at amd.com>
>> wrote:
>>>
>>> [Public]
>>>
>>>> -----Original Message-----
>>>> From: Lazar, Lijo <Lijo.Lazar at amd.com>
>>>> Sent: Thursday, November 18, 2021 4:01 PM
>>>> To: Liang, Prike <Prike.Liang at amd.com>; amd-gfx at lists.freedesktop.org
>>>> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Huang, Ray
>>>> <Ray.Huang at amd.com>
>>>> Subject: Re: [PATCH] drm/amdgpu: reset asic after system-wide suspend
>>>> aborted
>>>>
>>>>
>>>>
>>>> On 11/18/2021 12:32 PM, Prike Liang wrote:
>>>>> Do ASIC reset at the moment Sx suspend aborted behind of amdgpu
>>>>> suspend to keep AMDGPU in a clean reset state and that can avoid
>>>>> re-initialize device improperly error.
>>>>>
>>>>> Signed-off-by: Prike Liang <Prike.Liang at amd.com>
>>>>> ---
>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++++
>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 19
>>>> +++++++++++++++++++
>>>>> 3 files changed, 24 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>> index b85b67a..8bd9833 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>> @@ -1053,6 +1053,7 @@ struct amdgpu_device {
>>>>> bool in_s3;
>>>>> bool in_s4;
>>>>> bool in_s0ix;
>>>>> + bool pm_completed;
>>>>
>>>> PM framework maintains separate flags, why not use the same?
>>>>
>>>> dev->power.is_suspended = false;
>>>> dev->power.is_noirq_suspended = false;
>>>> dev->power.is_late_suspended = false;
>>>>
>>>
>>> Thanks for pointing it out and I miss that flag. In this case we can
>>> use the PM flag is_noirq_suspended for checking AMDGPU device
>>> whether is finished suspend.
>>>
>>>> BTW, Alex posted a patch which does similar thing, though it tries
>>>> reset if
>>>> suspend fails.
>>>>
>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fall%2FDM6PR12MB26195F8E099407B4B6966FEBE4999%40&data=04%7C01%7CLijo.Lazar%40amd.com%7C6401dce9411b4c134b0208d9aa9ca644%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637728412055353107%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=Z4dgNvHISHVlR4grHm1RU3%2FJMJVdRe7Ukq1DnjCgG0o%3D&reserved=0
>>>>
>>>> DM6PR12MB2619.namprd12.prod.outlook.com/
>>>>
>>>> If that reset also failed, then no point in another reset, or keep
>>>> it as part of
>>>> resume.
>>>>
>>>
>>> Alex's patch seems always do the ASIC reset at the end of AMDGPU
>>> device, but that may needn't on the normal AMDGPU device suspend.
>>> However, this patch shows that can handle the system suspend aborted
>>> after AMDGPU suspend case well, so now seems we only need take care
>>> suspend abort case here.
>>>
>>
>> Yeah, I was thinking we'd take this patch rather than mine to minimize
>> the number of resets.
>>
>
> Wondering if suspend fails whether there is a need to call resume. It
> may not get to resume() to do a reset, that needs to be checked.
I would rather do it so that we reset the ASIC during resume when we
detect that the hardware is in a bad state.
This way it should also work with things like kexec and virtualization.
Regards,
Christian.
>
> Thanks,
> Lijo
>
>
>> Alex
>>
>>
>>>> Thanks,
>>>> Lijo
>>>>
>>>>>
>>>>> atomic_t in_gpu_reset;
>>>>> enum pp_mp1_state mp1_state;
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> index ec42a6f..a12ed54 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>> @@ -3983,6 +3983,10 @@ int amdgpu_device_resume(struct drm_device
>>>> *dev, bool fbcon)
>>>>> if (adev->in_s0ix)
>>>>> amdgpu_gfx_state_change_set(adev,
>>>> sGpuChangeState_D0Entry);
>>>>>
>>>>> + if (!adev->pm_completed) {
>>>>> + dev_warn(adev->dev, "suspend aborted will do asic
>>>>> reset\n");
>>>>> + amdgpu_asic_reset(adev);
>>>>> + }
>>>>> /* post card */
>>>>> if (amdgpu_device_need_post(adev)) {
>>>>> r = amdgpu_device_asic_init(adev); diff --git
>>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> index eee3cf8..9f90017 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> @@ -2168,6 +2168,23 @@ static int amdgpu_pmops_suspend(struct
>>>> device *dev)
>>>>> return r;
>>>>> }
>>>>>
>>>>> +/*
>>>>> + * Actually the PM suspend whether is completed should be confirmed
>>>>> + * by checking the sysfs
>>>>> +sys/power/suspend_stats/failed_suspend.However,
>>>>> + * in this function only check the AMDGPU device whether is
>>>>> suspended
>>>>> + * completely in the system-wide suspend process.
>>>>> + */
>>>>> +static int amdgpu_pmops_noirq_suspend(struct device *dev) {
>>>>> + struct drm_device *drm_dev = dev_get_drvdata(dev);
>>>>> + struct amdgpu_device *adev = drm_to_adev(drm_dev);
>>>>> +
>>>>> + dev_dbg(dev, "amdgpu suspend completely.\n");
>>>>> + adev->pm_completed = true;
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> static int amdgpu_pmops_resume(struct device *dev)
>>>>> {
>>>>> struct drm_device *drm_dev = dev_get_drvdata(dev); @@ -2181,6
>>>>> +2198,7 @@ static int amdgpu_pmops_resume(struct device *dev)
>>>>> r = amdgpu_device_resume(drm_dev, true);
>>>>> if (amdgpu_acpi_is_s0ix_active(adev))
>>>>> adev->in_s0ix = false;
>>>>> + adev->pm_completed = false;
>>>>> return r;
>>>>> }
>>>>>
>>>>> @@ -2397,6 +2415,7 @@ static const struct dev_pm_ops amdgpu_pm_ops
>>>> = {
>>>>> .runtime_suspend = amdgpu_pmops_runtime_suspend,
>>>>> .runtime_resume = amdgpu_pmops_runtime_resume,
>>>>> .runtime_idle = amdgpu_pmops_runtime_idle,
>>>>> + .suspend_noirq = amdgpu_pmops_noirq_suspend,
>>>>> };
>>>>>
>>>>> static int amdgpu_flush(struct file *f, fl_owner_t id)
>>>>>
More information about the amd-gfx
mailing list