[PATCH v2 5/5] drm/amdgpu: do not resume device in thaw for normal hibernation

Mon Jul 7 02:29:55 UTC 2025

On 7/6/2025 10:28 PM, Lazar, Lijo wrote:
> 
> 
> On 7/7/2025 2:04 AM, Mario Limonciello wrote:
>> On 7/4/2025 6:12 AM, Samuel Zhang wrote:
>>> For normal hibernation, the GPU does not need to be resumed in thaw since it
>>> is not involved in writing the hibernation image. Skipping resume in this
>>> case reduces the hibernation time.
>>
>> Since you have the measurements would you mind including them in the
>> commit message for reference?
>>
>>>
>>> For cancelled hibernation, the GPU needs to be resumed.
>>
>> If I'm following right you are actually handling two different things in
>> this patch aren't you?
>>
>> 1) A change in thaw() to only resume on aborted hibernation
>> 2) A change in shutdown() to skip running if the device is in S4 when
>> shutdown() is called.
>>
>> So I think it would be more logical to split this into two patches.
>>
> 
> This is doing only one thing - Keep the device in suspended state for
> thaw() operation during a successful hibernation. Splitting into two
> could break hibernation during integration of the first part - it will
> attempt another suspend during shutdown. I think we don't take care of
> consecutive suspend calls.
> 
> Thanks,
> Lijo

Got it; thanks for the clarification.
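
To spell out why the two hunks belong together, here is a minimal userspace sketch of the ordering Lijo describes. It is only a model, not driver code: `struct fake_adev`, `model_freeze()`, `model_thaw()`, `model_shutdown()` and the `suspend_calls` counter are hypothetical names, and the flag handling is simplified compared to what amdgpu actually does.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two amdgpu_device flags the patch checks. */
struct fake_adev {
	bool in_s4;
	bool in_suspend;
	int suspend_calls;	/* how often the model suspended the device */
};

/* Model of freeze: the device is suspended ahead of image creation. */
static void model_freeze(struct fake_adev *adev)
{
	adev->in_s4 = true;
	adev->in_suspend = true;
	adev->suspend_calls++;
}

/*
 * Model of thaw with the patch applied: the device is resumed only when
 * hibernation was aborted, otherwise it stays suspended.
 */
static void model_thaw(struct fake_adev *adev, bool hibernation_aborted)
{
	if (!hibernation_aborted)
		return;
	adev->in_suspend = false;
	adev->in_s4 = false;
}

/*
 * Model of shutdown: without the guard, a device that thaw left suspended
 * is suspended a second time.
 */
static void model_shutdown(struct fake_adev *adev, bool guard_enabled)
{
	if (guard_enabled && adev->in_s4 && adev->in_suspend)
		return;
	adev->suspend_calls++;
}
```

With only the thaw change (guard disabled), a successful hibernation ends with two suspends in a row in this model; with both changes it ends with one.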

> 
>>>
>>> Signed-off-by: Samuel Zhang <guoqing.zhang at amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 8 ++++++++
>>>    1 file changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> index 4f8632737574..e064816aae4d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> @@ -2541,6 +2541,10 @@ amdgpu_pci_shutdown(struct pci_dev *pdev)
>>>        if (amdgpu_ras_intr_triggered())
>>>            return;
>>>    +    /* device may not be resumed here, return immediately in this case */
>>> +    if (adev->in_s4 && adev->in_suspend)
>>> +        return;
>>> +
>>>        /* if we are running in a VM, make sure the device
>>>         * torn down properly on reboot/shutdown.
>>>         * unfortunately we can't detect certain
>>> @@ -2655,6 +2659,10 @@ static int amdgpu_pmops_thaw(struct device *dev)
>>>    {
>>>        struct drm_device *drm_dev = dev_get_drvdata(dev);
>>>    +    /* do not resume device for normal hibernation */
>>> +    if (pm_transition.event == PM_EVENT_THAW)
>>> +        return 0;
>>> +
>>
>> Without digging into pm.h documentation I think it's not going to be
>> very obvious next time we look at this code that amdgpu_device_resume()
>> is only intended for the aborted case.
>>
>> How would you feel about a switch/case?
>>
>> Something like this:
>>
>> switch (pm_transition.event) {
>> /* normal hibernation */
>> case PM_EVENT_THAW:
>>      return 0;
>> /* for aborted hibernation */
>> case PM_EVENT_RECOVER:
>>      return amdgpu_device_resume(drm_dev, true);
>> default:
>>      return -EOPNOTSUPP;
>> }
>>
>>
>>>        return amdgpu_device_resume(drm_dev, true);
>>>    }
>>>    
>>
>
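
For reference, the switch/case suggestion can be exercised outside the kernel with a small model like the one below. This is a sketch under assumptions: the `MODEL_PM_EVENT_*` values are local stand-ins rather than the kernel's `PM_EVENT_*` definitions, and `model_device_resume()` is a hypothetical placeholder for `amdgpu_device_resume()` that only counts calls.

```c
#include <errno.h>

/* Local stand-ins; these are NOT the kernel's PM_EVENT_* values. */
enum model_pm_event {
	MODEL_PM_EVENT_THAW = 1,	/* normal hibernation */
	MODEL_PM_EVENT_RECOVER = 2,	/* aborted hibernation */
	MODEL_PM_EVENT_OTHER = 3,	/* anything the handler does not expect */
};

static int model_resume_calls;

/* Hypothetical placeholder for amdgpu_device_resume(); only counts calls. */
static int model_device_resume(void)
{
	model_resume_calls++;
	return 0;
}

/* Model of the thaw callback using the proposed switch/case layout. */
static int model_pmops_thaw(enum model_pm_event event)
{
	switch (event) {
	/* normal hibernation: leave the device suspended */
	case MODEL_PM_EVENT_THAW:
		return 0;
	/* aborted hibernation: bring the device back */
	case MODEL_PM_EVENT_RECOVER:
		return model_device_resume();
	default:
		return -EOPNOTSUPP;
	}
}
```

Listing both events explicitly documents that resume is meant only for the aborted case, and the default branch makes an unexpected event visible instead of silently resuming.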