[PATCH v4 1/3] drm/amd: Evict resources during PM ops prepare() callback

Fri Oct 6 07:22:28 UTC 2023

Am 05.10.23 um 16:59 schrieb Mario Limonciello:
> On 10/5/2023 09:39, Christian König wrote:
>> Am 04.10.23 um 19:18 schrieb Mario Limonciello:
>>> Linux PM core has a prepare() callback run before suspend.
>>>
>>> If the system is under high memory pressure, the resources may need
>>> to be evicted into swap instead.  If the storage backing for swap
>>> is offlined during the suspend() step then such a call may fail.
>>>
>>> So duplicate this step into prepare() to move evict majority of
>>> resources while leaving all existing steps that put the GPU into a
>>> low power state in suspend().
>>>
>>> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2362
>>> Signed-off-by: Mario Limonciello <mario.limonciello at amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  1 +
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 26 
>>> +++++++++++++++++++++-
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  7 +++---
>>>   3 files changed, 30 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index d23fb4b5ad95..6643d0ed6b1b 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -1413,6 +1413,7 @@ void amdgpu_driver_postclose_kms(struct 
>>> drm_device *dev,
>>>   void amdgpu_driver_release_kms(struct drm_device *dev);
>>>   int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
>>> +int amdgpu_device_prepare(struct drm_device *dev);
>>>   int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
>>>   int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
>>>   u32 amdgpu_get_vblank_counter_kms(struct drm_crtc *crtc);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index bad2b5577e96..67acee569c08 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -4259,6 +4259,31 @@ static int 
>>> amdgpu_device_evict_resources(struct amdgpu_device *adev)
>>>   /*
>>>    * Suspend & resume.
>>>    */
>>> +/**
>>> + * amdgpu_device_prepare - prepare for device suspend
>>> + *
>>> + * @dev: drm dev pointer
>>> + *
>>> + * Prepare to put the hw in the suspend state (all asics).
>>> + * Returns 0 for success or an error on failure.
>>> + * Called at driver suspend.
>>> + */
>>> +int amdgpu_device_prepare(struct drm_device *dev)
>>> +{
>>> +    struct amdgpu_device *adev = drm_to_adev(dev);
>>> +    int r;
>>> +
>>> +    if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
>>> +        return 0;
>>> +
>>> +    /* Evict the majority of BOs before starting suspend sequence */
>>> +    r = amdgpu_device_evict_resources(adev);
>>> +    if (r)
>>> +        return r;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>>   /**
>>>    * amdgpu_device_suspend - initiate device suspend
>>>    *
>>> @@ -4279,7 +4304,6 @@ int amdgpu_device_suspend(struct drm_device 
>>> *dev, bool fbcon)
>>>       adev->in_suspend = true;
>>> -    /* Evict the majority of BOs before grabbing the full access */
>>>       r = amdgpu_device_evict_resources(adev);
>>>       if (r)
>>>           return r;
>>
>> I would just completely drop this extra 
>> amdgpu_device_evict_resources() call now.
>>
>> We have a second call which is used to evacuate firmware etc... after 
>> the hw has been shut down. That one can't move, but also shouldn't 
>> allocate that much memory.
>>
>
> The problem is that amdgpu_device_suspend() is also called from 
> amdgpu_switcheroo_set_state() as well as a bunch of pmops sequences 
> that I don't expect call prepare() like poweroff().
>
> I would think we still want to evict resources at the beginning of 
> amdgpu_device_suspend() for all of those.
>
> So it's an extra call for the prepare() path but it should be harmless.

That's true, but I would rather say that we should then call 
amdgpu_device_prepare() in those call paths as well.

We might end up putting more stuff into the prepare call than just 
eviction VRAM.

Regards,
Christian.

>
>> Regards,
>> Christian.
>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> index e3471293846f..175167582db0 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> @@ -2425,8 +2425,9 @@ static int amdgpu_pmops_prepare(struct device 
>>> *dev)
>>>       /* Return a positive number here so
>>>        * DPM_FLAG_SMART_SUSPEND works properly
>>>        */
>>> -    if (amdgpu_device_supports_boco(drm_dev))
>>> -        return pm_runtime_suspended(dev);
>>> +    if (amdgpu_device_supports_boco(drm_dev) &&
>>> +        pm_runtime_suspended(dev))
>>> +        return 1;
>>>       /* if we will not support s3 or s2i for the device
>>>        *  then skip suspend
>>> @@ -2435,7 +2436,7 @@ static int amdgpu_pmops_prepare(struct device 
>>> *dev)
>>>           !amdgpu_acpi_is_s3_active(adev))
>>>           return 1;
>>> -    return 0;
>>> +    return amdgpu_device_prepare(drm_dev);
>>>   }
>>>   static void amdgpu_pmops_complete(struct device *dev)
>>
>