[PATCH v4 1/3] drm/amd: Evict resources during PM ops prepare() callback

Mario Limonciello mario.limonciello at amd.com
Thu Oct 5 14:59:33 UTC 2023


On 10/5/2023 09:39, Christian König wrote:
> Am 04.10.23 um 19:18 schrieb Mario Limonciello:
>> Linux PM core has a prepare() callback run before suspend.
>>
>> If the system is under high memory pressure, the resources may need
>> to be evicted into swap instead.  If the storage backing for swap
>> is offlined during the suspend() step then such a call may fail.
>>
>> So duplicate this step into prepare() to move evict majority of
>> resources while leaving all existing steps that put the GPU into a
>> low power state in suspend().
>>
>> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2362
>> Signed-off-by: Mario Limonciello <mario.limonciello at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 26 +++++++++++++++++++++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  7 +++---
>>   3 files changed, 30 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index d23fb4b5ad95..6643d0ed6b1b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -1413,6 +1413,7 @@ void amdgpu_driver_postclose_kms(struct 
>> drm_device *dev,
>>   void amdgpu_driver_release_kms(struct drm_device *dev);
>>   int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
>> +int amdgpu_device_prepare(struct drm_device *dev);
>>   int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
>>   int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
>>   u32 amdgpu_get_vblank_counter_kms(struct drm_crtc *crtc);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index bad2b5577e96..67acee569c08 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -4259,6 +4259,31 @@ static int amdgpu_device_evict_resources(struct 
>> amdgpu_device *adev)
>>   /*
>>    * Suspend & resume.
>>    */
>> +/**
>> + * amdgpu_device_prepare - prepare for device suspend
>> + *
>> + * @dev: drm dev pointer
>> + *
>> + * Prepare to put the hw in the suspend state (all asics).
>> + * Returns 0 for success or an error on failure.
>> + * Called at driver suspend.
>> + */
>> +int amdgpu_device_prepare(struct drm_device *dev)
>> +{
>> +    struct amdgpu_device *adev = drm_to_adev(dev);
>> +    int r;
>> +
>> +    if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
>> +        return 0;
>> +
>> +    /* Evict the majority of BOs before starting suspend sequence */
>> +    r = amdgpu_device_evict_resources(adev);
>> +    if (r)
>> +        return r;
>> +
>> +    return 0;
>> +}
>> +
>>   /**
>>    * amdgpu_device_suspend - initiate device suspend
>>    *
>> @@ -4279,7 +4304,6 @@ int amdgpu_device_suspend(struct drm_device 
>> *dev, bool fbcon)
>>       adev->in_suspend = true;
>> -    /* Evict the majority of BOs before grabbing the full access */
>>       r = amdgpu_device_evict_resources(adev);
>>       if (r)
>>           return r;
> 
> I would just completely drop this extra amdgpu_device_evict_resources() 
> call now.
> 
> We have a second call which is used to evacuate firmware etc... after 
> the hw has been shut down. That one can't move, but also shouldn't 
> allocate that much memory.
> 

The problem is that amdgpu_device_suspend() is also called from 
amdgpu_switcheroo_set_state() as well as a bunch of pmops sequences that 
I don't expect call prepare() like poweroff().

I would think we still want to evict resources at the beginning of 
amdgpu_device_suspend() for all of those.

So it's an extra call for the prepare() path but it should be harmless.

> Regards,
> Christian.
> 
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index e3471293846f..175167582db0 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -2425,8 +2425,9 @@ static int amdgpu_pmops_prepare(struct device *dev)
>>       /* Return a positive number here so
>>        * DPM_FLAG_SMART_SUSPEND works properly
>>        */
>> -    if (amdgpu_device_supports_boco(drm_dev))
>> -        return pm_runtime_suspended(dev);
>> +    if (amdgpu_device_supports_boco(drm_dev) &&
>> +        pm_runtime_suspended(dev))
>> +        return 1;
>>       /* if we will not support s3 or s2i for the device
>>        *  then skip suspend
>> @@ -2435,7 +2436,7 @@ static int amdgpu_pmops_prepare(struct device *dev)
>>           !amdgpu_acpi_is_s3_active(adev))
>>           return 1;
>> -    return 0;
>> +    return amdgpu_device_prepare(drm_dev);
>>   }
>>   static void amdgpu_pmops_complete(struct device *dev)
> 



More information about the amd-gfx mailing list