[PATCH v4 1/3] drm/amd: Evict resources during PM ops prepare() callback
Christian König
ckoenig.leichtzumerken at gmail.com
Fri Oct 6 07:22:28 UTC 2023
Am 05.10.23 um 16:59 schrieb Mario Limonciello:
> On 10/5/2023 09:39, Christian König wrote:
>> Am 04.10.23 um 19:18 schrieb Mario Limonciello:
>>> Linux PM core has a prepare() callback run before suspend.
>>>
>>> If the system is under high memory pressure, the resources may need
>>> to be evicted into swap instead. If the storage backing for swap
>>> is offlined during the suspend() step then such a call may fail.
>>>
>>> So duplicate this step into prepare() to move evict majority of
>>> resources while leaving all existing steps that put the GPU into a
>>> low power state in suspend().
>>>
>>> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2362
>>> Signed-off-by: Mario Limonciello <mario.limonciello at amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 26
>>> +++++++++++++++++++++-
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 7 +++---
>>> 3 files changed, 30 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index d23fb4b5ad95..6643d0ed6b1b 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -1413,6 +1413,7 @@ void amdgpu_driver_postclose_kms(struct
>>> drm_device *dev,
>>> void amdgpu_driver_release_kms(struct drm_device *dev);
>>> int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
>>> +int amdgpu_device_prepare(struct drm_device *dev);
>>> int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
>>> int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
>>> u32 amdgpu_get_vblank_counter_kms(struct drm_crtc *crtc);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index bad2b5577e96..67acee569c08 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -4259,6 +4259,31 @@ static int
>>> amdgpu_device_evict_resources(struct amdgpu_device *adev)
>>> /*
>>> * Suspend & resume.
>>> */
>>> +/**
>>> + * amdgpu_device_prepare - prepare for device suspend
>>> + *
>>> + * @dev: drm dev pointer
>>> + *
>>> + * Prepare to put the hw in the suspend state (all asics).
>>> + * Returns 0 for success or an error on failure.
>>> + * Called at driver suspend.
>>> + */
>>> +int amdgpu_device_prepare(struct drm_device *dev)
>>> +{
>>> + struct amdgpu_device *adev = drm_to_adev(dev);
>>> + int r;
>>> +
>>> + if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
>>> + return 0;
>>> +
>>> + /* Evict the majority of BOs before starting suspend sequence */
>>> + r = amdgpu_device_evict_resources(adev);
>>> + if (r)
>>> + return r;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> /**
>>> * amdgpu_device_suspend - initiate device suspend
>>> *
>>> @@ -4279,7 +4304,6 @@ int amdgpu_device_suspend(struct drm_device
>>> *dev, bool fbcon)
>>> adev->in_suspend = true;
>>> - /* Evict the majority of BOs before grabbing the full access */
>>> r = amdgpu_device_evict_resources(adev);
>>> if (r)
>>> return r;
>>
>> I would just completely drop this extra
>> amdgpu_device_evict_resources() call now.
>>
>> We have a second call which is used to evacuate firmware etc... after
>> the hw has been shut down. That one can't move, but also shouldn't
>> allocate that much memory.
>>
>
> The problem is that amdgpu_device_suspend() is also called from
> amdgpu_switcheroo_set_state() as well as a bunch of pmops sequences
> that I don't expect call prepare() like poweroff().
>
> I would think we still want to evict resources at the beginning of
> amdgpu_device_suspend() for all of those.
>
> So it's an extra call for the prepare() path but it should be harmless.
That's true, but I would rather say that we should then call
amdgpu_device_prepare() in those call paths as well.
We might end up putting more stuff into the prepare call than just
eviction VRAM.
Regards,
Christian.
>
>> Regards,
>> Christian.
>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> index e3471293846f..175167582db0 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> @@ -2425,8 +2425,9 @@ static int amdgpu_pmops_prepare(struct device
>>> *dev)
>>> /* Return a positive number here so
>>> * DPM_FLAG_SMART_SUSPEND works properly
>>> */
>>> - if (amdgpu_device_supports_boco(drm_dev))
>>> - return pm_runtime_suspended(dev);
>>> + if (amdgpu_device_supports_boco(drm_dev) &&
>>> + pm_runtime_suspended(dev))
>>> + return 1;
>>> /* if we will not support s3 or s2i for the device
>>> * then skip suspend
>>> @@ -2435,7 +2436,7 @@ static int amdgpu_pmops_prepare(struct device
>>> *dev)
>>> !amdgpu_acpi_is_s3_active(adev))
>>> return 1;
>>> - return 0;
>>> + return amdgpu_device_prepare(drm_dev);
>>> }
>>> static void amdgpu_pmops_complete(struct device *dev)
>>
>
More information about the amd-gfx
mailing list