[RFC 2/2] drm/amd: Use suspend and hibernate post freeze notifications

Mario Limonciello superm1 at kernel.org
Wed May 7 19:45:53 UTC 2025


On 5/7/2025 2:39 PM, Rafael J. Wysocki wrote:
> On Wed, May 7, 2025 at 9:17 PM Mario Limonciello <superm1 at kernel.org> wrote:
>>
>> On 5/7/2025 2:14 PM, Rafael J. Wysocki wrote:
>>> On Thu, May 1, 2025 at 11:17 PM Mario Limonciello <superm1 at kernel.org> wrote:
>>>>
>>>> From: Mario Limonciello <mario.limonciello at amd.com>
>>>>
>>>> commit 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification
>>>> callback support") introduced a VRAM eviction earlier in the PM
>>>> sequences when swap was still available for evicting to. This helped
>>>> to fix a number of memory pressure related bugs but also exposed a
>>>> new one.
>>>>
>>>> If a userspace process is actively using the GPU when suspend starts
>>>> then a deadlock could occur.
>>>>
>>>> Instead of going off the prepare notifier, use the PM notifiers that
>>>> occur after processes have been frozen to do evictions.
>>>>
>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4178
>>>> Fixes: 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification callback support")
>>>> Signed-off-by: Mario Limonciello <mario.limonciello at amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
>>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> index 7f354cd532dc1..cad311b9fd834 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> @@ -4917,10 +4917,10 @@ static int amdgpu_device_pm_notifier(struct notifier_block *nb, unsigned long mo
>>>>           int r;
>>>>
>>>>           switch (mode) {
>>>> -       case PM_HIBERNATION_PREPARE:
>>>> +       case PM_HIBERNATION_POST_FREEZE:
>>>>                   adev->in_s4 = true;
>>>>                   fallthrough;
>>>> -       case PM_SUSPEND_PREPARE:
>>>> +       case PM_SUSPEND_POST_FREEZE:
>>>>                   r = amdgpu_device_evict_resources(adev);
>>>>                   /*
>>>>                    * This is considered non-fatal at this time because
>>>> --
>>>
>>> Why do you need a notifier for this?
>>>
>>> It looks like this could be done from amdgpu_device_prepare(), but if
>>> there is a reason why it cannot be done from there, it should be
>>> mentioned in the changelog.
>>
>> It's actually done in amdgpu_device_prepare() "as well" already, but the
>> reason that it's being done earlier is because swap still needs to be
>> available, especially with heavy memory fragmentation.
> 
> Swap should be still available when amdgpu_device_prepare() runs.

No, it isn't.  The basic call sequence (for suspend) looks like this:

enter_state(state) {
     suspend_prepare(state);  // PM notifiers (PM_SUSPEND_PREPARE) run here
     ...
     pm_restrict_gfp_mask();  // clears __GFP_IO | __GFP_FS, so no swap-out
     suspend_devices_and_enter(state) {
         dpm_suspend_start() {
             dpm_prepare() {
                 amdgpu_pmops_prepare();
             }
             dpm_suspend() {
                 amdgpu_pmops_suspend();
             }
         }
     }
}

If the intention was for swap to still be available at that point, it
would be better to move the pm_restrict_gfp_mask() call down into the
suspend_devices_and_enter() path, between the dpm_prepare() and
dpm_suspend() calls.
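
To make the ordering argument concrete, here is a rough userspace model
of the two orderings.  It is only a sketch, not kernel code: every
identifier in it (model_enter_state(), enum restrict_point, struct
trace_result and so on) is made up for illustration, and "swap
available" is reduced to a single boolean.

```c
#include <stdbool.h>

/*
 * Userspace model only.  None of these names exist in the kernel; they
 * are hypothetical stand-ins for the steps discussed above.
 */
enum restrict_point {
	RESTRICT_BEFORE_DPM_PREPARE,	/* ordering described above */
	RESTRICT_AFTER_DPM_PREPARE,	/* ordering suggested above */
};

struct trace_result {
	bool swap_in_notifier;	/* swap usable from the PM notifier? */
	bool swap_in_prepare;	/* swap usable from the .prepare() callback? */
	bool swap_in_suspend;	/* swap usable from the .suspend() callback? */
};

/* Stand-in for "allocations may still trigger swap-out". */
bool model_swap_allowed;

/* Stand-in for the effect of pm_restrict_gfp_mask(). */
void model_restrict_gfp_mask(void)
{
	model_swap_allowed = false;
}

/* Walk the modelled suspend sequence and record swap availability. */
struct trace_result model_enter_state(enum restrict_point where)
{
	struct trace_result res;

	model_swap_allowed = true;

	/* suspend_prepare() step: PM notifiers run here */
	res.swap_in_notifier = model_swap_allowed;

	if (where == RESTRICT_BEFORE_DPM_PREPARE)
		model_restrict_gfp_mask();

	/* dpm_prepare() step: driver .prepare() callbacks run here */
	res.swap_in_prepare = model_swap_allowed;

	if (where == RESTRICT_AFTER_DPM_PREPARE)
		model_restrict_gfp_mask();

	/* dpm_suspend() step: driver .suspend() callbacks run here */
	res.swap_in_suspend = model_swap_allowed;

	return res;
}
```

With RESTRICT_BEFORE_DPM_PREPARE the model reports swap as usable only
from the notifier step, which is why the eviction sits in a notifier
today.  With RESTRICT_AFTER_DPM_PREPARE it is also usable from the
.prepare() step, so the eviction in amdgpu_device_prepare() alone could
be enough.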

> 
>> I'll add more detail about this to the commit for the next spin if
>> you're relatively happy with the new notifier from the first patch.
> 
> I need to have a look at it, but adding it for just one user seems a
> bit over the top.  I'd prefer to avoid doing this.


