[RFC 2/2] drm/amd: Use suspend and hibernate post freeze notifications
Rafael J. Wysocki
rafael at kernel.org
Wed May 7 19:39:05 UTC 2025
On Wed, May 7, 2025 at 9:17 PM Mario Limonciello <superm1 at kernel.org> wrote:
>
> On 5/7/2025 2:14 PM, Rafael J. Wysocki wrote:
> > On Thu, May 1, 2025 at 11:17 PM Mario Limonciello <superm1 at kernel.org> wrote:
> >>
> >> From: Mario Limonciello <mario.limonciello at amd.com>
> >>
> >> commit 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification
> >> callback support") introduced a VRAM eviction earlier in the PM
> >> sequences when swap was still available for evicting to. This helped
> >> to fix a number of memory pressure related bugs but also exposed a
> >> new one.
> >>
> >> If a userspace process is actively using the GPU when suspend starts
> >> then a deadlock could occur.
> >>
> >> Instead of going off the prepare notifier, use the PM notifiers that
> >> occur after processes have been frozen to do evictions.
> >>
> >> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4178
> >> Fixes: 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification callback support")
> >> Signed-off-by: Mario Limonciello <mario.limonciello at amd.com>
> >> ---
> >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
> >> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> index 7f354cd532dc1..cad311b9fd834 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> @@ -4917,10 +4917,10 @@ static int amdgpu_device_pm_notifier(struct notifier_block *nb, unsigned long mo
> >>  	int r;
> >>
> >>  	switch (mode) {
> >> -	case PM_HIBERNATION_PREPARE:
> >> +	case PM_HIBERNATION_POST_FREEZE:
> >>  		adev->in_s4 = true;
> >>  		fallthrough;
> >> -	case PM_SUSPEND_PREPARE:
> >> +	case PM_SUSPEND_POST_FREEZE:
> >>  		r = amdgpu_device_evict_resources(adev);
> >>  		/*
> >>  		 * This is considered non-fatal at this time because
> >> --
> >
> > Why do you need a notifier for this?
> >
> > It looks like this could be done from amdgpu_device_prepare(), but if
> > there is a reason why it cannot be done from there, it should be
> > mentioned in the changelog.
>
> It's actually done in amdgpu_device_prepare() "as well" already, but the
> reason that it's being done earlier is because swap still needs to be
> available, especially with heavy memory fragmentation.

Swap should still be available when amdgpu_device_prepare() runs.

> I'll add more detail about this to the commit for the next spin if
> you're relatively happy with the new notifier from the first patch.

I need to have a look at it, but adding it for just one user seems a
bit over the top. I'd prefer to avoid doing this.