[PATCH] drm/amd: Add Suspend/Hibernate notification callback support

Alex Deucher alexdeucher at gmail.com
Fri Nov 22 14:05:32 UTC 2024


On Fri, Nov 22, 2024 at 5:03 AM Christian König
<ckoenig.leichtzumerken at gmail.com> wrote:
>
> On 18.11.24 at 21:03, Mario Limonciello wrote:
> > As part of the suspend sequence VRAM needs to be evicted on dGPUs.
> > In order to make suspend/resume more reliable we moved this into
> > the pmops prepare() callback so that the suspend sequence would fail
> > but the system could remain operational when suspending under high memory usage.
> >
> > Another class of issues exists, though, where due to memory fragmentation
> > there isn't a large enough contiguous space and swap isn't accessible.
> >
> > Add support for a suspend/hibernate notification callback that could
> > evict VRAM before tasks are frozen. This should allow paging out to swap
> > if necessary.
> >
> > Link: https://github.com/ROCm/ROCK-Kernel-Driver/issues/174
> > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3476
> > Signed-off-by: Mario Limonciello <mario.limonciello at amd.com>
>
> I'm absolutely not an expert on that stuff but it looks rather
> reasonable to me.
>
> Alex what do you think?

Yeah, same.  Seems reasonable to me.
Acked-by: Alex Deucher <alexander.deucher at amd.com>

>
> Regards,
> Christian.
>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  1 +
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 36 ++++++++++++++++++++++
> >   2 files changed, 37 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > index a37e687acbbc5..e70ca85151046 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > @@ -885,6 +885,7 @@ struct amdgpu_device {
> >       bool                            need_swiotlb;
> >       bool                            accel_working;
> >       struct notifier_block           acpi_nb;
> > +     struct notifier_block           pm_nb;
> >       struct amdgpu_i2c_chan          *i2c_bus[AMDGPU_MAX_I2C_BUS];
> >       struct debugfs_blob_wrapper     debugfs_vbios_blob;
> >       struct debugfs_blob_wrapper     debugfs_discovery_blob;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index b3ca911e55d61..5a4e9c7daf895 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -190,6 +190,8 @@ void amdgpu_set_init_level(struct amdgpu_device *adev,
> >   }
> >
> >   static inline void amdgpu_device_stop_pending_resets(struct amdgpu_device *adev);
> > +static int amdgpu_device_pm_notifier(struct notifier_block *nb, unsigned long mode,
> > +                                  void *data);
> >
> >   /**
> >    * DOC: pcie_replay_count
> > @@ -4582,6 +4584,11 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> >
> >       amdgpu_device_check_iommu_direct_map(adev);
> >
> > +     adev->pm_nb.notifier_call = amdgpu_device_pm_notifier;
> > +     r = register_pm_notifier(&adev->pm_nb);
> > +     if (r)
> > +             goto failed;
> > +
> >       return 0;
> >
> >   release_ras_con:
> > @@ -4646,6 +4653,8 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
> >               drain_workqueue(adev->mman.bdev.wq);
> >       adev->shutdown = true;
> >
> > +     unregister_pm_notifier(&adev->pm_nb);
> > +
> >       /* make sure IB test finished before entering exclusive mode
> >        * to avoid preemption on IB test
> >        */
> > @@ -4777,6 +4786,33 @@ static int amdgpu_device_evict_resources(struct amdgpu_device *adev)
> >   /*
> >    * Suspend & resume.
> >    */
> > +/**
> > + * amdgpu_device_pm_notifier - Notification block for Suspend/Hibernate events
> > + * @nb: notifier block
> > + * @mode: suspend mode
> > + * @data: data
> > + *
> > + * This function is called when the system is about to suspend or hibernate.
> > + * It is used to evict resources from the device before the system goes to
> > + * sleep while there is still access to swap.
> > + *
> > + */
> > +static int amdgpu_device_pm_notifier(struct notifier_block *nb, unsigned long mode,
> > +                                  void *data)
> > +{
> > +     struct amdgpu_device *adev = container_of(nb, struct amdgpu_device, pm_nb);
> > +
> > +     switch (mode) {
> > +     case PM_HIBERNATION_PREPARE:
> > +     case PM_SUSPEND_PREPARE:
> > +             if (amdgpu_device_evict_resources(adev))
> > +                     return NOTIFY_BAD;
> > +             break;
> > +     }
> > +
> > +     return NOTIFY_DONE;
> > +}
> > +
> >   /**
> >    * amdgpu_device_prepare - prepare for device suspend
> >    *
>
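
For readers unfamiliar with PM notifiers, the control flow of the patch can be
approximated in plain userspace C. This is only a sketch under stated
assumptions: every sketch_* and SKETCH_* name below is hypothetical, the event
and return values are stand-ins rather than the kernel's definitions, and the
"eviction" is a toy page counter, not amdgpu_device_evict_resources(). It shows
the callback recovering its device from the embedded notifier block and vetoing
the transition when eviction fails.

```c
#include <stddef.h>

/* Stand-in values for this sketch only; NOT the kernel's definitions. */
enum sketch_pm_event {
    SKETCH_HIBERNATION_PREPARE = 1,
    SKETCH_SUSPEND_PREPARE,
    SKETCH_POST_SUSPEND
};
enum sketch_verdict { SKETCH_NOTIFY_DONE = 0, SKETCH_NOTIFY_BAD = 1 };

/* Hypothetical, simplified notifier block: a callback plus a chain link. */
struct sketch_notifier {
    int (*call)(struct sketch_notifier *nb, unsigned long event);
    struct sketch_notifier *next;
};

/* Hypothetical device that embeds its notifier, like pm_nb in the patch. */
struct sketch_device {
    int vram_pages;      /* pages still resident in VRAM */
    int free_sys_pages;  /* room left in system memory plus swap */
    struct sketch_notifier pm_nb;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define sketch_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy eviction: succeeds only if everything in VRAM fits elsewhere. */
static int sketch_evict(struct sketch_device *dev)
{
    if (dev->vram_pages > dev->free_sys_pages)
        return -1;
    dev->free_sys_pages -= dev->vram_pages;
    dev->vram_pages = 0;
    return 0;
}

/* Mirrors the shape of the patch's callback: evict on the two PREPARE
 * events, veto the transition on failure, ignore every other event. */
static int sketch_pm_notifier(struct sketch_notifier *nb, unsigned long event)
{
    struct sketch_device *dev =
        sketch_container_of(nb, struct sketch_device, pm_nb);

    switch (event) {
    case SKETCH_HIBERNATION_PREPARE:
    case SKETCH_SUSPEND_PREPARE:
        if (sketch_evict(dev))
            return SKETCH_NOTIFY_BAD;
        break;
    default:
        break;
    }
    return SKETCH_NOTIFY_DONE;
}

/* Walk the chain; return 0 if the transition may proceed, -1 on a veto. */
static int sketch_call_chain(struct sketch_notifier *head, unsigned long event)
{
    for (struct sketch_notifier *nb = head; nb; nb = nb->next)
        if (nb->call(nb, event) == SKETCH_NOTIFY_BAD)
            return -1;
    return 0;
}
```

In this toy model a device with 4 VRAM pages and 10 free system pages lets the
transition proceed and ends with empty VRAM, while a device that cannot fit its
VRAM contents anywhere vetoes the transition and is left unchanged.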
