[PATCH 4/7] drm/ttm: move LRU walk defines into new internal header

Wed Aug 28 15:35:33 UTC 2024

On Wed, Aug 28, 2024 at 11:26 AM Christian König
<christian.koenig at amd.com> wrote:
>
> Am 28.08.24 um 16:05 schrieb Thomas Hellström:
> > On Wed, 2024-08-28 at 14:20 +0200, Christian König wrote:
> >> Am 27.08.24 um 19:53 schrieb Daniel Vetter:
> >>> On Tue, Aug 27, 2024 at 06:52:13PM +0200, Daniel Vetter wrote:
> >>>> On Thu, Aug 22, 2024 at 03:19:29PM +0200, Christian König wrote:
> >>>>> Completely agree that this is complicated, but I still don't
> >>>>> see the need
> >>>>> for it.
> >>>>>
> >>>>> Drivers just need to use pm_runtime_get_if_in_use() inside the
> >>>>> shrinker and
> >>>>> postpone all hw activity until resume.
> >>>> Not good enough, at least long term I think. Also postponing hw
> >>>> activity
> >>>> to resume doesn't solve the deadlock issue, if you still need to
> >>>> grab ttm
> >>>> locks on resume.
> >>> Pondered this specific aspect some more, and I think you still have
> >>> a race
> >>> here (even if you avoid the deadlock): If the conditional rpm_get
> >>> call
> >>> fails there's no guarantee that the device will suspend/resume and
> >>> clean
> >>> up the GART mapping.
> >> Well I think we have a major disconnect here. When the device is
> >> powered
> >> down there is no GART mapping to clean up any more.
> >>
> >> In other words, GART is a table in local memory (VRAM). When the
> >> device is powered down this table is completely destroyed. Any BO
> >> which was mapped inside this table is now not mapped any more.
> >>
> >> So when the shrinker wants to evict a BO which is marked as mapped to
> >> GART and the device is powered down we just skip the GART unmapping
> >> part
> >> because that has already implicitly happened during power down.
> >>
> >> Before mapping any BO into the GART again we power the GPU up through
> >> the runtime PM calls. And while powering it up again the GART is
> >> restored.
> > I think you're forgetting the main Xe use-case of Lunar-lake
> > compression metadata. It's retained by the device during D3hot, but
> > cannot, at that time, be accessed for shrinking.
>
> Yeah, that is really something we don't have an equivalent for on AMD GPUs.
>
> When the ASIC is powered down VRAM is basically dead as well because it
> won't get refreshed any more.

We actually support memory self refresh for VRAM on some platforms,
but the rest of the GPU is powered down, so only the VRAM contents are
retained.

Alex

>
> > And copying it all out "Just in case" when transitioning to D3hot just
> > isn't a viable solution.
>
> I would say that this is solvable with a hierarchy of power management
> functionality.
>
> E.g. the runtime PM interface works the same for you as it does for
> amdgpu with evicting TTM BOs etc.
>
> Then separate from runtime PM you have a reference count for the
> accessibility of compressed metadata. And while shrinking you only
> resume this specific part.
>
> Christian.
>
> >
> > /Thomas
> >
>
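
For illustration only, here is a rough userspace model of the two-level
scheme Christian describes above: one usage count for full runtime PM and
a separate one for the accessibility of the compressed metadata, with the
shrinker resuming only the latter. This is a sketch under assumptions, not
code from TTM, Xe or amdgpu. Every name in it (struct dev_model and the
model_* helpers) is hypothetical, and model_rpm_get_if_in_use() only
loosely mimics the "take a reference only if already active and in use"
behaviour of pm_runtime_get_if_in_use().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct dev_model {
	int rpm_usage;		/* models the runtime PM usage count */
	bool powered;		/* models "device is runtime-active" */
	int meta_usage;		/* separate count for metadata accessibility */
	bool meta_accessible;	/* models "metadata can be read right now" */
};

/*
 * Loosely models pm_runtime_get_if_in_use(): a reference is taken only
 * when the device is already active and already has users. It never
 * triggers a resume by itself.
 */
static bool model_rpm_get_if_in_use(struct dev_model *d)
{
	if (!d->powered || d->rpm_usage == 0)
		return false;
	d->rpm_usage++;
	return true;
}

static void model_rpm_put(struct dev_model *d)
{
	assert(d->rpm_usage > 0);
	d->rpm_usage--;
}

/* First user makes only the metadata accessible (assumed partial resume). */
static void model_meta_get(struct dev_model *d)
{
	if (d->meta_usage++ == 0)
		d->meta_accessible = true;
}

/* Last user drops metadata accessibility again. */
static void model_meta_put(struct dev_model *d)
{
	assert(d->meta_usage > 0);
	if (--d->meta_usage == 0)
		d->meta_accessible = false;
}

/*
 * Models one shrinker pass over a BO with compressed metadata. The full
 * runtime PM reference is optional; the metadata reference is what the
 * copy-out actually depends on. Returns true if the metadata was
 * reachable while the references were held.
 */
static bool model_shrink_one(struct dev_model *d)
{
	bool have_rpm = model_rpm_get_if_in_use(d);
	bool ok;

	model_meta_get(d);
	ok = d->meta_accessible;	/* the copy-out would happen here */
	model_meta_put(d);

	if (have_rpm)
		model_rpm_put(d);
	return ok;
}
```

In this model the shrinker succeeds even when the conditional runtime PM
get fails, because it never depends on a full resume. Whether the
hardware can actually expose the metadata in D3hot without one is exactly
the open question in this thread, so treat the partial resume in
model_meta_get() as an assumption.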