list_lru operation for new child memcg?
Dave Chinner
david at fromorbit.com
Wed May 28 23:20:50 UTC 2025
On Thu, May 29, 2025 at 07:53:55AM +1000, Dave Airlie wrote:
> On Wed, 28 May 2025 at 17:20, Christian König <christian.koenig at amd.com> wrote:
> >
> > Hi guys,
> >
> > On 5/27/25 01:49, Dave Chinner wrote:
> > > I disagree - specifically ordered memcg traversal is not something
> > > that the list_lru implementation is currently doing, nor should it
> > > be doing.
> >
> > I realized overnight that I hadn't fully explored a way of getting both advantages. And we actually don't need list_lru for that.
> >
> > So here is a side question:
> >
> > Is it possible to just have a per-cgroup counter of how many pages a cgroup has released back to a particular pool? E.g. something which is added to the same counter on the parent when a cgroup is released.
> >
> > Background is that the pages are not distinguishable from each other, e.g. they are not cache hot or cold or anything like that. So it doesn't matter which pages a cgroup has released, only how many.
> >
> > If it were possible to get such a counter, it would take just a few lines of code to add the isolation and still get the advantage of sharing released pages between different cgroups.
>
> I think NUMA is the only possible distinction I can see between pages
> here; even uncached GPU access will be slower to more distant NUMA
> nodes.
>
> But indeed this might be a workable idea: just make something that
> does what list_lru does but only for the counters, and keep the pages
> in a single pool.
If you only want a NUMA-aware LRU with reclaim/reuse but without
memcg awareness, list_lru already supports that configuration. Use
list_lru_init() for the NUMA-aware LRU infrastructure;
list_lru_init_memcg() should only be used if you need memcg awareness
in the LRU.
There are various caches that use this configuration, e.g. the XFS
buffer cache and dquot caches, because they are global caches whose
contents are shared across all cgroups. The shrinkers associated with
them are configured only as SHRINKER_NUMA_AWARE so that reclaim is
done per-node rather than over a single global LRU....
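For illustration, a minimal sketch of that configuration, assuming the
current shrinker_alloc()/shrinker_register() API and a hypothetical
cache named "my-cache" with a hypothetical my_isolate() walk callback
(nothing here is from the XFS code itself):

```c
#include <linux/list_lru.h>
#include <linux/shrinker.h>

static struct list_lru my_lru;          /* one LRU list per NUMA node */
static struct shrinker *my_shrinker;

static unsigned long my_count(struct shrinker *s, struct shrink_control *sc)
{
        /* sc->nid selects the node; count only that node's objects */
        return list_lru_shrink_count(&my_lru, sc);
}

static unsigned long my_scan(struct shrinker *s, struct shrink_control *sc)
{
        /* walk only the per-node list; my_isolate() is hypothetical */
        return list_lru_shrink_walk(&my_lru, sc, my_isolate, NULL);
}

static int my_cache_init(void)
{
        int error;

        /* NUMA-aware, not memcg-aware */
        error = list_lru_init(&my_lru);
        if (error)
                return error;

        my_shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE, "my-cache");
        if (!my_shrinker) {
                list_lru_destroy(&my_lru);
                return -ENOMEM;
        }
        my_shrinker->count_objects = my_count;
        my_shrinker->scan_objects = my_scan;
        shrinker_register(my_shrinker);
        return 0;
}
```

With SHRINKER_NUMA_AWARE set, memory reclaim calls count_objects/
scan_objects once per node with sc->nid set, so each node's portion
of the global cache is shrunk independently.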
-Dave.
--
Dave Chinner
david at fromorbit.com