[PATCH] drm/ttm: Make pool shrinker more responsive
Christian König
christian.koenig at amd.com
Fri May 16 08:23:30 UTC 2025
On 5/15/25 22:57, Tvrtko Ursulin wrote:
> Currently the TTM pool shrinker ensures it frees at least something every
> time it is invoked, but it also lies to the core a bit on how hard it
> tried.
>
> For example core will ask it to free SHRINK_BATCH pages but the shrinker
> can, due how it walks the LRU list of pools, free just a single page and
> still leave the core thinking it expended the full SHRINK_BATCH effort.
>
> Apart from being inefficient in terms of number of calls to the TTM pool
> shrinker required, another consquence of this is that the core can abandon
> the shrinking attempt too early due thinking that the whole set of
> freeable pages has been scanned.
>
> We fix this last part by correctly reporting how many out of potentially
> freeable pages have been scanned.
>
> We also do the freeing in an aggressive manner, considering the scan
> target as a free target, to ensure walks over pools with unfreeable pages
> do not cause no-op calls into our callback.
>
> And finally we customise the shrinker batch size based on the median pool
> order and the total number of pools, with the aim of searching more
> possible pools on an average invocation.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at igalia.com>
> Cc: Christian König <christian.koenig at amd.com>
> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> ---
> drivers/gpu/drm/ttm/ttm_pool.c | 39 +++++++++++++++++++++-------------
> 1 file changed, 24 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> index c2ea865be657..a76fe5f95781 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -368,7 +368,7 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
> }
>
> /* Free pages using the global shrinker list */
> -static unsigned int ttm_pool_shrink(void)
> +static unsigned int ttm_pool_shrink(unsigned long *nr_scanned)
> {
> struct ttm_pool_type *pt;
> unsigned int num_pages;
> @@ -380,16 +380,15 @@ static unsigned int ttm_pool_shrink(void)
> list_move_tail(&pt->shrinker_list, &shrinker_list);
> spin_unlock(&shrinker_lock);
>
> + num_pages = 1 << pt->order;
> + (*nr_scanned) += num_pages;
> +
> p = ttm_pool_type_take(pt);
> - if (p) {
> + if (p)
> ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
> - num_pages = 1 << pt->order;
> - } else {
> - num_pages = 0;
> - }
> up_read(&pool_shrink_rwsem);
>
> - return num_pages;
> + return p ? num_pages : 0;
> }
That change here doesn't make any sense.
Scanning a pool type for pages and not finding anything doesn't mean we have scanned for freeable pages and then not freed them.
>
> /* Return the allocation order based for a page */
> @@ -881,10 +880,12 @@ int ttm_pool_restore_and_alloc(struct ttm_pool *pool, struct ttm_tt *tt,
> */
> void ttm_pool_free(struct ttm_pool *pool, struct ttm_tt *tt)
> {
> + unsigned long nr_scanned = 0;
> +
> ttm_pool_free_range(pool, tt, tt->caching, 0, tt->num_pages);
>
> while (atomic_long_read(&allocated_pages) > page_pool_size)
> - ttm_pool_shrink();
> + ttm_pool_shrink(&nr_scanned);
> }
> EXPORT_SYMBOL(ttm_pool_free);
>
> @@ -1132,17 +1133,21 @@ void ttm_pool_fini(struct ttm_pool *pool)
> }
> EXPORT_SYMBOL(ttm_pool_fini);
>
> -/* As long as pages are available make sure to release at least one */
> static unsigned long ttm_pool_shrinker_scan(struct shrinker *shrink,
> struct shrink_control *sc)
> {
> - unsigned long num_freed = 0;
> + unsigned long to_scan, freed = 0;
>
> - do
> - num_freed += ttm_pool_shrink();
> - while (!num_freed && atomic_long_read(&allocated_pages));
> + sc->nr_scanned = 0;
> + to_scan = min_t(unsigned long,
> + sc->nr_to_scan, atomic_long_read(&allocated_pages));
> + while (freed < to_scan) {
> + freed += ttm_pool_shrink(&sc->nr_scanned);
> + to_scan = min_t(unsigned long,
> + to_scan, atomic_long_read(&allocated_pages));
> + }
>
> - return num_freed;
> + return sc->nr_scanned ? freed : SHRINK_STOP;
That again doesn't make sense. That we only find pool types which don't have pages doesn't mean we have scanned them.
As far as I can see the existing code was correct after all.
> }
>
> /* Return the number of pages available or SHRINK_EMPTY if we have none */
> @@ -1266,7 +1271,10 @@ EXPORT_SYMBOL(ttm_pool_debugfs);
> /* Test the shrinker functions and dump the result */
> static int ttm_pool_debugfs_shrink_show(struct seq_file *m, void *data)
> {
> - struct shrink_control sc = { .gfp_mask = GFP_NOFS };
> + struct shrink_control sc = {
> + .gfp_mask = GFP_NOFS,
> + .nr_to_scan = 1,
> + };
>
> fs_reclaim_acquire(GFP_KERNEL);
> seq_printf(m, "%lu/%lu\n", ttm_pool_shrinker_count(mm_shrinker, &sc),
> @@ -1324,6 +1332,7 @@ int ttm_pool_mgr_init(unsigned long num_pages)
>
> mm_shrinker->count_objects = ttm_pool_shrinker_count;
> mm_shrinker->scan_objects = ttm_pool_shrinker_scan;
> + mm_shrinker->batch = (1 << (MAX_PAGE_ORDER / 2)) * NR_PAGE_ORDERS;
Since we install only one global shrinker for all pool types, which might contain everything from 1 page till 512 pages, this seems to not make sense at all either.
What exactly are you trying to do here?
Regards,
Christian.
> mm_shrinker->seeks = 1;
>
> shrinker_register(mm_shrinker);
More information about the dri-devel
mailing list