[PATCH] drm/ttm: Fix TTM page pool accounting

Christian König christian.koenig at amd.com
Wed Mar 3 07:30:35 UTC 2021


On 03.03.21 at 02:17, Anthony DeRossi wrote:
> Freed pages are not subtracted from the allocated_pages counter in
> ttm_pool_type_fini(), causing a leak in the count on device removal.
> The next shrinker invocation loops forever trying to free pages that are
> no longer in the pool:
>
>    rcu: INFO: rcu_sched self-detected stall on CPU
>    rcu:  3-....: (9998 ticks this GP) idle=54e/1/0x4000000000000000 softirq=434857/434857 fqs=2237
>      (t=10001 jiffies g=2194533 q=49211)
>    NMI backtrace for cpu 3
>    CPU: 3 PID: 1034 Comm: kswapd0 Tainted: P           O      5.11.0-com #1
>    Hardware name: System manufacturer System Product Name/PRIME X570-PRO, BIOS 1405 11/19/2019
>    Call Trace:
>     <IRQ>
>     ...
>     </IRQ>
>     sysvec_apic_timer_interrupt+0x77/0x80
>     asm_sysvec_apic_timer_interrupt+0x12/0x20
>    RIP: 0010:mutex_unlock+0x16/0x20
>    Code: e7 48 8b 70 10 e8 7a 53 77 ff eb aa e8 43 6c ff ff 0f 1f 00 65 48 8b 14 25 00 6d 01 00 31 c9 48 89 d0 f0 48 0f b1 0f 48 39 c2 <74> 05 e9 e3 fe ff ff c3 66 90 48 8b 47 20 48 85 c0 74 0f 8b 50 10
>    RSP: 0018:ffffbdb840797be8 EFLAGS: 00000246
>    RAX: ffff9ff445a41c00 RBX: ffffffffc02a9ef8 RCX: 0000000000000000
>    RDX: ffff9ff445a41c00 RSI: ffffbdb840797c78 RDI: ffffffffc02a9ac0
>    RBP: 0000000000000080 R08: 0000000000000000 R09: ffffbdb840797c80
>    R10: 0000000000000000 R11: fffffffffffffff5 R12: 0000000000000000
>    R13: 0000000000000000 R14: 0000000000000084 R15: ffffffffc02a9a60
>     ttm_pool_shrink+0x7d/0x90 [ttm]
>     ttm_pool_shrinker_scan+0x5/0x20 [ttm]
>     do_shrink_slab+0x13a/0x1a0
>     shrink_slab+0xcb/0x2f0
>     shrink_node+0x2c8/0x6c0
>     balance_pgdat+0x2f7/0x580
>     kswapd+0x17c/0x300
>     ? wait_woken+0x70/0x70
>     ? balance_pgdat+0x580/0x580
>     kthread+0xff/0x140
>     ? kthread_park+0x80/0x80
>     ret_from_fork+0x22/0x30
>
> debugfs shows the incorrect total:
>
>    $ cat /sys/kernel/debug/dri/0/ttm_page_pool
>              --- 0--- --- 1--- --- 2--- --- 3--- --- 4--- --- 5--- --- 6--- --- 7--- --- 8--- --- 9--- ---10---
>    wc      :        0        0        0        0        0        0        0        0        0        0        0
>    uc      :        0        0        0        0        0        0        0        0        0        0        0
>    wc 32   :        0        0        0        0        0        0        0        0        0        0        0
>    uc 32   :        0        0        0        0        0        0        0        0        0        0        0
>    DMA uc  :        0        0        0        0        0        0        0        0        0        0        0
>    DMA wc  :        0        0        0        0        0        0        0        0        0        0        0
>    DMA     :        0        0        0        0        0        0        0        0        0        0        0
>
>    total   :     3029 of  8244261
>
> Using ttm_pool_type_take() to remove pages from the pool before freeing
> them correctly accounts for the freed pages.
>
> Fixes: d099fc8f540a ("drm/ttm: new TT backend allocation pool v3")
> Signed-off-by: Anthony DeRossi <ajderossi at gmail.com>

Good catch. Patch is Reviewed-by: Christian König <christian.koenig at amd.com>

Going to push that to drm-misc-fixes for upstreaming in a minute.

Thanks,
Christian.

> ---
>   drivers/gpu/drm/ttm/ttm_pool.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> index 74bf1c84b637..f6df79034805 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -258,13 +258,13 @@ static void ttm_pool_type_init(struct ttm_pool_type *pt, struct ttm_pool *pool,
>   /* Remove a pool_type from the global shrinker list and free all pages */
>   static void ttm_pool_type_fini(struct ttm_pool_type *pt)
>   {
> -	struct page *p, *tmp;
> +	struct page *p;
>   
>   	mutex_lock(&shrinker_lock);
>   	list_del(&pt->shrinker_list);
>   	mutex_unlock(&shrinker_lock);
>   
> -	list_for_each_entry_safe(p, tmp, &pt->pages, lru)
> +	while ((p = ttm_pool_type_take(pt)))
>   		ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
>   }
>   
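
To illustrate why the change above fixes the count, here is a minimal userspace sketch in C. It is a model under stated assumptions, not the kernel code: the names only mirror the patch, the singly linked list stands in for pt->pages, malloc()/free() stand in for page allocation, and a plain long stands in for the allocated_pages counter. The helper demo() is hypothetical and exists only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace model only; none of this is the real TTM implementation. */
struct page {
	struct page *next;
};

struct pool_type {
	struct page *pages;	/* free list, standing in for pt->pages */
	unsigned int order;	/* each entry accounts for 1 << order pages */
};

/* Stand-in for the global counter that the shrinker consults. */
static long allocated_pages;

/* Put a page into the pool and account for it. */
static void pool_type_give(struct pool_type *pt, struct page *p)
{
	p->next = pt->pages;
	pt->pages = p;
	allocated_pages += 1L << pt->order;
}

/* Take a page out of the pool and subtract it from the counter. */
static struct page *pool_type_take(struct pool_type *pt)
{
	struct page *p = pt->pages;

	if (p) {
		pt->pages = p->next;
		allocated_pages -= 1L << pt->order;
	}
	return p;
}

/* Pre-patch shape: walks the list and frees, counter is never adjusted. */
static void pool_type_fini_leaky(struct pool_type *pt)
{
	struct page *p = pt->pages, *tmp;

	while (p) {
		tmp = p->next;
		free(p);
		p = tmp;
	}
	pt->pages = NULL;
}

/* Post-patch shape: every freed page goes through pool_type_take(). */
static void pool_type_fini_fixed(struct pool_type *pt)
{
	struct page *p;

	while ((p = pool_type_take(pt)))
		free(p);
}

/* Hypothetical helper: fill a pool with n pages, tear it down, and
 * return whatever is left in the counter afterwards. */
static long demo(void (*fini)(struct pool_type *), int n)
{
	struct pool_type pt = { NULL, 0 };

	allocated_pages = 0;
	for (int i = 0; i < n; i++) {
		struct page *p = malloc(sizeof(*p));

		if (!p)
			abort();
		pool_type_give(&pt, p);
	}
	fini(&pt);
	return allocated_pages;
}
```

Calling demo(pool_type_fini_leaky, 8) leaves the counter at 8 even though the pool is empty, which is the same kind of stale total the debugfs output in the report shows. Calling demo(pool_type_fini_fixed, 8) leaves it at 0, because each page is subtracted as it is taken off the list.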


