[PATCH] drm/ttm: Fix TTM page pool accounting
Christian König
christian.koenig at amd.com
Wed Mar 3 07:30:35 UTC 2021
Am 03.03.21 um 02:17 schrieb Anthony DeRossi:
> Freed pages are not subtracted from the allocated_pages counter in
> ttm_pool_type_fini(), causing a leak in the count on device removal.
> The next shrinker invocation loops forever trying to free pages that are
> no longer in the pool:
>
> rcu: INFO: rcu_sched self-detected stall on CPU
> rcu: 3-....: (9998 ticks this GP) idle=54e/1/0x4000000000000000 softirq=434857/434857 fqs=2237
> (t=10001 jiffies g=2194533 q=49211)
> NMI backtrace for cpu 3
> CPU: 3 PID: 1034 Comm: kswapd0 Tainted: P O 5.11.0-com #1
> Hardware name: System manufacturer System Product Name/PRIME X570-PRO, BIOS 1405 11/19/2019
> Call Trace:
> <IRQ>
> ...
> </IRQ>
> sysvec_apic_timer_interrupt+0x77/0x80
> asm_sysvec_apic_timer_interrupt+0x12/0x20
> RIP: 0010:mutex_unlock+0x16/0x20
> Code: e7 48 8b 70 10 e8 7a 53 77 ff eb aa e8 43 6c ff ff 0f 1f 00 65 48 8b 14 25 00 6d 01 00 31 c9 48 89 d0 f0 48 0f b1 0f 48 39 c2 <74> 05 e9 e3 fe ff ff c3 66 90 48 8b 47 20 48 85 c0 74 0f 8b 50 10
> RSP: 0018:ffffbdb840797be8 EFLAGS: 00000246
> RAX: ffff9ff445a41c00 RBX: ffffffffc02a9ef8 RCX: 0000000000000000
> RDX: ffff9ff445a41c00 RSI: ffffbdb840797c78 RDI: ffffffffc02a9ac0
> RBP: 0000000000000080 R08: 0000000000000000 R09: ffffbdb840797c80
> R10: 0000000000000000 R11: fffffffffffffff5 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000084 R15: ffffffffc02a9a60
> ttm_pool_shrink+0x7d/0x90 [ttm]
> ttm_pool_shrinker_scan+0x5/0x20 [ttm]
> do_shrink_slab+0x13a/0x1a0
> shrink_slab+0xcb/0x2f0
> shrink_node+0x2c8/0x6c0
> balance_pgdat+0x2f7/0x580
> kswapd+0x17c/0x300
> ? wait_woken+0x70/0x70
> ? balance_pgdat+0x580/0x580
> kthread+0xff/0x140
> ? kthread_park+0x80/0x80
> ret_from_fork+0x22/0x30
>
> debugfs shows the incorrect total:
>
> $ cat /sys/kernel/debug/dri/0/ttm_page_pool
> --- 0--- --- 1--- --- 2--- --- 3--- --- 4--- --- 5--- --- 6--- --- 7--- --- 8--- --- 9--- ---10---
> wc : 0 0 0 0 0 0 0 0 0 0 0
> uc : 0 0 0 0 0 0 0 0 0 0 0
> wc 32 : 0 0 0 0 0 0 0 0 0 0 0
> uc 32 : 0 0 0 0 0 0 0 0 0 0 0
> DMA uc : 0 0 0 0 0 0 0 0 0 0 0
> DMA wc : 0 0 0 0 0 0 0 0 0 0 0
> DMA : 0 0 0 0 0 0 0 0 0 0 0
>
> total : 3029 of 8244261
>
> Using ttm_pool_type_take() to remove pages from the pool before freeing
> them correctly accounts for the freed pages.
>
> Fixes: d099fc8f540a ("drm/ttm: new TT backend allocation pool v3")
> Signed-off-by: Anthony DeRossi <ajderossi at gmail.com>
Good catch. Patch is Reviewed-by: Christian König <christian.koenig at amd.com>
Going to push that to drm-misc-fixes for upstreaming in a minute.
Thanks,
Christian.
> ---
> drivers/gpu/drm/ttm/ttm_pool.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
> index 74bf1c84b637..f6df79034805 100644
> --- a/drivers/gpu/drm/ttm/ttm_pool.c
> +++ b/drivers/gpu/drm/ttm/ttm_pool.c
> @@ -258,13 +258,13 @@ static void ttm_pool_type_init(struct ttm_pool_type *pt, struct ttm_pool *pool,
> /* Remove a pool_type from the global shrinker list and free all pages */
> static void ttm_pool_type_fini(struct ttm_pool_type *pt)
> {
> - struct page *p, *tmp;
> + struct page *p;
>
> mutex_lock(&shrinker_lock);
> list_del(&pt->shrinker_list);
> mutex_unlock(&shrinker_lock);
>
> - list_for_each_entry_safe(p, tmp, &pt->pages, lru)
> + while ((p = ttm_pool_type_take(pt)))
> ttm_pool_free_page(pt->pool, pt->caching, pt->order, p);
> }
>
More information about the dri-devel
mailing list