[BUG 6.3-rc1] Bad lock in ttm_bo_delayed_delete()

Steven Rostedt rostedt at goodmis.org
Wed Mar 15 15:57:12 UTC 2023


On Wed, 15 Mar 2023 16:25:11 +0100
Christian König <ckoenig.leichtzumerken at gmail.com> wrote:
> >>
> >> Thanks for the notice,  
> > I'm still getting this on Linus's latest tree.  
> 
> This must be some reference counting issue which only happens in your 
> particular use case. We have tested this quite extensively and couldn't 
> reproduce it so far.

Have you tried 32 bit with my config? In my reply to your previous email I
also sent a link that gives access to the VM image I'm using, which triggers
this issue.

Here it is again:

  The libvirt xml file is here: https://rostedt.org/vm-images/tracetest-32.xml
  and the VM image itself is here: https://rostedt.org/vm-images/tracetest-32.qcow2.bz2

> 
> Can you apply this code snippet here and see if you get any warning in 
> the system logs?
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 459f1b4440da..efc390bfd69c 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -314,6 +314,7 @@ static void ttm_bo_delayed_delete(struct work_struct *work)
>          dma_resv_lock(bo->base.resv, NULL);
>          ttm_bo_cleanup_memtype_use(bo);
>          dma_resv_unlock(bo->base.resv);
> +       bo->delayed_delete.func = NULL;
>          ttm_bo_put(bo);
>   }
> 
> @@ -327,6 +328,8 @@ static void ttm_bo_release(struct kref *kref)
>          WARN_ON_ONCE(bo->pin_count);
>          WARN_ON_ONCE(bo->bulk_move);
> 
> +       WARN_ON(bo->delayed_delete.func != NULL);
> +
>          if (!bo->deleted) {
>                  ret = ttm_bo_individualize_resv(bo);
>                  if (ret) {
> 

The WARN_ON triggered:

[   21.481449] mpls_gso: MPLS GSO support
[   21.488795] IPI shorthand broadcast: enabled
[   21.488873] ------------[ cut here ]------------
[   21.490101] ------------[ cut here ]------------

[   21.491693] WARNING: CPU: 1 PID: 38 at drivers/gpu/drm/ttm/ttm_bo.c:332 ttm_bo_release+0x2ac/0x2fc  <<<---- Line of the added WARN_ON()

[   21.492940] refcount_t: underflow; use-after-free.
[   21.492965] WARNING: CPU: 0 PID: 84 at lib/refcount.c:28 refcount_warn_saturate+0xb6/0xfc
[   21.496116] Modules linked in:
[   21.497197] Modules linked in:
[   21.500105] CPU: 1 PID: 38 Comm: kworker/1:1 Not tainted 6.3.0-rc2-test-00047-g6015b1aca1a2-dirty #993
[   21.500789] CPU: 0 PID: 84 Comm: kworker/0:1H Not tainted 6.3.0-rc2-test-00047-g6015b1aca1a2-dirty #993
[   21.501882] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
[   21.503533] sched_clock: Marking stable (20788024762, 714243692)->(22140778105, -638509651)
[   21.504080] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
[   21.504089] Workqueue: ttm ttm_bo_delayed_delete
[   21.507196] Workqueue: events drm_fb_helper_damage_work
[   21.509235] 
[   21.510291] registered taskstats version 1
[   21.510302] Running ring buffer tests...
[   21.511792] 
[   21.513870] EIP: refcount_warn_saturate+0xb6/0xfc
[   21.515261] EIP: ttm_bo_release+0x2ac/0x2fc
[   21.516566] Code: 68 00 27 0c d8 e8 36 3b aa ff 0f 0b 58 c9 c3 90 80 3d 41 c2 37 d8 00 75 8a c6 05 41 c2 37 d8 01 68 2c 27 0c d8 e8 16 3b aa ff <0f> 0b 59 c9 c3 80 3d 3f c2 37 d8 00 0f 85 67 ff ff ff c6 05 3f c2
[   21.516998] Code: ff 8d b4 26 00 00 00 00 66 90 0f 0b 8b 43 10 85 c0 0f 84 a1 fd ff ff 8d 76 00 0f 0b 8b 43 28 85 c0 0f 84 9c fd ff ff 8d 76 00 <0f> 0b e9 92 fd ff ff 8d b4 26 00 00 00 00 66 90 c7 43 18 00 00 00
[   21.517905] EAX: 00000026 EBX: c129d150 ECX: 00000040 EDX: 00000002
[   21.518987] EAX: d78c8550 EBX: c129d134 ECX: c129d134 EDX: 00000001
[   21.519337] ESI: c129d0bc EDI: f6f91200 EBP: c2b8bf18 ESP: c2b8bf14
[   21.520617] ESI: c129d000 EDI: c126a7a0 EBP: c1839c24 ESP: c1839bec
[   21.521546] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010286
[   21.526154] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010286
[   21.526162] CR0: 80050033 CR2: 00000000 CR3: 18506000 CR4: 00150ef0
[   21.526166] Call Trace:
[   21.526189]  ? ww_mutex_unlock+0x3a/0x94
[   21.530300] CR0: 80050033 CR2: ff9ff000 CR3: 18506000 CR4: 00150ef0
[   21.531722]  ? ttm_bo_cleanup_refs+0xc4/0x1e0
[   21.533114] Call Trace:
[   21.534516]  ttm_mem_evict_first+0x3d3/0x568
[   21.535901]  ttm_bo_delayed_delete+0x9c/0xa4
[   21.537391]  ? kfree+0x6b/0xdc
[   21.538901]  process_one_work+0x21a/0x484
[   21.540279]  ? ttm_range_man_alloc+0xe0/0xec
[   21.540854]  worker_thread+0x14a/0x39c
[   21.541714]  ? ttm_range_man_fini_nocheck+0xe8/0xe8
[   21.543332]  kthread+0xea/0x10c
[   21.544301]  ttm_bo_mem_space+0x1d0/0x1e4
[   21.544942]  ? process_one_work+0x484/0x484
[   21.545887]  ttm_bo_validate+0xc5/0x19c
[   21.546986]  ? kthread_complete_and_exit+0x1c/0x1c
[   21.547680]  ttm_bo_init_reserved+0x15e/0x1fc
[   21.548716]  ret_from_fork+0x1c/0x28
[   21.549650]  qxl_bo_create+0x145/0x20c

Note that this all happens during boot, before user space is running.
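
For anyone following along, the idea behind the debug patch quoted above
(clear delayed_delete.func once the worker is done, and warn in the release
path if it is still set) can be sketched in plain userspace C. This is only a
rough model under stated assumptions: struct model_bo, model_put(),
model_release(), model_delayed_delete() and the two scenario functions are
hypothetical names, the "resurrect and queue" step is a simplification, and
the stray extra put is just one way such a check can fire, not a diagnosis of
the trace above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a buffer object; NOT the real ttm_buffer_object. */
struct model_bo {
	int refcount;
	int deleted;                              /* delayed teardown was scheduled */
	void (*delayed_func)(struct model_bo *);  /* non-NULL while the work is pending */
	int warnings;                             /* how often the check fired */
};

static void model_release(struct model_bo *bo);

/* Drop one reference; dropping the last one calls the release path. */
static void model_put(struct model_bo *bo)
{
	if (--bo->refcount == 0)
		model_release(bo);
}

/* Models the worker: clear the marker first, then drop the worker's reference. */
static void model_delayed_delete(struct model_bo *bo)
{
	bo->delayed_func = NULL;
	model_put(bo);
}

static void model_release(struct model_bo *bo)
{
	/* Same idea as the added WARN_ON(): release must not see pending work. */
	if (bo->delayed_func != NULL) {
		bo->warnings++;
		fprintf(stderr, "warning: release while delayed delete is pending\n");
	}

	if (!bo->deleted) {
		/* Assumed simplification: take a new reference and "queue" the work. */
		bo->deleted = 1;
		bo->refcount = 1;
		bo->delayed_func = model_delayed_delete;
		return;
	}
	/* Final teardown would happen here. */
}

/* Balanced references: the worker runs once and the check stays quiet. */
int scenario_balanced(void)
{
	struct model_bo bo = { .refcount = 1 };

	model_put(&bo);         /* last user reference: work gets queued */
	bo.delayed_func(&bo);   /* the "workqueue" runs the worker */
	return bo.warnings;
}

/* One put too many arrives while the work is still pending. */
int scenario_extra_put(void)
{
	struct model_bo bo = { .refcount = 1 };

	model_put(&bo);         /* last user reference: work gets queued */
	model_put(&bo);         /* stray put drops the worker's reference */
	return bo.warnings;
}
```

In this model scenario_balanced() returns 0 and scenario_extra_put() returns
1. The real code also has locking and concurrency between CPUs, which this
single-threaded sketch leaves out entirely.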

-- Steve