[Intel-gfx] [BUG] lockdep splat with kernfs lockdep annotations and slab mutex from drm patch??

Thu Jul 11 02:57:20 UTC 2019

On Fri, 14 Jun 2019 08:38:37 -0700
Tejun Heo <tj at kernel.org> wrote:

> Hello,
> 
> On Fri, Jun 14, 2019 at 04:08:33PM +0100, Chris Wilson wrote:
> > #ifdef CONFIG_MEMCG
> >         if (slab_state >= FULL && err >= 0 && is_root_cache(s)) {
> >                 struct kmem_cache *c;
> > 
> >                 mutex_lock(&slab_mutex);
> > 
> > so it happens to hit the error + FULL case with the additional slabcaches?
> > 
> > Anyway, according to lockdep, it is dangerous to use the slab_mutex inside
> > slab_attr_store().  
> 
> Didn't really look into the code but it looks like slab_mutex is held
> while trying to remove sysfs files.  sysfs file removal flushes
> on-going accesses, so if a file operation then tries to grab a mutex
> which is held during removal, it leads to a deadlock.
> 

It looks like this never got fixed, and the bug is now in 5.2.

Just got this:

 ======================================================
 WARNING: possible circular locking dependency detected
 5.2.0-test #15 Not tainted
 ------------------------------------------------------
 slub_cpu_partia/899 is trying to acquire lock:
 000000000f6f2dd7 (slab_mutex){+.+.}, at: slab_attr_store+0x6d/0xe0

 but task is already holding lock:
 00000000b23ffe3d (kn->count#160){++++}, at: kernfs_fop_write+0x125/0x230

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (kn->count#160){++++}:
        __kernfs_remove+0x413/0x4a0
        kernfs_remove_by_name_ns+0x40/0x80
        sysfs_slab_add+0x1b5/0x2f0
        __kmem_cache_create+0x511/0x560
        create_cache+0xcd/0x1f0
        kmem_cache_create_usercopy+0x18a/0x240
        kmem_cache_create+0x12/0x20
        is_active_nid+0xdb/0x230 [snd_hda_codec_generic]
        snd_hda_get_path_idx+0x55/0x80 [snd_hda_codec_generic]
        get_nid_path+0xc/0x170 [snd_hda_codec_generic]
        do_one_initcall+0xa2/0x394
        do_init_module+0xfd/0x370
        load_module+0x38c6/0x3bd0
        __do_sys_finit_module+0x11a/0x1b0
        do_syscall_64+0x68/0x250
        entry_SYSCALL_64_after_hwframe+0x49/0xbe

 -> #0 (slab_mutex){+.+.}:
        lock_acquire+0xbd/0x1d0
        __mutex_lock+0xfc/0xb70
        slab_attr_store+0x6d/0xe0
        kernfs_fop_write+0x170/0x230
        vfs_write+0xe1/0x240
        ksys_write+0xba/0x150
        do_syscall_64+0x68/0x250
        entry_SYSCALL_64_after_hwframe+0x49/0xbe

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(kn->count#160);
                                lock(slab_mutex);
                                lock(kn->count#160);
   lock(slab_mutex);

  *** DEADLOCK ***
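
The two chains above form an AB-BA inversion: one path takes kn->count while
holding slab_mutex, the other takes slab_mutex while holding kn->count. As a
rough userspace illustration only (not the kernel's lockdep code, and not a
proposed fix), the sketch below records "held -> wanted" edges between lock
classes and refuses an edge that would close a cycle. The enum names and the
helpers acquire_while_holding() and demo_inversion() are hypothetical and
exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical lock classes, named after the two locks in the splat. */
enum lock_class { KN_COUNT, SLAB_MUTEX, NLOCKS };

/* dep[a][b] is true once some path acquired b while holding a. */
static bool dep[NLOCKS][NLOCKS];

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
static bool reachable(enum lock_class from, enum lock_class to, bool *seen)
{
    if (from == to)
        return true;
    seen[from] = true;
    for (int next = 0; next < NLOCKS; next++)
        if (dep[from][next] && !seen[next] &&
            reachable((enum lock_class)next, to, seen))
            return true;
    return false;
}

/*
 * Record "acquire 'want' while holding 'held'".
 * Returns false if the new edge would close a cycle, i.e. an earlier
 * path already took 'held' while holding 'want' (directly or transitively).
 */
static bool acquire_while_holding(enum lock_class held, enum lock_class want)
{
    bool seen[NLOCKS] = { false };

    assert(held < NLOCKS && want < NLOCKS);
    if (reachable(want, held, seen))
        return false;           /* possible circular dependency */
    dep[held][want] = true;
    return true;
}

/* Replays the two paths from the report; returns 1 if an inversion is found. */
int demo_inversion(void)
{
    memset(dep, 0, sizeof(dep));

    /* -> #1: cache creation removes a sysfs file with slab_mutex held. */
    bool first = acquire_while_holding(SLAB_MUTEX, KN_COUNT);

    /* -> #0: a sysfs write holds kn->count, then takes slab_mutex. */
    bool second = acquire_while_holding(KN_COUNT, SLAB_MUTEX);

    return first && !second;
}
```

Calling demo_inversion() replays the two orderings from the report. The first
edge is accepted and the second is rejected, which is the same shape of
complaint that lockdep prints above, without either path ever having to
actually deadlock.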

Attached are the config and the full dmesg.

-- Steve

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg
Type: application/octet-stream
Size: 94338 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/intel-gfx/attachments/20190710/87bfa4a7/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: config
Type: application/octet-stream
Size: 131477 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/intel-gfx/attachments/20190710/87bfa4a7/attachment-0003.obj>