[Intel-gfx] [PATCH 1/2] drm/i915/gem: Avoid rcu_barrier() from shrinker paths

Mon Dec 9 10:47:23 UTC 2019

On Sun, 8 Dec 2019 at 16:13, Chris Wilson <chris at chris-wilson.co.uk> wrote:
>
> As i915_gem_object_unbind() waits on an rcu_barrier() to flush vm
> releases (and destruction of their bound vma), we have to be careful not
> to invoke that barrier from beneath the shrinker:
>
> <4> [430.222671] WARNING: possible circular locking dependency detected
> <4> [430.222673] 5.4.0-rc8-CI-CI_DRM_7508+ #1 Tainted: G     U
> <4> [430.222675] ------------------------------------------------------
> <4> [430.222677] gem_pwrite/2317 is trying to acquire lock:
> <4> [430.222678] ffffffff82248218 (rcu_state.barrier_mutex){+.+.}, at: rcu_barrier+0x23/0x190
> <4> [430.222685]
> but task is already holding lock:
> <4> [430.222687] ffffffff82263a40 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.117+0x0/0x30
> <4> [430.222691]
> which lock already depends on the new lock.
>
> <4> [430.222693]
> the existing dependency chain (in reverse order) is:
> <4> [430.222695]
> -> #2 (fs_reclaim){+.+.}:
> <4> [430.222698]        fs_reclaim_acquire.part.117+0x24/0x30
> <4> [430.222702]        kmem_cache_alloc_trace+0x2a/0x2c0
> <4> [430.222705]        intel_cpuc_prepare+0x37/0x1a0
> <4> [430.222709]        cpuhp_invoke_callback+0x9b/0x9d0
> <4> [430.222712]        _cpu_up+0xa2/0x140
> <4> [430.222714]        do_cpu_up+0x61/0xa0
> <4> [430.222718]        smp_init+0x57/0x96
> <4> [430.222722]        kernel_init_freeable+0xac/0x1c7
> <4> [430.222725]        kernel_init+0x5/0x100
> <4> [430.222728]        ret_from_fork+0x24/0x50
> <4> [430.222729]
> -> #1 (cpu_hotplug_lock.rw_sem){++++}:
> <4> [430.222733]        cpus_read_lock+0x34/0xd0
> <4> [430.222734]        rcu_barrier+0xaa/0x190
> <4> [430.222736]        kernel_init+0x21/0x100
> <4> [430.222737]        ret_from_fork+0x24/0x50
> <4> [430.222739]
> -> #0 (rcu_state.barrier_mutex){+.+.}:
> <4> [430.222742]        __lock_acquire+0x1328/0x15d0
> <4> [430.222743]        lock_acquire+0xa7/0x1c0
> <4> [430.222746]        __mutex_lock+0x9a/0x9d0
> <4> [430.222747]        rcu_barrier+0x23/0x190
> <4> [430.222850]        i915_gem_object_unbind+0x264/0x3d0 [i915]
> <4> [430.222882]        i915_gem_shrink+0x297/0x5f0 [i915]
> <4> [430.222912]        i915_gem_shrink_all+0x38/0x60 [i915]
> <4> [430.222934]        i915_drop_caches_set+0x1f0/0x240 [i915]
> <4> [430.222938]        simple_attr_write+0xb0/0xd0
> <4> [430.222941]        full_proxy_write+0x51/0x80
> <4> [430.222943]        vfs_write+0xb9/0x1d0
> <4> [430.222944]        ksys_write+0x9f/0xe0
> <4> [430.222946]        do_syscall_64+0x4f/0x210
> <4> [430.222948]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> <4> [430.222950]
> other info that might help us debug this:
>
> <4> [430.222952] Chain exists of:
>   rcu_state.barrier_mutex --> cpu_hotplug_lock.rw_sem --> fs_reclaim
>
> <4> [430.222955]  Possible unsafe locking scenario:
>
> <4> [430.222957]        CPU0                    CPU1
> <4> [430.222958]        ----                    ----
> <4> [430.222960]   lock(fs_reclaim);
> <4> [430.222961]                                lock(cpu_hotplug_lock.rw_sem);
> <4> [430.222963]                                lock(fs_reclaim);
> <4> [430.222964]   lock(rcu_state.barrier_mutex);
> <4> [430.222966]
>  *** DEADLOCK ***
>
> <4> [430.222968] 3 locks held by gem_pwrite/2317:
> <4> [430.222969]  #0: ffff88849e2d9408 (sb_writers#14){.+.+}, at: vfs_write+0x1a4/0x1d0
> <4> [430.222973]  #1: ffff888496976db0 (&attr->mutex){+.+.}, at: simple_attr_write+0x36/0xd0
> <4> [430.222976]  #2: ffffffff82263a40 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.117+0x0/0x30
> <4> [430.222980]
> stack backtrace:
> <4> [430.222982] CPU: 1 PID: 2317 Comm: gem_pwrite Tainted: G     U            5.4.0-rc8-CI-CI_DRM_7508+ #1
> <4> [430.222985] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.2321.A08.1909162051 09/16/2019
> <4> [430.222989] Call Trace:
> <4> [430.222992]  dump_stack+0x71/0x9b
> <4> [430.222995]  check_noncircular+0x19b/0x1c0
> <4> [430.222998]  ? __lock_acquire+0x1328/0x15d0
> <4> [430.222999]  __lock_acquire+0x1328/0x15d0
> <4> [430.223001]  ? mark_held_locks+0x49/0x70
> <4> [430.223003]  lock_acquire+0xa7/0x1c0
> <4> [430.223005]  ? rcu_barrier+0x23/0x190
> <4> [430.223008]  __mutex_lock+0x9a/0x9d0
> <4> [430.223009]  ? rcu_barrier+0x23/0x190
> <4> [430.223011]  ? rcu_barrier+0x23/0x190
> <4> [430.223013]  ? find_held_lock+0x2d/0x90
> <4> [430.223045]  ? i915_gem_object_unbind+0x24a/0x3d0 [i915]
> <4> [430.223048]  ? rcu_barrier+0x23/0x190
> <4> [430.223049]  rcu_barrier+0x23/0x190
> <4> [430.223081]  i915_gem_object_unbind+0x264/0x3d0 [i915]
> <4> [430.223119]  i915_gem_shrink+0x297/0x5f0 [i915]
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld at intel.com>