[Intel-gfx] [PATCH] drm/i915/gem: Avoid rcu_barrier() from shrinker paths
Chris Wilson
chris at chris-wilson.co.uk
Sun Dec 8 00:12:09 UTC 2019
As i915_gem_object_unbind() waits on an rcu_barrier() to flush vm
releases (and destruction of their bound vma), we have to be careful not
to invoke that barrier from beneath the shrinker:
<4> [430.222671] WARNING: possible circular locking dependency detected
<4> [430.222673] 5.4.0-rc8-CI-CI_DRM_7508+ #1 Tainted: G U
<4> [430.222675] ------------------------------------------------------
<4> [430.222677] gem_pwrite/2317 is trying to acquire lock:
<4> [430.222678] ffffffff82248218 (rcu_state.barrier_mutex){+.+.}, at: rcu_barrier+0x23/0x190
<4> [430.222685]
but task is already holding lock:
<4> [430.222687] ffffffff82263a40 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.117+0x0/0x30
<4> [430.222691]
which lock already depends on the new lock.
<4> [430.222693]
the existing dependency chain (in reverse order) is:
<4> [430.222695]
-> #2 (fs_reclaim){+.+.}:
<4> [430.222698] fs_reclaim_acquire.part.117+0x24/0x30
<4> [430.222702] kmem_cache_alloc_trace+0x2a/0x2c0
<4> [430.222705] intel_cpuc_prepare+0x37/0x1a0
<4> [430.222709] cpuhp_invoke_callback+0x9b/0x9d0
<4> [430.222712] _cpu_up+0xa2/0x140
<4> [430.222714] do_cpu_up+0x61/0xa0
<4> [430.222718] smp_init+0x57/0x96
<4> [430.222722] kernel_init_freeable+0xac/0x1c7
<4> [430.222725] kernel_init+0x5/0x100
<4> [430.222728] ret_from_fork+0x24/0x50
<4> [430.222729]
-> #1 (cpu_hotplug_lock.rw_sem){++++}:
<4> [430.222733] cpus_read_lock+0x34/0xd0
<4> [430.222734] rcu_barrier+0xaa/0x190
<4> [430.222736] kernel_init+0x21/0x100
<4> [430.222737] ret_from_fork+0x24/0x50
<4> [430.222739]
-> #0 (rcu_state.barrier_mutex){+.+.}:
<4> [430.222742] __lock_acquire+0x1328/0x15d0
<4> [430.222743] lock_acquire+0xa7/0x1c0
<4> [430.222746] __mutex_lock+0x9a/0x9d0
<4> [430.222747] rcu_barrier+0x23/0x190
<4> [430.222850] i915_gem_object_unbind+0x264/0x3d0 [i915]
<4> [430.222882] i915_gem_shrink+0x297/0x5f0 [i915]
<4> [430.222912] i915_gem_shrink_all+0x38/0x60 [i915]
<4> [430.222934] i915_drop_caches_set+0x1f0/0x240 [i915]
<4> [430.222938] simple_attr_write+0xb0/0xd0
<4> [430.222941] full_proxy_write+0x51/0x80
<4> [430.222943] vfs_write+0xb9/0x1d0
<4> [430.222944] ksys_write+0x9f/0xe0
<4> [430.222946] do_syscall_64+0x4f/0x210
<4> [430.222948] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4> [430.222950]
other info that might help us debug this:
<4> [430.222952] Chain exists of:
rcu_state.barrier_mutex --> cpu_hotplug_lock.rw_sem --> fs_reclaim
<4> [430.222955] Possible unsafe locking scenario:
<4> [430.222957] CPU0 CPU1
<4> [430.222958] ---- ----
<4> [430.222960] lock(fs_reclaim);
<4> [430.222961] lock(cpu_hotplug_lock.rw_sem);
<4> [430.222963] lock(fs_reclaim);
<4> [430.222964] lock(rcu_state.barrier_mutex);
<4> [430.222966]
*** DEADLOCK ***
<4> [430.222968] 3 locks held by gem_pwrite/2317:
<4> [430.222969] #0: ffff88849e2d9408 (sb_writers#14){.+.+}, at: vfs_write+0x1a4/0x1d0
<4> [430.222973] #1: ffff888496976db0 (&attr->mutex){+.+.}, at: simple_attr_write+0x36/0xd0
<4> [430.222976] #2: ffffffff82263a40 (fs_reclaim){+.+.}, at: fs_reclaim_acquire.part.117+0x0/0x30
<4> [430.222980]
stack backtrace:
<4> [430.222982] CPU: 1 PID: 2317 Comm: gem_pwrite Tainted: G U 5.4.0-rc8-CI-CI_DRM_7508+ #1
<4> [430.222985] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake U DDR4 SODIMM RVP, BIOS TGLSFWI1.R00.2321.A08.1909162051 09/16/2019
<4> [430.222989] Call Trace:
<4> [430.222992] dump_stack+0x71/0x9b
<4> [430.222995] check_noncircular+0x19b/0x1c0
<4> [430.222998] ? __lock_acquire+0x1328/0x15d0
<4> [430.222999] __lock_acquire+0x1328/0x15d0
<4> [430.223001] ? mark_held_locks+0x49/0x70
<4> [430.223003] lock_acquire+0xa7/0x1c0
<4> [430.223005] ? rcu_barrier+0x23/0x190
<4> [430.223008] __mutex_lock+0x9a/0x9d0
<4> [430.223009] ? rcu_barrier+0x23/0x190
<4> [430.223011] ? rcu_barrier+0x23/0x190
<4> [430.223013] ? find_held_lock+0x2d/0x90
<4> [430.223045] ? i915_gem_object_unbind+0x24a/0x3d0 [i915]
<4> [430.223048] ? rcu_barrier+0x23/0x190
<4> [430.223049] rcu_barrier+0x23/0x190
<4> [430.223081] i915_gem_object_unbind+0x264/0x3d0 [i915]
<4> [430.223119] i915_gem_shrink+0x297/0x5f0 [i915]
Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
---
drivers/gpu/drm/i915/gem/i915_gem_domain.c | 4 +++-
drivers/gpu/drm/i915/i915_drv.h | 1 +
drivers/gpu/drm/i915/i915_gem.c | 2 +-
3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
index 53e28e417cc9..88ab7e71f36c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
@@ -203,7 +203,9 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
i915_gem_object_unlock(obj);
/* The cache-level will be applied when each vma is rebound. */
- return i915_gem_object_unbind(obj, I915_GEM_OBJECT_UNBIND_ACTIVE);
+ return i915_gem_object_unbind(obj,
+ I915_GEM_OBJECT_UNBIND_ACTIVE |
+ I915_GEM_OBJECT_UNBIND_BARRIER);
}
int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c3d8af28bfc1..bae97cd62cb9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1845,6 +1845,7 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
int i915_gem_object_unbind(struct drm_i915_gem_object *obj,
unsigned long flags);
#define I915_GEM_OBJECT_UNBIND_ACTIVE BIT(0)
+#define I915_GEM_OBJECT_UNBIND_BARRIER BIT(1)
void i915_gem_runtime_suspend(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 919d3a723c50..5eeef1ef7448 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -178,7 +178,7 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj,
list_splice_init(&still_in_list, &obj->vma.list);
spin_unlock(&obj->vma.lock);
- if (ret == -EAGAIN && flags & I915_GEM_OBJECT_UNBIND_ACTIVE) {
+ if (ret == -EAGAIN && flags & I915_GEM_OBJECT_UNBIND_BARRIER) {
rcu_barrier(); /* flush the i915_vm_release() */
goto try_again;
}
--
2.24.0
More information about the Intel-gfx
mailing list