[Intel-gfx] [PATCH] drm/i915: Mark up obj->mm.lock for shrinker

Mon Oct 31 13:32:09 UTC 2016

On Mon, Oct 31, 2016 at 01:01:52PM +0000, Tvrtko Ursulin wrote:
> 
> On 31/10/2016 12:40, Chris Wilson wrote:
> >As we may allocate from within the obj->mm.lock we may enter the
> >shrinker for direct reclaim. Operating on the current object is
> >prevented by checking for obj->mm.pages (which is only set as the last
> >operation in the allocation path). However, we need to identity the
> >single recursion of accessing another object's obj->mm.lock as the two
> >locks have identical class and so appear to be the same to lockdep,
> >convincing it that a deadlock is possible. Use mutex_lock_nested() to
> >remove the false positive.
> >
> >[ 2165.945734] =================================
> >[ 2165.945749] [ INFO: inconsistent lock state ]
> >[ 2165.945765] 4.9.0-rc2+ #2 Tainted: G        W
> >[ 2165.945781] ---------------------------------
> >[ 2165.945796] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
> >[ 2165.945816] kswapd0/62 [HC0[0]:SC0[0]:HE1:SE1] takes: (&obj->mm.lock){+.+.?.}, at: [<ffffffffc0289a1f>] i915_gem_shrink+0x29f/0x500 [i915]
> >[ 2165.945904] {RECLAIM_FS-ON-W} state was registered at:
> >[ 2165.945931] [<ffffffffb10bd50f>] mark_held_locks+0x6f/0xa0
> >[ 2165.945956] [<ffffffffb10bf889>] lockdep_trace_alloc+0x69/0xc0
> >[ 2165.945982] [<ffffffffb11eea53>] kmem_cache_alloc_trace+0x33/0x2a0
> >[ 2165.946019] [<ffffffffc028a28a>] i915_gem_object_get_pages_stolen+0x6a/0xd0 [i915]
> >[ 2165.946060] [<ffffffffc027e1d0>] ____i915_gem_object_get_pages+0x20/0x60 [i915]
> >[ 2165.946098] [<ffffffffc027e268>] __i915_gem_object_get_pages+0x58/0x70 [i915]
> >[ 2165.946138] [<ffffffffc028a3dc>] _i915_gem_object_create_stolen+0xec/0x120 [i915]
> >[ 2165.946177] [<ffffffffc028af73>] i915_gem_object_create_stolen_for_preallocated+0xf3/0x3f0 [i915]
> >[ 2165.946222] [<ffffffffc02bae43>] intel_alloc_initial_plane_obj.isra.125+0xd3/0x200 [i915]
> >[ 2165.946266] [<ffffffffc02cb1c1>] intel_modeset_init+0x931/0x1530
> >[i915]
> >[ 2165.946301] [<ffffffffc023d584>] i915_driver_load+0xa14/0x14a0 [i915]
> >[ 2165.946335] [<ffffffffc0248aff>] i915_pci_probe+0x4f/0x70 [i915]
> >[ 2165.946362] [<ffffffffb13cc452>] local_pci_probe+0x42/0xa0
> >[ 2165.946386] [<ffffffffb13cd903>] pci_device_probe+0x103/0x150
> >[ 2165.946411] [<ffffffffb14adeb3>] driver_probe_device+0x223/0x430
> >[ 2165.946436] [<ffffffffb14ae1a3>] __driver_attach+0xe3/0xf0
> >[ 2165.946461] [<ffffffffb14ab943>] bus_for_each_dev+0x73/0xc0
> >[ 2165.946485] [<ffffffffb14ad5ee>] driver_attach+0x1e/0x20
> >[ 2165.946508] [<ffffffffb14ad003>] bus_add_driver+0x173/0x270
> >[ 2165.946533] [<ffffffffb14aee70>] driver_register+0x60/0xe0
> >[ 2165.946557] [<ffffffffb13cbd6d>] __pci_register_driver+0x5d/0x60
> >[ 2165.946606] [<ffffffffc0378057>] soundcore_open+0x17/0x230 [soundcore]
> >[ 2165.946636] [<ffffffffb1000450>] do_one_initcall+0x50/0x180
> >[ 2165.946661] [<ffffffffb117fd2d>] do_init_module+0x5f/0x1f1
> >[ 2165.946685] [<ffffffffb1108964>] load_module+0x2174/0x2a80
> >[ 2165.946709] [<ffffffffb11094df>] SYSC_finit_module+0xdf/0x110
> >[ 2165.946734] [<ffffffffb110952e>] SyS_finit_module+0xe/0x10
> >[ 2165.946758] [<ffffffffb1742aea>] entry_SYSCALL_64_fastpath+0x18/0xad
> >[ 2165.946776] irq event stamp: 90871
> >[ 2165.946788] hardirqs last  enabled at (90871):
> >[ 2165.946805] [<ffffffffb173e9da>] __mutex_unlock_slowpath+0x11a/0x1c0
> >[ 2165.946823] hardirqs last disabled at (90870):
> >[ 2165.946839] [<ffffffffb173e91b>] __mutex_unlock_slowpath+0x5b/0x1c0
> >[ 2165.946856] softirqs last  enabled at (90858):
> >[ 2165.946872] [<ffffffffb174581a>] __do_softirq+0x39a/0x4c6
> >[ 2165.946887] softirqs last disabled at (90671):
> >[ 2165.946902] [<ffffffffb1066cea>] irq_exit+0xea/0xf0
> >[ 2165.946916] other info that might help us debug this:
> >[ 2165.946936]  Possible unsafe locking scenario:
> >[ 2165.946955]        CPU0
> >[ 2165.946965]        ----
> >[ 2165.946975]   lock(&obj->mm.lock);
> >[ 2165.947000]   <Interrupt>
> >[ 2165.947010]     lock(&obj->mm.lock);
> >[ 2165.947035] *** DEADLOCK ***
> >[ 2165.947054] 2 locks held by kswapd0/62:
> >[ 2165.947067]  #0: (shrinker_rwsem){++++..}, at: [<ffffffffb119a20e>] shrink_slab.part.40+0x5e/0x5d0
> >[ 2165.947120]  #1: (&dev->struct_mutex){+.+.+.}, at: [<ffffffffc028954b>] i915_gem_shrinker_lock+0x1b/0x60 [i915]
> >[ 2165.948909] stack backtrace:
> >[ 2165.950650] CPU: 2 PID: 62 Comm: kswapd0 Tainted: G        W 4.9.0-rc2+ #2
> >[ 2165.951587] Hardware name: LENOVO 80MX/Lenovo E31-80, BIOS DCCN34WW(V2.03) 12/01/2015
> >[ 2165.952484]  ffffc90000b5f8c8 ffffffffb137f645 ffff88016c5a2700 ffffffffb25f20a0
> >[ 2165.953395]  ffffc90000b5f918 ffffffffb10bcecd 0000000000000000 ffff880100000001
> >[ 2165.954305]  0000000000000001 000000000000000a ffff88016c5a2fd0 ffff88016c5a2700
> >[ 2165.955240] Call Trace:
> >[ 2165.956170]  [<ffffffffb137f645>] dump_stack+0x68/0x93
> >[ 2165.957071]  [<ffffffffb10bcecd>] print_usage_bug+0x1dd/0x1f0
> >[ 2165.957979]  [<ffffffffb10bd439>] mark_lock+0x559/0x5c0
> >[ 2165.958875]  [<ffffffffb10bc3f0>] ?  print_shortest_lock_dependencies+0x1b0/0x1b0
> >[ 2165.959829]  [<ffffffffb10be04d>] __lock_acquire+0x66d/0x12a0
> >[ 2165.960729]  [<ffffffffb11ef541>] ? __slab_free+0xa1/0x340
> >[ 2165.961625]  [<ffffffffb10dba5d>] ?  debug_lockdep_rcu_enabled+0x1d/0x20
> >[ 2165.962530]  [<ffffffffb10bd50f>] ? mark_held_locks+0x6f/0xa0
> >[ 2165.963457]  [<ffffffffb10bf0b0>] lock_acquire+0xf0/0x1f0
> >[ 2165.964368]  [<ffffffffc0289a1f>] ? i915_gem_shrink+0x29f/0x500 [i915]
> >[ 2165.965269]  [<ffffffffc0289a1f>] ? i915_gem_shrink+0x29f/0x500 [i915]
> >[ 2165.966150]  [<ffffffffb173d837>] mutex_lock_nested+0x77/0x420
> >[ 2165.967030]  [<ffffffffc0289a1f>] ? i915_gem_shrink+0x29f/0x500 [i915]
> >[ 2165.967952]  [<ffffffffc027c7a1>] ?  __i915_gem_object_put_pages.part.58+0x161/0x1b0 [i915]
> >[ 2165.968835]  [<ffffffffc0289a1f>] i915_gem_shrink+0x29f/0x500 [i915]
> >[ 2165.969712]  [<ffffffffc0289e40>] i915_gem_shrinker_scan+0x70/0xb0 [i915]
> >[ 2165.970591]  [<ffffffffb119a3ae>] shrink_slab.part.40+0x1fe/0x5d0
> >[ 2165.971504]  [<ffffffffb119f19c>] shrink_node+0x22c/0x320
> >[ 2165.972371]  [<ffffffffb11a05fb>] kswapd+0x38b/0x9b0
> >[ 2165.973238]  [<ffffffffb11a0270>] ?  mem_cgroup_shrink_node+0x330/0x330
> >[ 2165.974068]  [<ffffffffb108630f>] kthread+0xff/0x120
> >[ 2165.974929]  [<ffffffffb1086210>] ? kthread_park+0x60/0x60
> >[ 2165.975847]  [<ffffffffb1742d57>] ret_from_fork+0x27/0x40
> >
> >Reported-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >Fixes: 1233e2db199d ("drm/i915: Move object backing storage manipulation...")
> >Testcase: igt/gem_ctx_create/maximum-swap
> >Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> >---
> > drivers/gpu/drm/i915/i915_gem_shrinker.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> >index f4a1515737bd..0993afc0e725 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> >@@ -223,7 +223,9 @@ i915_gem_shrink(struct drm_i915_private *dev_priv,
> > 				continue;
> >
> > 			if (unsafe_drop_pages(obj)) {
> >-				mutex_lock(&obj->mm.lock);
> >+				/* May arrive from get_pages on another bo */
> >+				mutex_lock_nested(&obj->mm.lock,
> >+						  SINGLE_DEPTH_NESTING);
> > 				if (!obj->mm.pages) {
> > 					__i915_gem_object_invalidate(obj);
> > 					list_del_init(&obj->global_list);
> >
> 
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>

Monday troubles. Fixed up the broken newline in the stacktrace, forgot
to copy across the r-b. ARGH.

Thanks for the report, poke and review.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre