[Bug 100099] New: [BAT] [KBL] nvme driver is using a mutex inside an rcu callback causing igt@gem_exec_suspend@basic-s4-devices to fail with dmesg-warn
bugzilla-daemon at freedesktop.org
Tue Mar 7 12:02:21 UTC 2017
https://bugs.freedesktop.org/show_bug.cgi?id=100099
Bug ID: 100099
Summary: [BAT] [KBL] nvme driver is using a mutex inside an rcu
callback causing igt@gem_exec_suspend@basic-s4-devices
to fail with dmesg-warn
Product: DRI
Version: DRI git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: critical
Priority: medium
Component: DRM/Intel
Assignee: intel-gfx-bugs at lists.freedesktop.org
Reporter: jari.tahvanainen at intel.com
QA Contact: intel-gfx-bugs at lists.freedesktop.org
CC: intel-gfx-bugs at lists.freedesktop.org
Starting from CI_DRM_2252, igt@gem_exec_suspend@basic-s4-devices has ended in
the dmesg-warn state.
See
https://intel-gfx-ci.01.org/CI/CI_DRM_2296/fi-kbl-7500u/igt@gem_exec_suspend@basic-s4-devices.html
for the latest output and links to the dmesg files. Some data is copied below:
Command
/opt/igt/tests/gem_exec_suspend --run-subtest basic-S4-devices
Dmesg
[ 245.192104] Suspending console(s) (use no_console_suspend to debug)
[ 245.294768] ======================================================
[ 245.294769] [ INFO: possible circular locking dependency detected ]
[ 245.294771] 4.11.0-rc1-CI-CI_DRM_2296+ #1 Not tainted
[ 245.294772] -------------------------------------------------------
[ 245.294773] kworker/u8:1/72 is trying to acquire lock:
[ 245.294774] (rcu_preempt_state.barrier_mutex){+.+...}, at:
[<ffffffff81100731>] _rcu_barrier+0x31/0x160
[ 245.294782]
but task is already holding lock:
[ 245.294783] (&dev->struct_mutex){+.+.+.}, at: [<ffffffffa00bbc48>]
i915_gem_freeze+0x18/0x30 [i915]
[ 245.294827]
which lock already depends on the new lock.
[ 245.294828]
the existing dependency chain (in reverse order) is:
[ 245.294829]
-> #5 (&dev->struct_mutex){+.+.+.}:
[ 245.294836] lock_acquire+0xc9/0x220
[ 245.294842] drm_gem_object_put_unlocked+0x40/0xc0
[ 245.294845] drm_gem_mmap+0xd1/0x100
[ 245.294849] mmap_region+0x39e/0x600
[ 245.294851] do_mmap+0x44c/0x500
[ 245.294853] vm_mmap_pgoff+0x9d/0xd0
[ 245.294856] SyS_mmap_pgoff+0x182/0x220
[ 245.294858] SyS_mmap+0x16/0x20
[ 245.294862] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 245.294863]
-> #4 (&mm->mmap_sem){++++++}:
[ 245.294868] lock_acquire+0xc9/0x220
[ 245.294871] __might_fault+0x6b/0x90
[ 245.294873] filldir+0xb2/0x140
[ 245.294876] dcache_readdir+0xf4/0x170
[ 245.294877] iterate_dir+0x7a/0x170
[ 245.294879] SyS_getdents+0xa2/0x150
[ 245.294881] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 245.294882]
-> #3 (&sb->s_type->i_mutex_key#3){+++++.}:
[ 245.294888] lock_acquire+0xc9/0x220
[ 245.294890] down_write+0x3f/0x70
[ 245.294894] start_creating+0x5a/0x100
[ 245.294897] debugfs_create_dir+0xc/0xd0
[ 245.294900] blk_mq_debugfs_register+0x1c/0x70
[ 245.294904] blk_mq_register_dev+0xbd/0x120
[ 245.294908] blk_register_queue+0x4c/0x150
[ 245.294910] device_add_disk+0x23b/0x4f0
[ 245.294914] nvme_validate_ns+0x1fa/0x260
[ 245.294917] nvme_scan_work+0x188/0x300
[ 245.294921] process_one_work+0x1f4/0x6d0
[ 245.294924] worker_thread+0x49/0x4a0
[ 245.294927] kthread+0x107/0x140
[ 245.294929] ret_from_fork+0x2e/0x40
[ 245.294929]
-> #2 (all_q_mutex){+.+.+.}:
[ 245.294935] lock_acquire+0xc9/0x220
[ 245.294938] __mutex_lock+0x6e/0x990
[ 245.294941] mutex_lock_nested+0x16/0x20
[ 245.294944] blk_mq_queue_reinit_work+0x13/0x110
[ 245.294947] blk_mq_queue_reinit_dead+0x17/0x20
[ 245.294950] cpuhp_invoke_callback+0x9c/0x8a0
[ 245.294953] cpuhp_down_callbacks+0x3d/0x80
[ 245.294956] _cpu_down+0xad/0xe0
[ 245.294959] freeze_secondary_cpus+0xb1/0x3d0
[ 245.294962] suspend_devices_and_enter+0x4a0/0xca0
[ 245.294964] pm_suspend+0x59e/0xa50
[ 245.294966] state_store+0x7d/0xf0
[ 245.294969] kobj_attr_store+0xf/0x20
[ 245.294972] sysfs_kf_write+0x40/0x50
[ 245.294974] kernfs_fop_write+0x130/0x1b0
[ 245.294977] __vfs_write+0x23/0x120
[ 245.294979] vfs_write+0xc6/0x1f0
[ 245.294982] SyS_write+0x44/0xb0
[ 245.294984] entry_SYSCALL_64_fastpath+0x1c/0xb1
[ 245.294985]
-> #1 (cpu_hotplug.lock){+.+.+.}:
[ 245.294990] lock_acquire+0xc9/0x220
[ 245.294993] __mutex_lock+0x6e/0x990
[ 245.294996] mutex_lock_nested+0x16/0x20
[ 245.294999] get_online_cpus+0x61/0x80
[ 245.295002] _rcu_barrier+0x9f/0x160
[ 245.295004] rcu_barrier+0x10/0x20
[ 245.295008] netdev_run_todo+0x5f/0x310
[ 245.295011] rtnl_unlock+0x9/0x10
[ 245.295013] default_device_exit_batch+0x133/0x150
[ 245.295017] ops_exit_list.isra.0+0x4d/0x60
[ 245.295020] cleanup_net+0x1d8/0x2c0
[ 245.295023] process_one_work+0x1f4/0x6d0
[ 245.295026] worker_thread+0x49/0x4a0
[ 245.295028] kthread+0x107/0x140
[ 245.295030] ret_from_fork+0x2e/0x40
[ 245.295030]
-> #0 (rcu_preempt_state.barrier_mutex){+.+...}:
[ 245.295036] __lock_acquire+0x1b44/0x1b50
[ 245.295039] lock_acquire+0xc9/0x220
[ 245.295042] __mutex_lock+0x6e/0x990
[ 245.295045] mutex_lock_nested+0x16/0x20
[ 245.295047] _rcu_barrier+0x31/0x160
[ 245.295049] rcu_barrier+0x10/0x20
[ 245.295089] i915_gem_shrink_all+0x33/0x40 [i915]
[ 245.295126] i915_gem_freeze+0x20/0x30 [i915]
[ 245.295152] i915_pm_freeze+0x1d/0x20 [i915]
[ 245.295155] pci_pm_freeze+0x59/0xe0
[ 245.295157] dpm_run_callback+0x6f/0x330
[ 245.295159] __device_suspend+0xea/0x330
[ 245.295161] async_suspend+0x1a/0x90
[ 245.295165] async_run_entry_fn+0x34/0x160
[ 245.295168] process_one_work+0x1f4/0x6d0
[ 245.295171] worker_thread+0x49/0x4a0
[ 245.295173] kthread+0x107/0x140
[ 245.295175] ret_from_fork+0x2e/0x40
[ 245.295176]
other info that might help us debug this:
[ 245.295177] Chain exists of:
rcu_preempt_state.barrier_mutex --> &mm->mmap_sem -->
&dev->struct_mutex
[ 245.295181] Possible unsafe locking scenario:
[ 245.295182] CPU0 CPU1
[ 245.295183] ---- ----
[ 245.295183] lock(&dev->struct_mutex);
[ 245.295185] lock(&mm->mmap_sem);
[ 245.295187] lock(&dev->struct_mutex);
[ 245.295189] lock(rcu_preempt_state.barrier_mutex);
[ 245.295191]
*** DEADLOCK ***
[ 245.295192] 4 locks held by kworker/u8:1/72:
[ 245.295193] #0: ("events_unbound"){.+.+.+}, at: [<ffffffff8109dcae>]
process_one_work+0x16e/0x6d0
[ 245.295199] #1: ((&entry->work)){+.+.+.}, at: [<ffffffff8109dcae>]
process_one_work+0x16e/0x6d0
[ 245.295204] #2: (&dev->mutex){......}, at: [<ffffffff815d5e34>]
__device_suspend+0xb4/0x330
[ 245.295208] #3: (&dev->struct_mutex){+.+.+.}, at: [<ffffffffa00bbc48>]
i915_gem_freeze+0x18/0x30 [i915]
[ 245.295247]
stack backtrace:
[ 245.295250] CPU: 2 PID: 72 Comm: kworker/u8:1 Not tainted
4.11.0-rc1-CI-CI_DRM_2296+ #1
[ 245.295251] Hardware name: GIGABYTE GB-BKi7A-7500/MFLP7AP-00, BIOS F1
07/27/2016
[ 245.295256] Workqueue: events_unbound async_run_entry_fn
[ 245.295258] Call Trace:
[ 245.295262] dump_stack+0x67/0x92
[ 245.295266] print_circular_bug+0x1e0/0x2e0
[ 245.295269] __lock_acquire+0x1b44/0x1b50
[ 245.295273] ? trace_hardirqs_on+0xd/0x10
[ 245.295277] lock_acquire+0xc9/0x220
[ 245.295280] ? _rcu_barrier+0x31/0x160
[ 245.295282] ? _rcu_barrier+0x31/0x160
[ 245.295286] __mutex_lock+0x6e/0x990
[ 245.295288] ? _rcu_barrier+0x31/0x160
[ 245.295290] ? _rcu_barrier+0x31/0x160
[ 245.295293] ? synchronize_rcu_expedited+0x35/0xb0
[ 245.295296] ? _raw_spin_unlock_irqrestore+0x52/0x60
[ 245.295300] mutex_lock_nested+0x16/0x20
[ 245.295302] _rcu_barrier+0x31/0x160
[ 245.295305] rcu_barrier+0x10/0x20
[ 245.295343] i915_gem_shrink_all+0x33/0x40 [i915]
[ 245.295378] i915_gem_freeze+0x20/0x30 [i915]
[ 245.295404] i915_pm_freeze+0x1d/0x20 [i915]
[ 245.295406] pci_pm_freeze+0x59/0xe0
[ 245.295409] dpm_run_callback+0x6f/0x330
[ 245.295411] ? pci_pm_poweroff+0xf0/0xf0
[ 245.295413] __device_suspend+0xea/0x330
[ 245.295415] async_suspend+0x1a/0x90
[ 245.295419] async_run_entry_fn+0x34/0x160
[ 245.295422] process_one_work+0x1f4/0x6d0
[ 245.295425] ? process_one_work+0x16e/0x6d0
[ 245.295429] worker_thread+0x49/0x4a0
[ 245.295432] kthread+0x107/0x140
[ 245.295435] ? process_one_work+0x6d0/0x6d0
[ 245.295437] ? kthread_create_on_node+0x40/0x40
[ 245.295440] ret_from_fork+0x2e/0x40
[ 250.434263] usb usb1: root hub lost power or was reset
[ 250.434265] usb usb2: root hub lost power or was reset
[ 250.434719] usb usb3: root hub lost power or was reset
[ 250.434720] usb usb4: root hub lost power or was reset