[Bug 104840] New: [SKL] DEADLOCK: Kernel deadlocks when running gem_reset_stats@reset-stats-ctx-default.

Mon Jan 29 17:49:27 UTC 2018

https://bugs.freedesktop.org/show_bug.cgi?id=104840

            Bug ID: 104840
           Summary: [SKL] DEADLOCK: Kernel deadlocks when running
                    gem_reset_stats@reset-stats-ctx-default.
           Product: DRI
           Version: XOrg git
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: major
          Priority: medium
         Component: DRM/Intel
          Assignee: intel-gfx-bugs at lists.freedesktop.org
          Reporter: antonio.argenziano at intel.com
        QA Contact: intel-gfx-bugs at lists.freedesktop.org
                CC: intel-gfx-bugs at lists.freedesktop.org

Description:
------
Running gem_reset_stats@reset-stats-ctx-default on SKL causes a deadlock. What
I think is happening is that the test uses both gem_context_destroy() and
drop_caches_set(), which contend for struct_mutex. If context destroy gets
stuck, it occupies i915->wq, so nothing can progress because retire cannot be
scheduled, and drop_caches_set() keeps waiting for idle.
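
The suspected sequence can be modelled in user space. The following is only a
rough sketch, not driver code: it assumes i915->wq behaves like a single
ordered queue, and simulate(), free_work(), retire_work() and the idle event
are hypothetical stand-ins invented for illustration. The wait is bounded by a
timeout only so that the demo terminates; the wait in the kernel has no such
bound.

```python
import queue
import threading


def simulate(wait_timeout=0.5):
    """Return (idle_while_locked, idle_after_unlock) for the toy model."""
    struct_mutex = threading.Lock()  # stands in for dev->struct_mutex
    idle = threading.Event()         # set once the "retire" item has run
    wq = queue.Queue()               # assumed ordered queue, like i915->wq

    def worker():
        # One worker runs items strictly in order, so a blocked item
        # stalls everything queued behind it.
        while True:
            item = wq.get()
            if item is None:
                break
            item()

    def free_work():
        # Like the free worker in the trace: it needs the mutex to proceed.
        with struct_mutex:
            pass

    def retire_work():
        # Hypothetical retire step: marks the device as idle.
        idle.set()

    thread = threading.Thread(target=worker)
    thread.start()

    # Model of drop_caches_set(): take the mutex, then wait for idle.
    struct_mutex.acquire()
    wq.put(free_work)    # blocks on the mutex held above
    wq.put(retire_work)  # cannot run until free_work finishes
    idle_while_locked = idle.wait(wait_timeout)
    struct_mutex.release()

    # Once the mutex is dropped, the queue drains and idle is reached.
    idle_after_unlock = idle.wait(5.0)
    wq.put(None)
    thread.join()
    return idle_while_locked, idle_after_unlock


if __name__ == "__main__":
    print(simulate())
```

In this model the waiter never sees idle while it holds the mutex, and sees it
as soon as the mutex is released, which matches the lock dump below where both
gem_reset_stats and the kworker are listed against dev->struct_mutex.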

Steps:
------
1. Execute gem_reset_stats@reset-stats-ctx-default

Actual results:
------
Driver gets deadlocked, test never completes.

Expected results:
------
Test passes.

Dmesg output:
------
[ 7484.031148] [IGT] gem_reset_stats: starting subtest reset-stats-ctx-default

[ 7613.403760] INFO: task kworker/u8:3:1714 blocked for more than 120 seconds.
[ 7613.403815]       Tainted: G     U           4.15.0-rc9+ #44
[ 7613.403844] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7613.403884] kworker/u8:3    D    0  1714      2 0x80000000
[ 7613.403999] Workqueue: i915 __i915_gem_free_work [i915]
[ 7613.404007] Call Trace:
[ 7613.404026]  ? __schedule+0x345/0xc50
[ 7613.404044]  schedule+0x39/0x90
[ 7613.404051]  schedule_preempt_disabled+0x11/0x20
[ 7613.404057]  __mutex_lock+0x3b7/0x8d0
[ 7613.404063]  ? __mutex_lock+0x122/0x8d0
[ 7613.404072]  ? trace_buffer_unlock_commit_regs+0x37/0x90
[ 7613.404151]  ? __i915_gem_free_objects+0x89/0x540 [i915]
[ 7613.404243]  __i915_gem_free_objects+0x89/0x540 [i915]
[ 7613.404319]  __i915_gem_free_work+0x51/0x90 [i915]
[ 7613.404335]  process_one_work+0x1b4/0x5d0
[ 7613.404342]  ? process_one_work+0x130/0x5d0
[ 7613.404361]  worker_thread+0x4a/0x3e0
[ 7613.404378]  kthread+0x100/0x140
[ 7613.404385]  ? process_one_work+0x5d0/0x5d0
[ 7613.404390]  ? kthread_delayed_work_timer_fn+0x80/0x80
[ 7613.404402]  ? do_group_exit+0x46/0xc0
[ 7613.404409]  ret_from_fork+0x3a/0x50
[ 7613.404437] 
               Showing all locks held in the system:
[ 7613.404447] 1 lock held by khungtaskd/39:
[ 7613.404458]  #0:  (tasklist_lock){.+.+}, at: [<0000000088c6a651>] debug_show_all_locks+0x39/0x1b0
[ 7613.404489] 1 lock held by in:imklog/809:
[ 7613.404492]  #0:  (&f->f_pos_lock){+.+.}, at: [<00000000cf80f1c9>] __fdget_pos+0x3f/0x50
[ 7613.404519] 1 lock held by dmesg/1652:
[ 7613.404523]  #0:  (&user->lock){+.+.}, at: [<00000000dd4aba83>] devkmsg_read+0x3a/0x2f0
[ 7613.404543] 3 locks held by gem_reset_stats/1713:
[ 7613.404547]  #0:  (sb_writers#10){.+.+}, at: [<00000000aadbc565>] vfs_write+0x18a/0x1c0
[ 7613.404571]  #1:  (&attr->mutex){+.+.}, at: [<000000000e818033>] simple_attr_write+0x35/0xc0
[ 7613.404590]  #2:  (&dev->struct_mutex){+.+.}, at: [<0000000000b72f77>] i915_drop_caches_set+0x4e/0x1a0 [i915]
[ 7613.404669] 3 locks held by kworker/u8:3/1714:
[ 7613.404672]  #0:  ((wq_completion)"i915"){+.+.}, at: [<00000000d83ffa4e>] process_one_work+0x130/0x5d0
[ 7613.404693]  #1:  ((work_completion)(&i915->mm.free_work)){+.+.}, at: [<00000000d83ffa4e>] process_one_work+0x130/0x5d0
[ 7613.404713]  #2:  (&dev->struct_mutex){+.+.}, at: [<000000007b02c7ef>] __i915_gem_free_objects+0x89/0x540 [i915]

[ 7613.404795] =============================================
