[Bug 101237] [SKL] GPU HANG: ecode 9:0:0x85dffffb, reason: Hang on rcs, action: reset

Fri Oct 13 19:20:52 UTC 2017

https://bugs.freedesktop.org/show_bug.cgi?id=101237

--- Comment #16 from Elizabeth <elizabethx.de.la.torre.mena at intel.com> ---
Hello Jonathan, if the problem is reproducible, you can help the Mesa team by
providing an apitrace file captured with the latest Mesa release and, if
possible, the latest kernel release. Information on whether the problem is
desktop dependent (KDE/XFCE/Unity/GNOME), steps to reproduce, and your
sna/modesetting Xorg configuration may also help.

From dmesg:
[    0.000000] NUMA: Warning: node 0 [mem 0x00000000-0x77ffffff] overlaps with itself [mem 0x00100000-0x46614fff]

... 

[   19.254936] ======================================================
[   19.254937] WARNING: possible circular locking dependency detected
[   19.254938] 4.13.0-0.rc0.git3.1.fc27.x86_64 #1 Tainted: G     U         
[   19.254939] ------------------------------------------------------
[   19.254940] tuned/1671 is trying to acquire lock:
[   19.254941]  (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff9c7c42ea>] store+0x2a/0x80
[   19.254947] 
but task is already holding lock:
[   19.254948]  (s_active#86){++++.+}, at: [<ffffffff9c36dccc>] kernfs_fop_write+0x12c/0x1e0
[   19.254953] 
which lock already depends on the new lock.

[   19.254954] 
the existing dependency chain (in reverse order) is:
[   19.254955] 
-> #2 (s_active#86){++++.+}:
[   19.254961]        lock_acquire+0xa3/0x1f0
[   19.254962]        __kernfs_remove+0x26b/0x310
[   19.254963]        kernfs_remove+0x23/0x40
[   19.254965]        sysfs_remove_dir+0x51/0x60
[   19.254968]        kobject_del.part.3+0x13/0x40
[   19.254970]        kobject_put+0x6e/0x1a0
[   19.254972]        cpufreq_policy_free+0xe3/0x150
[   19.254974]        cpufreq_online+0xef/0x7a0
[   19.254975]        cpufreq_add_dev+0x51/0x80
[   19.254977]        subsys_interface_register+0xe1/0x160
[   19.254978]        cpufreq_register_driver+0x15d/0x230
[   19.254982]        iw_cm_accept+0x12f/0x150 [iw_cm]
[   19.254984]        do_one_initcall+0x50/0x1a0
[   19.254986]        do_init_module+0x5f/0x1e8
[   19.254989]        load_module+0x24b0/0x2b50
[   19.254991]        SYSC_init_module+0x194/0x1d0
[   19.254993]        SyS_init_module+0xe/0x10
[   19.254995]        entry_SYSCALL_64_fastpath+0x1f/0xbe
[   19.254996] 
-> #1 (subsys mutex#5){+.+.+.}:
[   19.255000]        lock_acquire+0xa3/0x1f0
[   19.255003]        __mutex_lock+0x86/0x9f0
[   19.255005]        mutex_lock_nested+0x1b/0x20
[   19.255007]        subsys_interface_register+0x7d/0x160
[   19.255009]        cpufreq_register_driver+0x15d/0x230
[   19.255011]        iw_cm_accept+0x12f/0x150 [iw_cm]
[   19.255013]        do_one_initcall+0x50/0x1a0
[   19.255014]        do_init_module+0x5f/0x1e8
[   19.255016]        load_module+0x24b0/0x2b50
[   19.255018]        SYSC_init_module+0x194/0x1d0
[   19.255020]        SyS_init_module+0xe/0x10
[   19.255248]        entry_SYSCALL_64_fastpath+0x1f/0xbe
[   19.255249] 
-> #0 (cpu_hotplug_lock.rw_sem){++++++}:
[   19.255252]        __lock_acquire+0x1367/0x13b0
[   19.255254]        lock_acquire+0xa3/0x1f0
[   19.255255]        cpus_read_lock+0x42/0x90
[   19.255257]        store+0x2a/0x80
[   19.255258]        sysfs_kf_write+0x42/0x60
[   19.255259]        kernfs_fop_write+0x151/0x1e0
[   19.255261]        __vfs_write+0x37/0x170
[   19.255263]        vfs_write+0xc6/0x1c0
[   19.255264]        SyS_write+0x58/0xc0
[   19.255265]        do_syscall_64+0x6c/0x1c0
[   19.255267]        return_from_SYSCALL_64+0x0/0x7a
[   19.255267] 
other info that might help us debug this:

[   19.255269] Chain exists of:
  cpu_hotplug_lock.rw_sem --> subsys mutex#5 --> s_active#86

[   19.255273]  Possible unsafe locking scenario:

[   19.255274]        CPU0                    CPU1
[   19.255274]        ----                    ----
[   19.255275]   lock(s_active#86);
[   19.255277]                                lock(subsys mutex#5);
[   19.255492]                                lock(s_active#86);
[   19.255494]   lock(cpu_hotplug_lock.rw_sem);
[   19.255496] 
 *** DEADLOCK ***

[   19.255497] 4 locks held by tuned/1671:
[   19.255498]  #0:  (&f->f_pos_lock){+.+.+.}, at: [<ffffffff9c2f3d4c>] __fdget_pos+0x4c/0x60
[   19.255501]  #1:  (sb_writers#3){.+.+.+}, at: [<ffffffff9c2cc343>] vfs_write+0x193/0x1c0
[   19.255505]  #2:  (&of->mutex){+.+.+.}, at: [<ffffffff9c36dcc3>] kernfs_fop_write+0x123/0x1e0
[   19.255508]  #3:  (s_active#86){++++.+}, at: [<ffffffff9c36dccc>] kernfs_fop_write+0x12c/0x1e0
[   19.255511] 
stack backtrace:
[   19.255513] CPU: 1 PID: 1671 Comm: tuned Tainted: G     U          4.13.0-0.rc0.git3.1.fc27.x86_64 #1
[   19.255514] Hardware name: HP ProLiant m710x Server Cartridge/ProLiant m710x Server Cartridge, BIOS H07 05/23/2016
[   19.255515] Call Trace:
[   19.255517]  dump_stack+0x8e/0xcd
[   19.255519]  print_circular_bug+0x1b6/0x210
[   19.255521]  __lock_acquire+0x1367/0x13b0
[   19.255524]  ? debug_lockdep_rcu_enabled+0x1d/0x30
[   19.255526]  lock_acquire+0xa3/0x1f0
[   19.255528]  ? lock_acquire+0xa3/0x1f0
[   19.255530]  ? store+0x2a/0x80
[   19.255533]  cpus_read_lock+0x42/0x90
[   19.255535]  ? store+0x2a/0x80
[   19.255537]  store+0x2a/0x80
[   19.255539]  sysfs_kf_write+0x42/0x60
[   19.255541]  kernfs_fop_write+0x151/0x1e0
[   19.255544]  __vfs_write+0x37/0x170
[   19.255545]  ? rcu_read_lock_sched_held+0x79/0x80
[   19.255547]  ? rcu_sync_lockdep_assert+0x2c/0x60
[   19.255548]  ? __sb_start_write+0x135/0x190
[   19.255550]  ? vfs_write+0x193/0x1c0
[   19.255551]  vfs_write+0xc6/0x1c0
[   19.255553]  SyS_write+0x58/0xc0
[   19.255555]  do_syscall_64+0x6c/0x1c0
[   19.255556]  entry_SYSCALL64_slow_path+0x25/0x25
[   19.255558] RIP: 0033:0x7f809f81bcad
[   19.255559] RSP: 002b:00007f808bffd510 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[   19.255560] RAX: ffffffffffffffda RBX: 000000000000000b RCX: 00007f809f81bcad
[   19.255561] RDX: 000000000000000b RSI: 00007f80a09f1000 RDI: 000000000000000c
[   19.255562] RBP: 00007f80a09f1000 R08: 00007f808002d5c0 R09: 00007f808bfff700
[   19.255563] R10: 000000000000006a R11: 0000000000000293 R12: 00007f808002d4e0
[   19.255564] R13: 000000000000000b R14: 0000000001932490 R15: 00000000018dad40
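
The lockdep warning in the dmesg excerpt comes down to a cycle in the lock-ordering graph. The existing chain says cpu_hotplug_lock.rw_sem is taken before subsys mutex#5, which is taken before s_active#86. Meanwhile tuned/1671 already holds s_active#86 (kernfs_fop_write) and then asks for cpu_hotplug_lock.rw_sem (store -> cpus_read_lock). As a rough illustration only, the toy Python model below checks a "held -> acquired" graph for a cycle with a depth-first search. It is not lockdep's actual code, find_cycle is a hypothetical helper, and the edge list is a reading of the "Chain exists of:" line in the log.

```python
from collections import defaultdict


def find_cycle(edges):
    """Return a cycle of lock names (first name repeated at the end),
    or None if the "held -> acquired" graph is acyclic.

    Each edge (held, acquired) records that `acquired` was taken while
    `held` was already held. This is a toy model, not lockdep itself.
    """
    graph = defaultdict(list)
    for held, acquired in edges:
        graph[held].append(acquired)

    UNVISITED, ON_STACK, DONE = 0, 1, 2
    state = defaultdict(int)  # every lock starts UNVISITED
    path = []  # locks on the current DFS path

    def dfs(lock):
        state[lock] = ON_STACK
        path.append(lock)
        for nxt in graph[lock]:
            if state[nxt] == ON_STACK:
                # Back edge: nxt is already on the path, so a cycle closed.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == UNVISITED:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[lock] = DONE
        return None

    for lock in list(graph):  # snapshot: dfs may add keys to graph
        if state[lock] == UNVISITED:
            cycle = dfs(lock)
            if cycle:
                return cycle
    return None


# Existing dependencies, read off "Chain exists of:" in the log above.
existing = [
    ("cpu_hotplug_lock.rw_sem", "subsys mutex#5"),
    ("subsys mutex#5", "s_active#86"),
]
# New acquisition by tuned/1671: s_active#86 is held (kernfs_fop_write)
# while store() calls cpus_read_lock().
new_edge = ("s_active#86", "cpu_hotplug_lock.rw_sem")

print(find_cycle(existing))  # None
print(find_cycle(existing + [new_edge]))
# ['cpu_hotplug_lock.rw_sem', 'subsys mutex#5', 's_active#86', 'cpu_hotplug_lock.rw_sem']
```

Without the new edge the graph is acyclic; adding it closes the three-lock cycle that lockdep reports as a possible deadlock.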
