[Bug 102505] [BAT] igt@chamelium@common-hpd-after-suspend caused *ERROR* failed to enable link training

Fri Sep 22 07:42:19 UTC 2017

https://bugs.freedesktop.org/show_bug.cgi?id=102505

--- Comment #2 from Marta Löfstedt <marta.lofstedt at intel.com> ---
Note: on 4.14.0-rc1 kernels, i.e. CI_DRM_3099, we are now seeing a lockdep
warning before the link training issue:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3099/fi-kbl-7500u/igt@chamelium@common-hpd-after-suspend.html
[   70.297898] ======================================================
[   70.297898] WARNING: possible circular locking dependency detected
[   70.297900] 4.14.0-rc1-CI-CI_DRM_3099+ #1 Not tainted
[   70.297901] ------------------------------------------------------
[   70.297901] rtcwake/1459 is trying to acquire lock:
[   70.297902]  ((complete)&st->done){+.+.}, at: [<ffffffff8190987d>] wait_for_completion+0x1d/0x20
[   70.297907] 
               but task is already holding lock:
[   70.297908]  (sparse_irq_lock){+.+.}, at: [<ffffffff810f2187>] irq_lock_sparse+0x17/0x20
[   70.297912] 
               which lock already depends on the new lock.

[   70.297912] 
               the existing dependency chain (in reverse order) is:
[   70.297913] 
               -> #1 (sparse_irq_lock){+.+.}:
[   70.297916]        __mutex_lock+0x86/0x9b0
[   70.297918]        mutex_lock_nested+0x1b/0x20
[   70.297920]        irq_lock_sparse+0x17/0x20
[   70.297921]        irq_affinity_online_cpu+0x18/0xd0
[   70.297923]        cpuhp_invoke_callback+0xa3/0x840
[   70.297924] 
               -> #0 ((complete)&st->done){+.+.}:
[   70.297927]        check_prev_add+0x430/0x840
[   70.297929]        __lock_acquire+0x1420/0x15e0
[   70.297930]        lock_acquire+0xb0/0x200
[   70.297932]        wait_for_common+0x58/0x210
[   70.297933]        wait_for_completion+0x1d/0x20
[   70.297934]        takedown_cpu+0x89/0xf0
[   70.297935]        cpuhp_invoke_callback+0xa3/0x840
[   70.297937]        cpuhp_down_callbacks+0x42/0x80
[   70.297937]        _cpu_down+0xb9/0xf0
[   70.297939]        freeze_secondary_cpus+0xa3/0x390
[   70.297940]        suspend_devices_and_enter+0x2fd/0xce0
[   70.297942]        pm_suspend+0x4f0/0x9d0
[   70.297943]        state_store+0x82/0xf0
[   70.297945]        kobj_attr_store+0xf/0x20
[   70.297947]        sysfs_kf_write+0x45/0x60
[   70.297949]        kernfs_fop_write+0x124/0x1c0
[   70.297950]        __vfs_write+0x28/0x130
[   70.297951]        vfs_write+0xcb/0x1c0
[   70.297953]        SyS_write+0x49/0xb0
[   70.297955]        entry_SYSCALL_64_fastpath+0x1c/0xb1
[   70.297955] 
               other info that might help us debug this:

[   70.297956]  Possible unsafe locking scenario:

[   70.297956]        CPU0                    CPU1
[   70.297956]        ----                    ----
[   70.297957]   lock(sparse_irq_lock);
[   70.297958]                                lock((complete)&st->done);
[   70.297959]                                lock(sparse_irq_lock);
[   70.297960]   lock((complete)&st->done);
[   70.297961] 
                *** DEADLOCK ***

[   70.297963] 8 locks held by rtcwake/1459:
[   70.297963]  #0:  (sb_writers#5){.+.+}, at: [<ffffffff81220161>] vfs_write+0x171/0x1c0
[   70.297966]  #1:  (&of->mutex){+.+.}, at: [<ffffffff812a3302>] kernfs_fop_write+0xf2/0x1c0
[   70.297970]  #2:  (kn->count#189){.+.+}, at: [<ffffffff812a330b>] kernfs_fop_write+0xfb/0x1c0
[   70.297973]  #3:  (pm_mutex){+.+.}, at: [<ffffffff810e5f49>] pm_suspend+0xa9/0x9d0
[   70.297976]  #4:  (acpi_scan_lock){+.+.}, at: [<ffffffff8153b3c7>] acpi_scan_lock_acquire+0x17/0x20
[   70.297980]  #5:  (cpu_add_remove_lock){+.+.}, at: [<ffffffff8108106e>] freeze_secondary_cpus+0x2e/0x390
[   70.297983]  #6:  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff810d660b>] percpu_down_write+0x2b/0x110
[   70.297986]  #7:  (sparse_irq_lock){+.+.}, at: [<ffffffff810f2187>] irq_lock_sparse+0x17/0x20
[   70.297990] 
               stack backtrace:
[   70.297992] CPU: 2 PID: 1459 Comm: rtcwake Not tainted 4.14.0-rc1-CI-CI_DRM_3099+ #1
[   70.297993] Hardware name: GIGABYTE GB-BKi7(H)A-7500/MFLP7AP-00, BIOS F4 02/20/2017
[   70.297993] Call Trace:
[   70.297996]  dump_stack+0x68/0x9f
[   70.297998]  print_circular_bug+0x235/0x3c0
[   70.298000]  ? lockdep_init_map_crosslock+0x20/0x20
[   70.298002]  check_prev_add+0x430/0x840
[   70.298004]  __lock_acquire+0x1420/0x15e0
[   70.298006]  ? __lock_acquire+0x1420/0x15e0
[   70.298008]  ? lockdep_init_map_crosslock+0x20/0x20
[   70.298010]  lock_acquire+0xb0/0x200
[   70.298011]  ? wait_for_completion+0x1d/0x20
[   70.298013]  wait_for_common+0x58/0x210
[   70.298014]  ? wait_for_completion+0x1d/0x20
[   70.298015]  ? cpuhp_invoke_callback+0x840/0x840
[   70.298018]  ? stop_machine_cpuslocked+0xc1/0xd0
[   70.298020]  ? cpuhp_invoke_callback+0x840/0x840
[   70.298021]  wait_for_completion+0x1d/0x20
[   70.298022]  takedown_cpu+0x89/0xf0
[   70.298024]  ? cpuhp_complete_idle_dead+0x20/0x20
[   70.298025]  cpuhp_invoke_callback+0xa3/0x840
[   70.298027]  cpuhp_down_callbacks+0x42/0x80
[   70.298028]  _cpu_down+0xb9/0xf0
[   70.298030]  freeze_secondary_cpus+0xa3/0x390
[   70.298032]  suspend_devices_and_enter+0x2fd/0xce0
[   70.298034]  pm_suspend+0x4f0/0x9d0
[   70.298036]  state_store+0x82/0xf0
[   70.298038]  kobj_attr_store+0xf/0x20
[   70.298040]  sysfs_kf_write+0x45/0x60
[   70.298042]  kernfs_fop_write+0x124/0x1c0
[   70.298043]  __vfs_write+0x28/0x130
[   70.298046]  ? rcu_read_lock_sched_held+0x7a/0x90
[   70.298047]  ? rcu_sync_lockdep_assert+0x2f/0x60
[   70.298049]  ? __sb_start_write+0x108/0x200
[   70.298050]  vfs_write+0xcb/0x1c0
[   70.298052]  SyS_write+0x49/0xb0
[   70.298054]  entry_SYSCALL_64_fastpath+0x1c/0xb1
[   70.298055] RIP: 0033:0x7f134a31a290
[   70.298056] RSP: 002b:00007ffd08e34318 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   70.298058] RAX: ffffffffffffffda RBX: ffffffff81492963 RCX: 00007f134a31a290
[   70.298059] RDX: 0000000000000004 RSI: 0000000002141060 RDI: 0000000000000006
[   70.298060] RBP: ffffc9000028ff88 R08: 000000000213edc0 R09: 00007f134a7f6700
[   70.298061] R10: 0000000000000003 R11: 0000000000000246 R12: 000000000213ece0
[   70.298061] R13: 0000000000000001 R14: 0000000000000004 R15: 0000000000000004
[   70.298065]  ? __this_cpu_preempt_check+0x13/0x20
[   70.312646] IRQ 121: no longer affine to CPU2
[   70.312649] IRQ 122: no longer affine to CPU2
[   70.312651] IRQ 124: no longer affine to CPU2
[   70.312658] IRQ 127: no longer affine to CPU2
[   70.312666] IRQ 130: no longer affine to CPU2
[   70.319706] IRQ 8: no longer affine to CPU3
[   70.319713] IRQ 9: no longer affine to CPU3
[   70.319717] IRQ 120: no longer affine to CPU3
[   70.319723] IRQ 125: no longer affine to CPU3
[   70.325294]  cache: parent cpu1 should not be sleeping
[   70.326538]  cache: parent cpu2 should not be sleeping
[   70.327693]  cache: parent cpu3 should not be sleeping
[   70.453203] HDA: we are doing full chip reset now
[   74.251209] Suspending console(s) (use no_console_suspend to debug)
[   74.474977]  cache: parent cpu1 should not be sleeping
[   74.476247]  cache: parent cpu2 should not be sleeping
[   74.477404]  cache: parent cpu3 should not be sleeping
[   74.601393] HDA: we are doing full chip reset now
[   76.606301] [drm:intel_dp_start_link_train [i915]] *ERROR* failed to enable link training
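
For anyone reading the lockdep report above for the first time: the "Possible
unsafe locking scenario" table can be read as a small directed graph, where an
edge means "held the first lock, then asked for the second". A minimal sketch
of that reading follows. The lock names are copied from the log; the edge
directions are my interpretation of the CPU0/CPU1 table, and find_cycle is a
hypothetical helper written for illustration, not lockdep's actual algorithm.

```python
from collections import defaultdict


def find_cycle(edges):
    """Return one cycle as a list of lock names, or None if there is none.

    `edges` is an iterable of (held, wanted) pairs. Hypothetical helper for
    illustration only; this is not how lockdep is implemented.
    """
    graph = defaultdict(list)
    for held, wanted in edges:
        graph[held].append(wanted)

    path = []        # locks on the current depth-first search path
    finished = set() # locks whose successors were fully explored

    def dfs(node):
        if node in path:
            # Reached a lock already on the path: that closes a cycle.
            return path[path.index(node):] + [node]
        if node in finished:
            return None
        path.append(node)
        for nxt in graph.get(node, []):
            cycle = dfs(nxt)
            if cycle:
                return cycle
        path.pop()
        finished.add(node)
        return None

    for start in list(graph):
        cycle = dfs(start)
        if cycle:
            return cycle
    return None


# Orderings read off the "Possible unsafe locking scenario" table.
edges = [
    # CPU1 column: waits on the completion, then takes sparse_irq_lock
    ("(complete)&st->done", "sparse_irq_lock"),
    # CPU0 column: holds sparse_irq_lock, then waits on the completion
    ("sparse_irq_lock", "(complete)&st->done"),
]

cycle = find_cycle(edges)
if cycle:
    print(" -> ".join(cycle))
```

With both orderings present the graph has a cycle, which is what the
"circular locking dependency" warning is reporting; dropping either edge
makes find_cycle return None.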
