[Nouveau] [Bug 100567] Nouveau system freeze fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]

bugzilla-daemon at freedesktop.org
Sat Jan 5 21:29:37 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=100567

--- Comment #18 from kenorb at gmail.com ---
The same problem on Ubuntu 18.10, kernel 4.18.0-13.

I have 4 GPUs: NVIDIA GeForce GTX 1080 Ti graphics cards (3584 cores each)
with a 3-Way SLI connector.

$ uname -a
Linux Ubuntu-PC 4.18.0-13-generic #14-Ubuntu SMP Wed Dec 5 09:04:24 UTC 2018
x86_64 x86_64 x86_64 GNU/Linux

Errors in kern.log file:

nouveau 0000:65:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]
nouveau 0000:65:00.0: fifo: runlist 0: scheduled for recovery
nouveau 0000:65:00.0: fifo: channel 2: killed
nouveau 0000:65:00.0: fifo: engine 0: scheduled for recovery
nouveau 0000:65:00.0: Xorg[5447]: channel 2 killed!
nouveau 0000:65:00.0: systemd-logind[3394]: nv50cal_space: -16
nouveau 0000:65:00.0: systemd-logind[3394]: nv50cal_space: -16
(the same message is repeated 800 times)
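
To get an overview of a log flooded like this, the nouveau lines can be
collapsed into per-message counts. The following is only a minimal sketch: the
regular expression assumes the line layout shown above, and the function name
summarize_nouveau is made up for illustration.

```python
import re
from collections import Counter

# Assumed layout: "nouveau <pci-address>: <message>", as in the excerpt above.
NOUVEAU_RE = re.compile(
    r"nouveau (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): (?P<msg>.*)$"
)

def summarize_nouveau(lines):
    """Count identical nouveau messages per PCI device (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = NOUVEAU_RE.search(line)
        if match:
            counts[(match.group("dev"), match.group("msg").strip())] += 1
    return counts

sample = [
    "nouveau 0000:65:00.0: fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]",
    "nouveau 0000:65:00.0: fifo: channel 2: killed",
    "nouveau 0000:65:00.0: systemd-logind[3394]: nv50cal_space: -16",
    "nouveau 0000:65:00.0: systemd-logind[3394]: nv50cal_space: -16",
]

# Print the most frequent messages first.
for (dev, msg), n in summarize_nouveau(sample).most_common():
    print(f"{n:5d}  {dev}  {msg}")
```

On a real system the lines would come from the kernel log instead of the
sample list, e.g. by iterating over an open kern.log file.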

The system froze (no mouse or keyboard response); however, the kernel still
responded to a few Magic SysRq keys, so here are some stack traces:

INFO: task kworker/u72:8:492 blocked for more than 120 seconds.
      Tainted: G           O      4.18.0-13-generic #14-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u72:8   D    0   492      2 0x80000000
Workqueue: events_unbound nv50_disp_atomic_commit_work [nouveau]

Call Trace at 20:25:50:
 __schedule+0x29e/0x840
 schedule+0x2c/0x80
 schedule_timeout+0x258/0x360
 ? nv50_wndw_atomic_destroy_state+0x1d/0x20 [nouveau]
 dma_fence_default_wait+0x1fc/0x260
 ? dma_fence_release+0xa0/0xa0
 dma_fence_wait_timeout+0x3e/0xf0
 drm_atomic_helper_wait_for_fences+0x3f/0xc0 [drm_kms_helper]
 nv50_disp_atomic_commit_tail+0x78/0x860 [nouveau]
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 nv50_disp_atomic_commit_work+0x12/0x20 [nouveau]
 process_one_work+0x20f/0x3c0
 worker_thread+0x34/0x400
 kthread+0x120/0x140
 ? pwq_unbound_release_workfn+0xd0/0xd0
 ? kthread_bind+0x40/0x40
 ret_from_fork+0x35/0x40

Same call trace at 20:29:51 (a few minutes later, while Xorg was still frozen):
Workqueue: events_unbound nv50_disp_atomic_commit_work [nouveau]
Call Trace:
 __schedule+0x29e/0x840
 ? apic_timer_interrupt+0xa/0x20
 ? __drm_crtc_commit_free+0x12/0x20 [drm]
 schedule+0x2c/0x80
 schedule_timeout+0x258/0x360
 ? nv50_wndw_atomic_destroy_state+0x1d/0x20 [nouveau]
 dma_fence_default_wait+0x1fc/0x260
 ? dma_fence_release+0xa0/0xa0
 dma_fence_wait_timeout+0x3e/0xf0
 drm_atomic_helper_wait_for_fences+0x3f/0xc0 [drm_kms_helper]
 nv50_disp_atomic_commit_tail+0x78/0x860 [nouveau]
 ? __switch_to_asm+0x40/0x70
 ? __switch_to_asm+0x34/0x70
 nv50_disp_atomic_commit_work+0x12/0x20 [nouveau]
 process_one_work+0x20f/0x3c0
 worker_thread+0x34/0x400
 kthread+0x120/0x140
 ? pwq_unbound_release_workfn+0xd0/0xd0
 ? kthread_bind+0x40/0x40
 ret_from_fork+0x35/0x40

Another one:
INFO: task Xorg:5447 blocked for more than 120 seconds.
      Tainted: G           O      4.18.0-13-generic #14-Ubuntu
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Xorg            D    0  5447   5445 0x00000004
Call Trace:
 __schedule+0x29e/0x840
 schedule+0x2c/0x80
 schedule_preempt_disabled+0xe/0x10
 __ww_mutex_lock.isra.6+0x3c1/0x660
 __ww_mutex_lock_slowpath+0x16/0x20
 ww_mutex_lock+0x34/0x50
 drm_modeset_lock+0x6e/0xb0 [drm]
 drm_crtc_get_sequence_ioctl+0xbc/0x190 [drm]
 ? drm_wait_vblank_ioctl+0x610/0x610 [drm]
 drm_ioctl_kernel+0xa4/0xf0 [drm]
 drm_ioctl+0x227/0x400 [drm]
 ? drm_wait_vblank_ioctl+0x610/0x610 [drm]
 ? do_iter_write+0xe1/0x1a0
 ? do_iter_write+0xe1/0x1a0
 nouveau_drm_ioctl+0x73/0xc0 [nouveau]
 do_vfs_ioctl+0xa8/0x620
 ? __sys_recvmsg+0x88/0xa0
 ksys_ioctl+0x67/0x90
 __x64_sys_ioctl+0x1a/0x20
 do_syscall_64+0x5a/0x110
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f3f654b93c7
Code: Bad RIP value.
RSP: 002b:00007ffd57bbf168 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffd57bbf200 RCX: 00007f3f654b93c7
RDX: 00007ffd57bbf1a0 RSI: 00000000c018643b RDI: 000000000000000e
RBP: 00007ffd57bbf1a0 R08: 0000000000000000 R09: 00005646eb8ff7c0
R10: 00005646eb54ad30 R11: 0000000000000246 R12: 00000000c018643b
R13: 000000000000000e R14: 00005646eb54b800 R15: 00005646eb466880
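
The register dump can be cross-checked against the trace. ORIG_RAX is 0x10
(syscall 16, ioctl, on x86_64) and RSI holds the ioctl request number
0xc018643b. The sketch below splits that number into its fields, assuming the
generic Linux ioctl encoding (8-bit nr, 8-bit type, 14-bit size, 2-bit
direction); decode_ioctl is a made-up helper name.

```python
def decode_ioctl(cmd):
    """Split an ioctl request number into fields (asm-generic layout assumed)."""
    return {
        "nr": cmd & 0xFF,              # command number within the driver
        "type": chr((cmd >> 8) & 0xFF),  # driver "magic" character
        "size": (cmd >> 16) & 0x3FFF,  # size of the argument struct in bytes
        "dir": (cmd >> 30) & 0x3,      # 1 = write, 2 = read, 3 = read and write
    }

fields = decode_ioctl(0xC018643B)
print(fields)  # {'nr': 59, 'type': 'd', 'size': 24, 'dir': 3}
```

Type 'd' is the DRM ioctl magic, and nr 59 (0x3b) with a 24-byte read/write
argument appears to correspond to DRM_IOCTL_CRTC_GET_SEQUENCE, which is
consistent with drm_crtc_get_sequence_ioctl in the trace above.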

Full log: https://gist.github.com/kenorb/5b95caa1694dbf7f030ccc808a110856
