<html> <head> <base href="https://bugs.freedesktop.org/"> </head> <body> <div> <a class="bz_bug_link bz_status_REOPENED " title="REOPENED - nouveau E[ DRM] fail ttm_validate" href="https://bugs.freedesktop.org/show_bug.cgi?id=59069#c38">Comment # 38</a> on <a class="bz_bug_link bz_status_REOPENED " title="REOPENED - nouveau E[ DRM] fail ttm_validate" href="https://bugs.freedesktop.org/show_bug.cgi?id=59069">bug 59069</a> from <a class="email" href="mailto:freedesktop@mattwhitlock.name" title="Matt Whitlock &lt;freedesktop@mattwhitlock.name&gt;"> Matt Whitlock</a>
<pre>Created <a href="attachment.cgi?id=123753" name="attach_123753" title="kernel log with ttm_validate, RT_FAULT, ZETA_FAULT, PAGE_NOT_PRESENT">attachment 123753</a> <a href="attachment.cgi?id=123753&amp;action=edit" title="kernel log with ttm_validate, RT_FAULT, ZETA_FAULT, PAGE_NOT_PRESENT">[details]</a>
kernel log with ttm_validate, RT_FAULT, ZETA_FAULT, PAGE_NOT_PRESENT

The problems also started in earnest for me around the time I upgraded to Plasma 5. Nouveau was never *stable* before then, but I was able to ignore its errors for the most part. Now I can't go more than a few days without X freezing or even the kernel panicking.

I do not believe the problems are triggered solely by plasmashell. I most frequently see the "fail ttm_validate" message for kscreenlocker_greet while I am away from my computer. I also very frequently see graphical corruption on the lock screen in the border around my avatar.

There are other regressions too. I used to be able to use the XVideo output module in VLC (in fact, it was the only one that was stable). Now, neither XVideo nor OpenGL/GLX will run more than a few frames before the video freezes and "fail ttm_validate" messages spew into the kernel log. The only VLC output module that gives me any stability anymore is VDPAU and only if I disable hardware decoding, but even that will freeze X hard from time to time.

The "fail ttm_validate" messages are just the harbinger of impending doom. If I continue without rebooting, eventually I'll be hit by an onslaught of much more ominous errors. Here's a small sampling:

May 14 02:53:20 [kernel] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 4 [chrome[21051]] subc 0 mthd 0060 data beef0201
May 14 03:43:11 [kernel] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 6 [kwin_x11[2304]] subc 0 mthd 0060 data beef0201
May 14 05:06:50 [kernel] nouveau 0000:01:00.0: fifo: CACHE_ERROR - ch 1 [DRM] subc 0 mthd 0060 data 80000002
May 14 13:17:52 [kernel] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - 00000040 [RT_FAULT] - Address 00204c7000
May 14 13:17:52 [kernel] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - e0c: 00000000, e18: 00000000, e1c: 00000000, e20: 00001100, e24: 00030000
May 14 13:17:52 [kernel] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 1 - 00000040 [RT_FAULT] - Address 00204c8000
May 14 13:17:52 [kernel] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 1 - e0c: 00000000, e18: 00000000, e1c: 00000010, e20: 00001100, e24: 00030000
May 14 13:17:52 [kernel] nouveau 0000:01:00.0: gr: 00200000 [] ch 11 [000eeda000 plasmashell[2665]] subc 3 class 8297 mthd 1904 data 01000404
May 14 13:17:52 [kernel] nouveau 0000:01:00.0: fb: trapped write at 00204c8000 on channel 11 [0eeda000 plasmashell[2665]] engine 00 [PGRAPH] client 0b [PROP] subclient 00 [RT0] reason 00000002 [PAGE_NOT_PRESENT]
May 14 13:17:52 [kernel] nouveau 0000:01:00.0: fb: trapped write at 0020563800 on channel 2 [0fb2f000 X[2086]] engine 00 [PGRAPH] client 0b [PROP] subclient 08 [ZETA] reason 00000002 [PAGE_NOT_PRESENT]
May 14 13:18:01 [kernel] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - 00000020 [ZETA_FAULT] - Address 002054b100
May 14 13:18:01 [kernel] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - e0c: 00000000, e18: 00000000, e1c: 00040000, e20: 00020000, e24: 08030000
May 14 13:18:01 [kernel] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 1 - 00000040 [RT_FAULT] - Address 00204f1b00
May 14 13:18:01 [kernel] nouveau 0000:01:00.0: gr: TRAP_PROP - TP 1 - e0c: 00000000, e18: 00000000, e1c: 006c0110, e20: 00001100, e24: 00030000
May 14 13:18:01 [kernel] nouveau 0000:01:00.0: gr: 00200000 [] ch 11 [000eeda000 plasmashell[2665]] subc 3 class 8297 mthd 1344 data 00004001
May 14 13:18:01 [kernel] nouveau 0000:01:00.0: fb: trapped write at 0020555b00 on channel 11 [0eeda000 plasmashell[2665]] engine 00 [PGRAPH] client 0b [PROP] subclient 08 [ZETA] reason 00000002 [PAGE_NOT_PRESENT]

Attached is the complete error log from this session.

The problems aren't limited to X, though. When nouveau enters a failure state like this, it corrupts memory belonging to other processes. I have several times (at least thrice) seen bitcoind crash at the same time as this storm of nouveau errors, logging an error message like:

2016-05-14 16:26:37 Corruption: block checksum mismatch
2016-05-14 16:26:37 *** System error while flushing: Database corrupted
2016-05-14 16:26:37 Error: Error: A fatal internal error occurred, see debug.log for details
2016-05-14 16:26:37 Shutdown: done

When I started seeing these problems, I suspected bad RAM, so I ran Memtest86+ overnight but found no errors. So my suspicion is that nouveau is writing to pages it shouldn't.

Could someone help me modify my kernel so that, instead of merely printing "fail ttm_validate", nouveau sends a SIGBUS to the active process when this occurs? Then I can run plasmashell in gdb and get a clue as to what's causing this.</pre> </div> <hr> You are receiving this mail because: <ul> <li>You are the assignee for the bug.</li> </ul> </body> </html>