[amd-gfx] AMD Carrizo - GPU fault detected: 146 0x0842b714

Nicolai Hähnle nhaehnle at gmail.com
Mon Jun 20 16:00:05 UTC 2016


On 20.06.2016 17:50, Mads wrote:
>> On 2016-06-20 11:09, Nicolai Hähnle wrote:
>>
>>> Thanks for the effort. The apitrace of Dolphin is indeed "useless" --
>>> seems like OpenGL is loaded, but in the end the app decides not to
>>> use it. Instead, it looks like the VM faults are coming from the X
>>> server.
>>>
>>> Can you make sure that the X server loads the debug build of
>>> radeonsi_dri.so with assertions enabled?
>>>
>>> I wonder if it's possible to get an apitrace from the X server.
>>> Perhaps you can reproduce the problem with Xephyr? If that also shows
>>> the VM faults, it would probably be easiest.
>>>
>>> Nicolai
>>
>> Just so you know, the system is running xorg-server 1.18.3 and
>> xf86-video-amdgpu-1.1.0 with DRI3 + xf86-input-libinput-0.19.0.
>>
>> I'm rebuilding LLVM and mesa now with debug enabled just to make sure
>> my environment is sane (LLVM didn't have debug/assertions enabled),
>> but that'll take a while... be right back with Xephyr testing when
>> that's done.
>>
>
> Trip report:
>
> 1) If I start Xephyr like this: "DISPLAY=:0 Xephyr -auth .Xauthority :1
> -screen 800x600", there's a lot of corruption on the whole screen (and
> the Xephyr window does not show), but the corruption looks slightly
> different than usual (lots of black bars in a checkerboard-like formation).
> Also, the VM_CONTEXT1_PROTECTION_FAULT_ADDR-error does not trigger!
> Tried with DRI2 and DRI3, same results (and type of corruption) with both.
>
> 2) If I start Xephyr like this: "DISPLAY=:0 Xephyr -auth .Xauthority :1
> -ac -screen 800x600 -glamor" (with glamor that is), the Xephyr window
> works (but the KDE window decorations around/belonging to the Xephyr
> window look corrupted). Same with both DRI2 and DRI3: the window
> decorations are corrupted with both. No protection faults are triggered here either.
>
> If I then, after starting Xephyr with -glamor, start dolphin like this:
> "$ DISPLAY=:1 LIBGL_DEBUG=verbose dolphin" I get this log:
>
>> libGL: OpenDriver: trying /usr/lib64/dri/tls/swrast_dri.so
>> libGL: OpenDriver: trying /usr/lib64/dri/swrast_dri.so
>> Trying to convert empty KLocalizedString to QString.
>> Cannot creat accessible child interface for object:
>> PlacesView(0x959f50)  index:  5
>> QPixmap::scaled: Pixmap is a null pixmap
>> QPixmap::scaled: Pixmap is a null pixmap
>> (... etc ...)
>
> And the dolphin window works inside Xephyr without any corruption, and
> no messages in dmesg.
>
> And just to double-check, starting dolphin on :0 yields, yet again, a
> terrible mess on screen, and VM faults in dmesg - same as before.

Okay, so clearly the "main" X server behaves differently from the nested 
one. That would have been too easy :)

Could you please try with Mesa from 
https://cgit.freedesktop.org/~nh/mesa/log/?h=debug-dma (at least the top 
three commits) and running the X server with the R600_DEBUG=check_vm 
variable set?

That should kill the X server on the first VM fault and produce a file 
$HOME/ddebug_dump/X_$pid_00000000 containing the last submitted DMA 
commands, which should help to figure out what's going on.


> Assertions are enabled in mesa and llvm, but I haven't disabled -O2 and
> stripping of debug info yet. Should I do that next, so it'll be easier
> to run through gdb?
>
> (I have no qualms about giving you shell access to this machine if you want
> to have a look around...)

Sure, that might be useful as well.

Nicolai

>
> - Mads

