[amd-gfx] AMD Carrizo - GPU fault detected: 146 0x0842b714

Tue Jun 21 07:39:10 UTC 2016

On 20.06.2016 22:02, Mads wrote:
> On 2016-06-20 20:36, Mads wrote:
>
>>> Unfortunately there seems to be a line in your dmesg that the
>>> Mesa-internal parser didn't understand - that's what the Assertion
>>> message is about, and it's why you don't see any dump files. I've
>>> updated the branch at
>>> https://cgit.freedesktop.org/~nh/mesa/log/?h=debug-dma with a patch
>>> to work around this. Please retry with that one.
>>>
>>> Also, please make sure that your X server really uses the manually
>>> built version of radeonsi_dri.so and has R600_DEBUG=check_vm set.
>>> That assertion should have taken down your X server before you even
>>> had a chance to start dolphin.
>>
>> It did, you can see in the log that it says "Killed"
>>
>> I ran dolphin from ssh, that's why I could show you the log.
>>
>> Building mesa again with the patch now, will report back :)
>>
>
> There!
>
>> $ XAUTHORITY=.Xauthority DISPLAY=:0 LIBGL_DEBUG=verbose dolphin
>> libGL: pci id for fd 9: 1002:9874, driver radeonsi
>> libGL: OpenDriver: trying /usr/lib64/dri/tls/radeonsi_dri.so
>> libGL: OpenDriver: trying /usr/lib64/dri/radeonsi_dri.so
>> si_vm_fault_occured: failed to parse line '                Either
>> enable ECC checking or force module loading by setting
>> 'ecc_enable_override'.
>> '
>> libGL: Using DRI3 for screen 0
>> Trying to convert empty KLocalizedString to QString.
>> Cannot creat accessible child interface for object:
>> PlacesView(0x189f230)  index:  5
>> QPixmap::scaled: Pixmap is a null pixmap
>> QPixmap::scaled: Pixmap is a null pixmap
>
> That line it can't parse is from the built-in CONFIG_EDAC_AMD64, if
> that's interesting...
>
> And, X didn't crash, but plasmashell did! Attaching the ddebug_dump from
> it (the same kind of protection bug and corruption appeared).

Thanks. However, I still don't think this is going to help. Your earlier 
trace experiments showed that the problematic SDMA commands came from 
the X server, _not_ from plasmashell.

So what we see here is likely just the first set of GPU commands sent by 
plasmashell after the VM fault occurred. Since the plasmashell process 
is unable to tell who caused the VM fault, it takes the blame 
incorrectly. Are you sure the X server is using your self-compiled 
radeonsi_dri.so and has the environment variable set? If it creates a 
ddebug_dump, it might be somewhere else (it's based off the HOME 
environment variable, which may be different).
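
One way to check both points is to look at the X server's entries under
/proc. The following is only a minimal sketch: it assumes a Linux /proc
filesystem, that you already know the PID of the X server, and that you
run it as a user allowed to read that process's files (usually root). The
helper names parse_environ, mapped_libraries and report are made up for
illustration. Note that /proc/<pid>/environ reflects the environment the
process was started with.

```python
import sys
from pathlib import Path


def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated contents of /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def mapped_libraries(maps_text: str, name: str) -> list:
    """Return the unique mapped file paths in /proc/<pid>/maps named `name`."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        basename = path.rsplit("/", 1)[-1]
        # A replaced-on-disk library shows up with a " (deleted)" suffix.
        if basename == name or basename == name + " (deleted)":
            paths.add(path)
    return sorted(paths)


def report(pid: int) -> None:
    """Print the variables and driver path relevant to this thread."""
    proc = Path("/proc") / str(pid)
    env = parse_environ((proc / "environ").read_bytes())
    print("R600_DEBUG =", env.get("R600_DEBUG"))
    print("HOME       =", env.get("HOME"))
    for path in mapped_libraries((proc / "maps").read_text(), "radeonsi_dri.so"):
        print("mapped:", path)


if __name__ == "__main__":
    report(int(sys.argv[1]))
```

Run it with the X server's PID as the only argument. If R600_DEBUG is
missing, or the mapped radeonsi_dri.so is not the one you built, that
would explain why no dump shows up for the X server. The HOME value
indicates where a ddebug_dump would be expected.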

Thanks,
Nicolai

>
> - Mads