[amd-gfx] AMD Carrizo - GPU fault detected: 146 0x0842b714

Nicolai Hähnle nhaehnle at gmail.com
Sat Jun 18 11:36:51 UTC 2016


On 18.06.2016 13:16, Mads wrote:
> That was quick! :)
>
> On 2016-06-18 11:28, Nicolai Hähnle wrote:
>>
>> Since you've tried a lot of kernel variations, I'm tempted to look for
>> the problem in Mesa. A couple of things you could try:
>>
>> 1) Run R600_DEBUG=testdma,check_vm glxgears (or any other GL app,
>> really). This executes a DMA self-test. Observe whether there are any
>> failures and whether you get VM faults associated with the run in dmesg.
>> (The self-test runs indefinitely, until you Ctrl+C out of it.)
>
> last line before ctrl+c:
>   342: dst = (   80 x   104 x 1, 2D_TILED_THIN1),  src = ( 1164 x  1940 x 1, 2D_TILED_THIN1), bpp = 16, BLITs: GFX = 30, DMA =  0, pass [343/343]
>
> It didn't seem to cause any issues, no messages in dmesg...
>
>> 2) Start your desktop session with R600_DEBUG=nodma and see if that
>> makes the VM faults go away. (Please make sure that the environment
>> variable actually makes it through, by looking at /proc/$pid/environ,
>> where $pid is the PID of kwin and other relevant processes.)
>
> I set it globally, and I could see krunner's environ file containing
> R600_DEBUG=nodma.  Still corruption, a graphical lockup and this output
> from dmesg after starting dolphin:
>
> [ 1188.562864] amdgpu 0000:00:01.0: GPU fault detected: 146 0x0842b714
> [ 1188.562870] amdgpu 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00101508
> [ 1188.562872] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
> [ 1188.562875] VM fault (0x14, vmid 6) at page 1053960, write from 'SDM0' (0x53444d30) (183)
.. snip ..

That's surprising. Would CP DMA also appear as write from 'SDM0'? I 
doubt it...
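
For reference, the numbers in the quoted fault line can be cross-checked by hand. The following is only a sketch: the function names are made up for illustration, and the bit positions used for the status register are an assumption, chosen because they reproduce the values printed in the log line (0x14, vmid 6, write, 183). The client ID is simply four ASCII characters packed into a 32-bit value.

```python
def decode_client_id(value):
    """Read a 32-bit value as four big-endian ASCII characters."""
    return value.to_bytes(4, "big").decode("ascii")


def decode_status(status):
    """Split the fault status word into fields.

    ASSUMPTION: these shifts and masks are not taken from the thread;
    they are picked so that 0x0D0B7014 yields the values shown in dmesg.
    """
    return {
        "protections": status & 0xFF,
        "client_id": (status >> 12) & 0xFF,
        "write": bool((status >> 24) & 0x1),
        "vmid": (status >> 25) & 0xF,
    }


print(decode_client_id(0x53444D30))  # SDM0
print(0x00101508)                    # 1053960, the page number in the log
print(decode_status(0x0D0B7014))
```

So the FAULT_ADDR value and the "page 1053960" in the last line are the same number, once in hex and once in decimal.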


>> 3) Do dolphin and konsole use OpenGL directly in your setting, or is
>> it just the compositor?
>>
> I don't think they're special...? I wouldn't know where to set up that
> kind of setting, so I'm guessing it's the compositor.

A sanity check is `grep radeonsi /proc/$pid/maps` -- if something shows 
up, the driver was loaded into the process.
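
If you want to check several processes at once, the same test can be scripted. This is a minimal sketch, not a polished tool: the helper names are hypothetical, and it only relies on /proc/$pid/maps being plain text and /proc/$pid/environ being a list of NUL-separated KEY=VALUE entries. It also covers the environment check from point 2.

```python
from pathlib import Path


def driver_mapped(maps_text, needle="radeonsi"):
    """True if any mapping line mentions the driver name."""
    return any(needle in line for line in maps_text.splitlines())


def parse_environ(raw):
    """Turn the NUL-separated bytes of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if sep:  # skip empty or malformed entries
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def inspect_pid(pid):
    """Return (driver loaded?, value of R600_DEBUG or None) for one PID."""
    proc = Path("/proc") / str(pid)
    maps_text = (proc / "maps").read_text(errors="replace")
    env = parse_environ((proc / "environ").read_bytes())
    return driver_mapped(maps_text), env.get("R600_DEBUG")
```

Calling inspect_pid() with the PID of kwin, dolphin or konsole needs permission to read that process's /proc entries, so run it as the same user or as root.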


>> 4) Something else I notice is that the page numbers of the VM faults
>> are of the form 0x001xxxxx. This suggests a 32-bit address underflow,
>> i.e. an address wraps around to a very large 32-bit number. Could you
>> please install a version of Mesa with assertions enabled
>> (--enable-debug in ./configure does the trick) and see if some check
>> is triggered?
>
> I'll do this next, it takes a while to build so I'll reply as soon as I
> have it :) It is a 64-bit system though, but I have both 64-bit and
> 32-bit libs installed (I can't think of anything that should be running
> that would be 32-bit...)

That doesn't really matter. Even though the system is 64-bit and the
GPUVM has a 40-bit address space, the GPU still takes plenty of
address-related offsets as 32 bits or less.
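
To illustrate what I mean by an underflow, here is a small sketch with made-up values (they are not taken from your faults): subtracting a larger offset from a smaller one in 32-bit unsigned arithmetic wraps around to a very large number, which is still a perfectly representable address in a 40-bit space.

```python
MASK32 = 0xFFFFFFFF  # keep only the low 32 bits, like a 32-bit register


def sub_u32(a, b):
    """Unsigned 32-bit subtraction with wraparound."""
    return (a - b) & MASK32


# Hypothetical values: an offset that is larger than the base it is
# subtracted from.
wrapped = sub_u32(0x1000, 0x2000)
print(hex(wrapped))        # 0xfffff000
print(wrapped < 1 << 40)   # True: still inside a 40-bit address space
```

An assertion in a debug build of Mesa would ideally catch the bad subtraction at the point where it happens, instead of much later as a VM fault.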

Cheers,
Nicolai

>
> Thanks for help!
>
> - Mads

