[amd-gfx] AMD Carrizo - GPU fault detected: 146 0x0842b714
Nicolai Hähnle
nhaehnle at gmail.com
Sat Jun 18 11:36:51 UTC 2016
On 18.06.2016 13:16, Mads wrote:
> That was quick! :)
>
> On 2016-06-18 11:28, Nicolai Hähnle wrote:
>>
>> Since you've tried a lot of kernel variations, I'm tempted to look for
>> the problem in Mesa. A couple of things you could try:
>>
>> 1) Run R600_DEBUG=testdma,check_vm glxgears (or any other GL app,
>> really). This executes a DMA self-test. Observe whether there are any
>> failures and whether you get VM faults associated with the run in dmesg.
>> (The self-test runs indefinitely, until you Ctrl+C out of it.)
>
> last line before ctrl+c:
> 342: dst = ( 80 x 104 x 1, 2D_TILED_THIN1), src = ( 1164 x 1940
> x 1, 2D_TILED_THIN1), bpp = 16, BLITs: GFX = 30, DMA = 0, pass [343/343]
>
> It didn't seem to cause any issues, no messages in dmesg...
>
>> 2) Start your desktop session with R600_DEBUG=nodma and see if that
>> makes the VM faults go away. (Please make sure that the environment
>> variable actually makes it through, by looking at /proc/$pid/environ,
>> where $pid is the PID of kwin and other relevant processes.)
>
> I set it globally, and I could see krunner's environ-file containing
> R600_DEBUG=nodma. Still corruption, graphical lock up and this output
> from dmesg after starting dolphin:
>
> [ 1188.562864] amdgpu 0000:00:01.0: GPU fault detected: 146 0x0842b714
> [ 1188.562870] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR
> 0x00101508
> [ 1188.562872] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS
> 0x0D0B7014
> [ 1188.562875] VM fault (0x14, vmid 6) at page 1053960, write from
> 'SDM0' (0x53444d30) (183)
.. snip ..
That's surprising. Would CP DMA also appear as write from 'SDM0'? I
doubt it...
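For anyone reading along, the fault lines above can be decoded by hand. The sketch below is only an illustration: the three constants are copied from the dmesg output, but the bit positions used to split the status word are an assumption on my part (they are not stated anywhere in this thread). I picked them because they reproduce the values the kernel printed: protections 0x14, vmid 6, client 183, write.

```python
# Values copied from the dmesg output quoted above.
FAULT_ADDR = 0x00101508    # VM_CONTEXT1_PROTECTION_FAULT_ADDR (a page number)
FAULT_STATUS = 0x0D0B7014  # VM_CONTEXT1_PROTECTION_FAULT_STATUS
CLIENT_TAG = 0x53444D30    # value printed next to the client name


def tag_to_ascii(tag: int) -> str:
    """Interpret a 32-bit value as four big-endian ASCII bytes."""
    return tag.to_bytes(4, "big").decode("ascii")


def decode_status(status: int) -> dict:
    """Split the status word into fields.

    ASSUMPTION: these bit positions are inferred for illustration only.
    """
    return {
        "protections": status & 0xFF,          # low 8 bits
        "client_id": (status >> 12) & 0x1FF,   # 9-bit memory client id
        "is_write": bool((status >> 24) & 1),  # read/write flag
        "vmid": (status >> 25) & 0xF,          # 4-bit VM id
    }


print(tag_to_ascii(CLIENT_TAG))     # client name as ASCII
print(FAULT_ADDR)                   # page number in decimal
print(decode_status(FAULT_STATUS))  # fields of the status word
```

The decimal page number (1053960) is just the FAULT_ADDR register value, and the tag in parentheses is the client name spelled out as ASCII bytes.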
>> 3) Do dolphin and konsole use OpenGL directly in your setting, or is
>> it just the compositor?
>>
> I don't think they're special...? I wouldn't know where to setup that
> kind of setting, so I'm guessing it's the compositor.
A sanity check is `grep radeonsi /proc/$pid/maps` -- if something shows
up, the driver was loaded into the process.
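If you would rather script both checks (the environment variable from 2 and the driver mapping from 3), something along these lines should work. It is a rough sketch, not a polished tool: the function names are made up for this mail, and it assumes the usual Linux /proc layout (NUL-separated entries in environ, the pathname as the sixth column in maps).

```python
from pathlib import Path


def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE entries of /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def mapped_files_matching(maps_text: str, needle: str) -> list:
    """Return the distinct mapped pathnames in a maps dump containing needle."""
    paths = set()
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6 and needle in fields[5]:
            paths.add(fields[5].strip())
    return sorted(paths)


def inspect_process(pid: int) -> tuple:
    """Hypothetical helper: (R600_DEBUG value or None, radeonsi mappings)."""
    proc = Path("/proc") / str(pid)
    env = parse_environ((proc / "environ").read_bytes())
    libs = mapped_files_matching((proc / "maps").read_text(), "radeonsi")
    return env.get("R600_DEBUG"), libs
```

Calling inspect_process() with the PID of kwin, dolphin or konsole needs permission to read that process's /proc entries, so run it as the same user. An empty list means the driver was not loaded into that process.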
>> 4) Something else I notice is that the page numbers of the VM faults
>> are of the form 0x001xxxxx. This suggests a 32-bit address underflow,
>> i.e. an address wraps around to a very large 32-bit number. Could you
>> please install a version of Mesa with assertions enabled
>> (--enable-debug in ./configure does the trick) and see if some check
>> is triggered?
>
> I'll do this next, it takes a while to build so I'll reply as soon as I
> have it :) It is a 64-bit system though, and I have both 64-bit libs and
> 32-bit libs installed (I can't think of anything that should be running
> that would be 32-bit...)
That doesn't really matter though. Even though the system is 64 bits and
the GPUVM has a 40 bit address space, the GPU still takes plenty of
address-related offsets as 32 bits or less.
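To make the underflow idea concrete, here is a small sketch of how a wrapped 32-bit offset added to a wider base address can produce page numbers of the form 0x001xxxxx. The base address and the 4 KiB page size are hypothetical values that I chose so that the result matches the page number in the log. This shows that the arithmetic is plausible; it does not tell us what the driver actually computed.

```python
MASK32 = 0xFFFFFFFF
PAGE_SHIFT = 12  # ASSUMPTION: 4 KiB pages


def sub_u32(a: int, b: int) -> int:
    """Subtract with 32-bit unsigned wraparound, as a 32-bit register would."""
    return (a - b) & MASK32


def page_number(base: int, offset: int) -> int:
    """Page number of base + offset in a wider (e.g. 40-bit) address space."""
    return (base + offset) >> PAGE_SHIFT


base = 0x01509000              # hypothetical buffer base address
offset = sub_u32(0x0, 0x1000)  # "0 - one page" wraps to 0xFFFFF000
page = page_number(base, offset)

print(hex(offset))
print(hex(page))
```

The intended offset was slightly negative, but as an unsigned 32-bit value it lands just below 4 GiB, so the resulting address ends up a little above the 4 GiB mark instead of slightly below the base.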
Cheers,
Nicolai
>
> Thanks for help!
>
> - Mads