[amd-gfx] AMD Carrizo - GPU fault detected: 146 0x0842b714
Mads
mads at ab3.no
Sat Jun 18 11:16:00 UTC 2016
That was quick! :)
On 2016-06-18 11:28, Nicolai Hähnle wrote:
>
> Since you've tried a lot of kernel variations, I'm tempted to look for
> the problem in Mesa. A couple of things you could try:
>
> 1) Run R600_DEBUG=testdma,check_vm glxgears (or any other GL app,
> really). This executes a DMA self-test. Observe whether there are any
> failures and whether you get VM faults associated to the run in dmesg.
> (The self-test runs indefinitely, until you Ctrl+C out of it.)
Last line before Ctrl+C:
342: dst = ( 80 x 104 x 1, 2D_TILED_THIN1), src = ( 1164 x 1940 x 1, 2D_TILED_THIN1), bpp = 16, BLITs: GFX = 30, DMA = 0, pass [343/343]
It didn't seem to cause any issues, and there were no messages in dmesg.
> 2) Start your desktop session with R600_DEBUG=nodma and see if that
> makes the VM faults go away. (Please make sure that the environment
> variable actually makes it through, by looking at /proc/$pid/environ,
> where $pid is the PID of kwin and other relevant processes.)
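For reference, the environ check described in point 2 can be sketched in Python. This is a minimal sketch assuming a Linux /proc filesystem; the helper names parse_environ and read_environ are made up for illustration and are not part of any tool mentioned in this thread.

```python
from pathlib import Path


def parse_environ(raw: bytes) -> dict:
    """Split the NUL-separated KEY=VALUE records of an environ file."""
    env = {}
    for record in raw.split(b"\0"):
        if b"=" not in record:
            continue  # skip the empty trailing record and malformed entries
        key, _, value = record.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_environ(pid: int) -> dict:
    """Read /proc/<pid>/environ (hypothetical helper; needs read permission)."""
    return parse_environ(Path(f"/proc/{pid}/environ").read_bytes())


if __name__ == "__main__":
    import sys

    # Usage: python3 check_env.py <pid> [<pid> ...]
    for pid in map(int, sys.argv[1:]):
        print(pid, read_environ(pid).get("R600_DEBUG", "<not set>"))
```

Note that /proc/$pid/environ reflects the environment the process was started with, so the variable has to be set before the session (and kwin) starts.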
I set it globally, and I could see krunner's environ file containing
R600_DEBUG=nodma. I still get corruption, a graphical lockup, and this
output from dmesg after starting dolphin:
[ 1188.562864] amdgpu 0000:00:01.0: GPU fault detected: 146 0x0842b714
[ 1188.562870] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00101508
[ 1188.562872] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.562875] VM fault (0x14, vmid 6) at page 1053960, write from 'SDM0' (0x53444d30) (183)
[ 1188.562879] amdgpu 0000:00:01.0: GPU fault detected: 146 0x0842b714
[ 1188.562881] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0010151F
[ 1188.562883] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.562885] VM fault (0x14, vmid 6) at page 1053983, write from 'SDM0' (0x53444d30) (183)
[ 1188.565159] amdgpu 0000:00:01.0: GPU fault detected: 146 0x0842b714
[ 1188.565165] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00101508
[ 1188.565168] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.565170] VM fault (0x14, vmid 6) at page 1053960, write from 'SDM0' (0x53444d30) (183)
[ 1188.565176] amdgpu 0000:00:01.0: GPU fault detected: 146 0x0842b714
[ 1188.565178] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0010150A
[ 1188.565180] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.565182] VM fault (0x14, vmid 6) at page 1053962, write from 'SDM0' (0x53444d30) (183)
[ 1188.565187] amdgpu 0000:00:01.0: GPU fault detected: 146 0x08e2b714
[ 1188.565189] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00101508
[ 1188.565191] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.565193] VM fault (0x14, vmid 6) at page 1053960, write from 'SDM0' (0x53444d30) (183)
[ 1188.572882] amdgpu 0000:00:01.0: GPU fault detected: 146 0x0842b714
[ 1188.572887] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00101508
[ 1188.572888] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.572890] VM fault (0x14, vmid 6) at page 1053960, write from 'SDM0' (0x53444d30) (183)
[ 1188.572895] amdgpu 0000:00:01.0: GPU fault detected: 146 0x0842b714
[ 1188.572896] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0010150A
[ 1188.572897] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.572898] VM fault (0x14, vmid 6) at page 1053962, write from 'SDM0' (0x53444d30) (183)
[ 1188.572902] amdgpu 0000:00:01.0: GPU fault detected: 146 0x08e2b714
[ 1188.572903] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00101508
[ 1188.572904] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.572905] VM fault (0x14, vmid 6) at page 1053960, write from 'SDM0' (0x53444d30) (183)
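As a reading aid for the log above, here is a rough Python sketch that recomputes the decoded "VM fault" line from the raw register values. The bit positions used for the status fields are assumptions chosen because they reproduce the values printed in this log (0x14, vmid 6, write, client 183); they may not hold for other ASICs or kernel versions. The 4 KiB page size and the function name decode_vm_fault are likewise assumptions for illustration.

```python
def decode_vm_fault(addr: int, status: int, mc_client: int) -> dict:
    """Decode one fault record. Field layout is an assumption, see lead-in."""
    return {
        "page": addr,                            # FAULT_ADDR is printed as the page number
        "byte_address": addr << 12,              # assuming 4 KiB pages
        "protections": status & 0xFF,            # low byte, printed as 0x14
        "client_id": (status >> 12) & 0xFF,      # printed as (183)
        "write": bool((status >> 24) & 0x1),     # printed as "write from"
        "vmid": (status >> 25) & 0xF,            # printed as "vmid 6"
        # The client tag is four ASCII characters packed big-endian.
        "client_name": mc_client.to_bytes(4, "big").decode("ascii"),
    }


fault = decode_vm_fault(0x00101508, 0x0D0B7014, 0x53444D30)
for name, value in fault.items():
    print(f"{name}: {value}")
```

With the first record's values this gives page 1053960, vmid 6 and client 'SDM0', matching the kernel's own decoded line.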
> 3) Do dolphin and konsole use OpenGL directly in your setting, or is it
> just the compositor?
>
I don't think they're special...? I wouldn't know where to set up that
kind of setting, so I'm guessing it's the compositor.
> 4) Something else I notice is that the page numbers of the VM faults
> are of the form 0x001xxxxx. This suggests a 32-bit address underflow,
> i.e. an address wraps around to a very large 32-bit number. Could you
> please install a version of Mesa with assertions enabled
> (--enable-debug in ./configure does the trick) and see if some check is
> triggered?
I'll do this next. It takes a while to build, so I'll reply as soon as I
have it :) It is a 64-bit system, though I have both 64-bit and 32-bit
libs installed (I can't think of anything that should be running
that would be 32-bit...)
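To illustrate what a 32-bit address underflow (point 4 above) looks like in general, here is a small Python sketch. It is not Mesa code: sub_u32 and checked_sub_u32 are hypothetical helpers, and the operand values are made up. The second helper only mimics the kind of check a build with assertions enabled might perform.

```python
U32_MASK = 0xFFFFFFFF


def sub_u32(a: int, b: int) -> int:
    """Subtract as an unsigned 32-bit register would, wrapping on underflow."""
    return (a - b) & U32_MASK


def checked_sub_u32(a: int, b: int) -> int:
    """Same subtraction, but fail loudly instead of wrapping silently."""
    assert a >= b, f"32-bit underflow: {a:#x} - {b:#x}"
    return a - b


# Subtracting a larger offset from a smaller one wraps to a huge value.
print(hex(sub_u32(0x1000, 0x2000)))  # 0xfffff000
```

The point is that 64-bit userspace does not rule this out: any computation done in a 32-bit type can still wrap.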
Thanks for the help!
- Mads