[amd-gfx] AMD Carrizo - GPU fault detected: 146 0x0842b714

Mads mads at ab3.no
Sat Jun 18 11:16:00 UTC 2016


That was quick! :)

On 2016-06-18 11:28, Nicolai Hähnle wrote:
> 
> Since you've tried a lot of kernel variations, I'm tempted to look for 
> the problem in Mesa. A couple of things you could try:
> 
> 1) Run R600_DEBUG=testdma,check_vm glxgears (or any other GL app, 
> really). This executes a DMA self-test. Observe whether there are any 
> failures and whether you get VM faults associated to the run in dmesg. 
> (The self-test runs indefinitely, until you Ctrl+C out of it.)

Last line before Ctrl+C:
  342: dst = (   80 x   104 x 1, 2D_TILED_THIN1),  src = ( 1164 x  1940 x 1, 2D_TILED_THIN1), bpp = 16, BLITs: GFX = 30, DMA =  0, pass [343/343]

It didn't seem to cause any issues, and there were no messages in dmesg.

> 2) Start your desktop session with R600_DEBUG=nodma and see if that 
> makes the VM faults go away. (Please make sure that the environment 
> variable actually makes it through, by looking at /proc/$pid/environ, 
> where $pid is the PID of kwin and other relevant processes.)
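
The check described in 2) can be scripted. This is only a minimal sketch, assuming a Linux /proc filesystem; the helper names (parse_environ, read_environ, has_nodma) are made up for illustration and are not part of Mesa or any existing tool.

```python
from pathlib import Path


def parse_environ(raw: bytes) -> dict:
    """Split the NUL-separated KEY=VALUE records of an environ file."""
    env = {}
    for record in raw.split(b"\0"):
        if b"=" not in record:
            continue  # skip the empty trailing record and malformed entries
        key, _, value = record.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_environ(pid: int) -> dict:
    """Read the environment a process was started with."""
    return parse_environ(Path(f"/proc/{pid}/environ").read_bytes())


def has_nodma(env: dict) -> bool:
    """True if R600_DEBUG contains the nodma flag (flags are comma-separated)."""
    return "nodma" in env.get("R600_DEBUG", "").split(",")
```

Calling has_nodma(read_environ(pid)) with the PID of kwin, dolphin or konsole should then report whether the variable made it through. Reading another process's environ file normally requires running as the same user or as root.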

I set it globally, and I could see krunner's environ file containing 
R600_DEBUG=nodma. I still get corruption, a graphical lockup, and this 
output in dmesg after starting dolphin:

[ 1188.562864] amdgpu 0000:00:01.0: GPU fault detected: 146 0x0842b714
[ 1188.562870] amdgpu 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00101508
[ 1188.562872] amdgpu 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.562875] VM fault (0x14, vmid 6) at page 1053960, write from 'SDM0' (0x53444d30) (183)
[ 1188.562879] amdgpu 0000:00:01.0: GPU fault detected: 146 0x0842b714
[ 1188.562881] amdgpu 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0010151F
[ 1188.562883] amdgpu 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.562885] VM fault (0x14, vmid 6) at page 1053983, write from 'SDM0' (0x53444d30) (183)
[ 1188.565159] amdgpu 0000:00:01.0: GPU fault detected: 146 0x0842b714
[ 1188.565165] amdgpu 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00101508
[ 1188.565168] amdgpu 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.565170] VM fault (0x14, vmid 6) at page 1053960, write from 'SDM0' (0x53444d30) (183)
[ 1188.565176] amdgpu 0000:00:01.0: GPU fault detected: 146 0x0842b714
[ 1188.565178] amdgpu 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0010150A
[ 1188.565180] amdgpu 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.565182] VM fault (0x14, vmid 6) at page 1053962, write from 'SDM0' (0x53444d30) (183)
[ 1188.565187] amdgpu 0000:00:01.0: GPU fault detected: 146 0x08e2b714
[ 1188.565189] amdgpu 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00101508
[ 1188.565191] amdgpu 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.565193] VM fault (0x14, vmid 6) at page 1053960, write from 'SDM0' (0x53444d30) (183)
[ 1188.572882] amdgpu 0000:00:01.0: GPU fault detected: 146 0x0842b714
[ 1188.572887] amdgpu 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00101508
[ 1188.572888] amdgpu 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.572890] VM fault (0x14, vmid 6) at page 1053960, write from 'SDM0' (0x53444d30) (183)
[ 1188.572895] amdgpu 0000:00:01.0: GPU fault detected: 146 0x0842b714
[ 1188.572896] amdgpu 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0010150A
[ 1188.572897] amdgpu 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.572898] VM fault (0x14, vmid 6) at page 1053962, write from 'SDM0' (0x53444d30) (183)
[ 1188.572902] amdgpu 0000:00:01.0: GPU fault detected: 146 0x08e2b714
[ 1188.572903] amdgpu 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00101508
[ 1188.572904] amdgpu 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
[ 1188.572905] VM fault (0x14, vmid 6) at page 1053960, write from 'SDM0' (0x53444d30) (183)
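
The summary in each "VM fault" line can be reproduced from the raw status register. This is a rough sketch, not the driver's code: the bit layout used here (protections in bits 0-7, memory client ID in bits 12-19, write flag in bit 24, VMID in bits 25-28) is an assumption inferred from the log itself, and the function names are made up for illustration.

```python
def decode_fault_status(status: int) -> dict:
    """Split a VM_CONTEXT1_PROTECTION_FAULT_STATUS value into fields.

    The bit positions are an assumption that happens to match this log.
    """
    return {
        "protections": status & 0xFF,        # printed as 0x14 in the log
        "client_id": (status >> 12) & 0xFF,  # printed as (183)
        "write": bool((status >> 24) & 0x1), # "write from" vs. "read from"
        "vmid": (status >> 25) & 0xF,        # printed as vmid 6
    }


def decode_client_tag(tag: int) -> str:
    """Turn the four-character client tag back into ASCII text."""
    return tag.to_bytes(4, "big").decode("ascii")


fields = decode_fault_status(0x0D0B7014)
client = decode_client_tag(0x53444D30)
```

With the values from the log, fields holds protections 0x14, client ID 183, the write flag set and VMID 6, and client is "SDM0", so every fault above comes from the same client and VMID.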

> 3) Do dolphin and konsole use OpenGL directly in your setting, or is it 
> just the compositor?
> 
I don't think they're special? I wouldn't know where to set up that 
kind of setting, so I'm guessing it's just the compositor.

> 4) Something else I notice is that the page numbers of the VM faults 
> are of the form 0x001xxxxx. This suggests a 32-bit address underflow, 
> i.e. an address wraps around to a very large 32-bit number. Could you 
> please install a version of Mesa with assertions enabled 
> (--enable-debug in ./configure does the trick) and see if some check is 
> triggered?
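
The pattern mentioned in 4) is easy to confirm from the decimal page numbers that dmesg prints. A small sketch, with the page list copied from the log in this mail and helper names invented for illustration:

```python
# Distinct page numbers from the "VM fault ... at page N" lines in this mail.
FAULT_PAGES = [1053960, 1053983, 1053962]


def page_to_hex(page: int) -> str:
    """Format a page number the way the FAULT_ADDR register is printed."""
    return f"0x{page:08X}"


def has_001_prefix(page: int) -> bool:
    """True if the page number has the form 0x001xxxxx."""
    return (page >> 20) == 0x001


hex_pages = [page_to_hex(p) for p in FAULT_PAGES]
all_match = all(has_001_prefix(p) for p in FAULT_PAGES)
```

The three pages come out as 0x00101508, 0x0010151F and 0x0010150A, matching the VM_CONTEXT1_PROTECTION_FAULT_ADDR values, and all of them have the 0x001 prefix.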

I'll do this next. It takes a while to build, so I'll reply as soon as I 
have it :) It is a 64-bit system, though I have both 64-bit and 32-bit 
libs installed (I can't think of anything that should be running that 
would be 32-bit...)

Thanks for the help!

- Mads

