[amd-gfx] AMD Carrizo - GPU fault detected: 146 0x0842b714

Nicolai Hähnle nhaehnle at gmail.com
Sat Jun 18 09:28:24 UTC 2016


On 18.06.2016 10:15, Mads wrote:
> Hi!
>
> For a while now I've been having issues with my HP EliteDesk 705 G2 mini
> PC[1].
>
> If I open up e.g. dolphin or konsole when in kde plasma 5.6.4, the
> screen corrupts and locks up, and this appears in dmesg:
>
> juni 17 22:50:42 hphtpc kernel: amdgpu 0000:00:01.0: GPU fault detected:
> 146 0x0842b714
> juni 17 22:50:42 hphtpc kernel: amdgpu 0000:00:01.0:
> VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00101508
> juni 17 22:50:42 hphtpc kernel: amdgpu 0000:00:01.0:
> VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0B0B7014
> juni 17 22:50:42 hphtpc kernel: VM fault (0x14, vmid 5) at page 1053960,
> write from 'SDM0' (0x53444d30) (183)
... snip ...

> This didn't happen back with mesa-11.2.2 built against llvm 3.8.0, but
> that starts to be quite a lot of commits ago now, considering the
> development pace mesa's got at the moment.
>
> I tried out mesa and llvm from git and svn around when Bas Nieuwenhuizen
> posted the GL compute shaders for radeonsi patches[2], and I think
> that was the first time I saw the bug.

Compute shaders aren't used by a plain desktop, but the VM fault 
indicates a write from the SDMA engine, which also saw a lot more use 
during that timeframe.
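
For reference, the decoded "VM fault" line in the quoted log can be
reproduced from the raw register values. The following is a minimal
sketch, not the kernel's code: the bit positions are an assumption
recalled from the gfx8 register headers and are only checked against
this one log line, and the function names decode_status and decode_tag
are made up for illustration.

```shell
#!/bin/sh
# Split VM_CONTEXT1_PROTECTION_FAULT_STATUS into its fields.
# ASSUMED layout: bits 0-7 protections, 12-19 memory client id,
# bit 24 read(0)/write(1), bits 25-28 VMID.
decode_status() {
    s=$(( $1 ))
    printf 'protections=0x%02x client_id=%d rw=%d vmid=%d\n' \
        $(( s & 0xff )) $(( (s >> 12) & 0xff )) \
        $(( (s >> 24) & 1 )) $(( (s >> 25) & 0xf ))
}

# Turn the 32-bit client tag into its four ASCII characters,
# most significant byte first.
decode_tag() {
    t=$(( $1 ))
    for bits in 24 16 8 0; do
        # printf emits a byte by value through an octal escape
        printf "\\$(printf '%03o' $(( (t >> bits) & 0xff )))"
    done
    printf '\n'
}

# Raw values copied from the dmesg excerpt quoted above.
decode_status 0x0B0B7014
decode_tag 0x53444d30
# The decimal page number is the same value as the FAULT_ADDR register.
printf 'page=0x%08x\n' 1053960
```

For the values in the log this gives protections 0x14, client ID 183, a
write access, VMID 5 and the tag SDM0, which matches what the kernel
printed.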


> The bug appears no matter which kernel I use; I've been through
> countless iterations of drm-next-4.7 and drm-fixes-4.6 kernels. The
> error message pasted above comes from the Gentoo-provided 4.6.2 kernel:
>
> # uname -a
> Linux hphtpc 4.6.2-gentoo #2 SMP PREEMPT Mon Jun 13 21:27:32 CEST 2016
> x86_64 AMD PRO A12-8800B R7, 12 Compute Cores 4C+8G AuthenticAMD GNU/Linux
>
> Am I at the right mailing list for this kind of bug? How can I debug
> this further?

Since you've tried a lot of kernel variations, I'm tempted to look for 
the problem in Mesa. A couple of things you could try:

1) Run R600_DEBUG=testdma,check_vm glxgears (or any other GL app, 
really). This executes a DMA self-test. Observe whether there are any 
failures and whether you get VM faults associated with the run in dmesg. 
(The self-test runs indefinitely, until you Ctrl+C out of it.)

2) Start your desktop session with R600_DEBUG=nodma and see if that 
makes the VM faults go away. (Please make sure that the environment 
variable actually makes it through, by looking at /proc/$pid/environ, 
where $pid is the PID of kwin and other relevant processes.)

3) Do dolphin and konsole use OpenGL directly in your setup, or is it 
just the compositor?

4) Something else I notice is that the page numbers of the VM faults are 
of the form 0x001xxxxx. This suggests a 32-bit address underflow, i.e. an 
address wraps around to a very large 32-bit number. Could you please 
install a version of Mesa with assertions enabled (--enable-debug in 
./configure does the trick) and see if some check is triggered?
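
For the check in 2), the entries in /proc/$pid/environ are separated by 
NUL bytes, so the file is easier to read after translating them to 
newlines. A minimal sketch, assuming pgrep is available; the helper 
names env_of and check_procs are made up for illustration, and 
"kwin_x11" is only a guess at the process name on your system:

```shell
#!/bin/sh
# Print the value of one variable from a NUL-separated environ file.
# Usage: env_of FILE NAME   (FILE is typically /proc/<pid>/environ)
env_of() {
    tr '\0' '\n' < "$1" | sed -n "s/^$2=//p"
}

# Report R600_DEBUG for every running process with exactly this name.
# An empty value means the variable did not reach that process.
check_procs() {
    for pid in $(pgrep -x "$1"); do
        printf '%s (pid %s): R600_DEBUG=%s\n' "$1" "$pid" \
            "$(env_of "/proc/$pid/environ" R600_DEBUG)"
    done
}

# Example (process name is an assumption):
#   check_procs kwin_x11
```

Note that the environ file shows the environment the process was started 
with, so the check has to be repeated after restarting the session.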

Nicolai

>
> - Mads
>
> ---------
> [1]
> http://store.hp.com/us/en/PDPStdView?catalogId=10051&urlLangId=-1&langId=-1&productId=1086676&storeId=10151
>
> [2] https://lists.freedesktop.org/archives/mesa-dev/2016-April/111638.html