GPU VM fault in Ubuntu 17.10 (Mesa 17.2.8, kernel 4.15)

Lvzhihong (ReJohn) lvzhihong1 at huawei.com
Fri Feb 2 03:21:24 UTC 2018


Hi all,
We occasionally hit GPU VM faults with the latest 4.15 kernel driver and Mesa 17.2.8.
The GPU is a Radeon Pro WX 7100.

The kernel log fills with many messages like these:
Jan 31 13:35:44 ubuntu kernel: [120749.990185] amdgpu 000d:31:00.0: GPU fault detected: 146 0x0608d20c
Jan 31 13:35:44 ubuntu kernel: [120749.990192] amdgpu 000d:31:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001AE2C1
Jan 31 13:35:44 ubuntu kernel: [120749.990195] amdgpu 000d:31:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x080D200C
Jan 31 13:35:44 ubuntu kernel: [120749.990199] amdgpu 000d:31:00.0: VM fault (0x0c, vmid 4) at page 1761985, read from 'CBC7' (0x43424337) (210)
Jan 31 13:35:44 ubuntu kernel: [120749.990206] amdgpu 000d:31:00.0: GPU fault detected: 146 0x0608e20c
Jan 31 13:35:44 ubuntu kernel: [120749.990208] amdgpu 000d:31:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001AE2C1
Jan 31 13:35:44 ubuntu kernel: [120749.990210] amdgpu 000d:31:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x09062014
Jan 31 13:35:44 ubuntu kernel: [120749.990214] amdgpu 000d:31:00.0: VM fault (0x14, vmid 4) at page 1761985, write from 'CBC0' (0x43424330) (98)
Jan 31 13:35:44 ubuntu kernel: [120749.990220] amdgpu 000d:31:00.0: GPU fault detected: 146 0x0608a20c
Jan 31 13:35:44 ubuntu kernel: [120749.990223] amdgpu 000d:31:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001AE2B1
Jan 31 13:35:44 ubuntu kernel: [120749.990225] amdgpu 000d:31:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x080A200C
Jan 31 13:35:44 ubuntu kernel: [120749.990228] amdgpu 000d:31:00.0: VM fault (0x0c, vmid 4) at page 1761969, read from 'CBC4' (0x43424334) (162)
Jan 31 13:35:44 ubuntu kernel: [120749.990235] amdgpu 000d:31:00.0: GPU fault detected: 146 0x0608920c
Jan 31 13:35:44 ubuntu kernel: [120749.990238] amdgpu 000d:31:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001AE2B0
Jan 31 13:35:44 ubuntu kernel: [120749.990240] amdgpu 000d:31:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x080A200C
Jan 31 13:35:44 ubuntu kernel: [120749.990243] amdgpu 000d:31:00.0: VM fault (0x0c, vmid 4) at page 1761968, read from 'CBC4' (0x43424334) (162)
Jan 31 13:35:44 ubuntu kernel: [120749.990250] amdgpu 000d:31:00.0: GPU fault detected: 146 0x0608620c
Jan 31 13:35:44 ubuntu kernel: [120749.990252] amdgpu 000d:31:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001AE2BA
Jan 31 13:35:44 ubuntu kernel: [120749.990254] amdgpu 000d:31:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x080A200C
Jan 31 13:35:44 ubuntu kernel: [120749.990257] amdgpu 000d:31:00.0: VM fault (0x0c, vmid 4) at page 1761978, read from 'CBC4' (0x43424334) (162)
Jan 31 13:35:44 ubuntu kernel: [120749.990264] amdgpu 000d:31:00.0: GPU fault detected: 146 0x0608520c
Jan 31 13:35:44 ubuntu kernel: [120749.990266] amdgpu 000d:31:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001AE2C1
Jan 31 13:35:44 ubuntu kernel: [120749.990269] amdgpu 000d:31:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0805200C
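
As a reading aid, here is a minimal Python sketch of how the raw register values in the log relate to the decoded "VM fault" lines. The bit layout used in decode_status() is an assumption: it reproduces the decoded values printed above (protections, vmid, read/write, client id) but is not taken from the driver source. The function names are hypothetical and exist only for this illustration.

```python
import re

def decode_status(status):
    # ASSUMED field layout for VM_CONTEXT1_PROTECTION_FAULT_STATUS, chosen
    # because it matches the decoded lines in the log; not from the driver.
    return {
        "protections": status & 0xFF,          # e.g. 0x0c or 0x14
        "client_id": (status >> 12) & 0xFF,    # e.g. 210, 98, 162
        "write": bool((status >> 24) & 0x1),   # False = read, True = write
        "vmid": (status >> 25) & 0xF,          # e.g. 4
    }

def client_name(raw):
    # The 32-bit client value is four ASCII characters, high byte first.
    return raw.to_bytes(4, "big").decode("ascii")

FAULT_RE = re.compile(
    r"VM fault \((0x[0-9a-fA-F]+), vmid (\d+)\) at page (\d+), "
    r"(read|write) from '(.{4})' \((0x[0-9a-fA-F]+)\) \((\d+)\)"
)

def parse_fault(line):
    # Returns the fields of one decoded "VM fault" line, or None if no match.
    m = FAULT_RE.search(line)
    if m is None:
        return None
    prot, vmid, page, rw, name, raw, cid = m.groups()
    return {
        "protections": int(prot, 16),
        "vmid": int(vmid),
        "page": int(page),          # same number as FAULT_ADDR, in decimal
        "write": rw == "write",
        "client": name,
        "client_raw": int(raw, 16),
        "client_id": int(cid),
    }
```

For the first fault above, the page number 1761985 is simply VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001AE2C1 in decimal, and 0x43424337 is the ASCII string 'CBC7'. Grouping parsed faults by page and client can show whether the faults cluster around a few addresses, as they seem to here.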

I also see that the GPU has hung: the GPU load stays at 100%, and the X server can no longer be restarted.



