[Bug 107152] GPU fault detected: 146 / VM_CONTEXT1_PROTECTION_FAULT / ring gfx timeout
bugzilla-daemon at freedesktop.org
Fri Aug 3 16:54:28 UTC 2018
https://bugs.freedesktop.org/show_bug.cgi?id=107152
--- Comment #8 from Andrey Grodzovsky <andrey.grodzovsky at amd.com> ---
dwanger, I think you already have all the trace tools installed from previous
debug sessions, so this should be quick for you.
Update to the latest kernel from
https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next
Boot the system and, before you start reproducing the fault, run the following
trace command:
sudo trace-cmd start -e dma_fence -e gpu_scheduler -e amdgpu \
    -v -e "amdgpu:amdgpu_mm_rreg" -e "amdgpu:amdgpu_mm_wreg" -e "amdgpu:amdgpu_iv"
After the VM fault has happened, extract the log from /sys/kernel/debug/tracing.
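One possible way to save that log is sketched below. It assumes the standard
ftrace text buffer, the file named trace inside the tracing directory. The
function name save_trace and the default output file name are made up for
illustration and are not part of trace-cmd or umr.

```shell
# Hypothetical helper: copy the ftrace text buffer to a regular file.
# $1 = tracing directory (assumed default: /sys/kernel/debug/tracing)
# $2 = output file (assumed default name, chosen only for this sketch)
save_trace() {
    tracing_dir=${1:-/sys/kernel/debug/tracing}
    out_file=${2:-amdgpu-vmfault-trace.txt}
    # Reading the real tracing directory normally requires root.
    cat "$tracing_dir/trace" > "$out_file"
}
```

Run as root with no arguments, save_trace would read the default location and
write the log to the current directory.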
Also run
sudo umr -O verbose -R gfx[.]
sudo umr -O halt_waves -wa
Now let's say this is the crash in your log:
Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0:
VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00100190
Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0:
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E04400C
Jul 07 01:08:20 ryzen kernel: amdgpu 0000:0a:00.0: VM fault (0x0c, vmid 7,
pasid 32768) at page 1048976, read from 'TC1' (0x54433100) (68)
Do
umr -O verbose -vm 7@100190000 1
where 7 is the vmid value and 100190000 is the VM_CONTEXT1_PROTECTION_FAULT_ADDR
value with an extra '000' appended to get from the virtual page number to the
actual virtual address (a left shift by 12 bits, i.e. multiplying by the
4096-byte page size).
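As a quick check of that conversion, here is a small sketch that shifts the
reported page number left by 12 bits and prints the result in hex. The function
name fault_page_to_addr is hypothetical and not part of umr. It relies on the
shell doing 64-bit arithmetic, which bash does.

```shell
# Hypothetical helper: turn the page number from
# VM_CONTEXT1_PROTECTION_FAULT_ADDR into a byte address (page * 4096).
fault_page_to_addr() {
    # Accepts hex (0x...) or decimal input; prints hex without a 0x prefix.
    printf '%x\n' $(( $1 << 12 ))
}

fault_page_to_addr 0x00100190   # prints 100190000
```

The decimal page number from the "VM fault ... at page 1048976" line is the
same value, so fault_page_to_addr 1048976 gives the same address.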
I can look at the log then and also run it by our MESA/LLVM experts to try and
figure out what's going on.