[amd-gfx] AMD Carrizo - GPU fault detected: 146 0x0842b714
Mads
mads at ab3.no
Mon Jun 20 08:24:57 UTC 2016
On 2016-06-18 14:30, Nicolai Hähnle wrote:
> The second approach is to correlate the VM ID in
>
>> dmesg:
>> [ 78.873577] amdgpu 0000:00:01.0: GPU fault detected: 146 0x08e2b714
>> [ 78.873590] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0010151C
>> [ 78.873592] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
>> [ 78.873595] VM fault (0x14, vmid 6) at page 1053980, write from 'SDM0' (0x53444d30) (183)
>
> with the running processes. This can be done via tracing. As root:
>
> echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_cs_ioctl/enable
> echo 1 > /sys/kernel/debug/tracing/events/gpu_sched/amd_sched_job/enable
> echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_sched_run_job/enable
> echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_vm_grab_id/enable
> cat /sys/kernel/debug/tracing/trace_pipe
>
> You'll get *lots* of output of the form
>
> compiz-2065 [000] .... 14927.891778: amdgpu_cs_ioctl: adev=ffff88022fe70000, sched_job=ffff880110dab2a0, first ib=ffff8800923e0200, sched fence=ffff880068509b80, ring name:gfx, num_ibs:1
> compiz-2065 [000] .... 14927.891782: amd_sched_job: entity=ffff88023258f030, sched job=ffff880110dab2a0, fence=ffff880068509b80, ring=gfx, job count:0, hw job count:0
> gfx-172 [002] .... 14927.891802: amdgpu_sched_run_job: adev=ffff88022fe70000, sched_job=ffff880110dab2a0, first ib=ffff8800923e0200, sched fence=ffff880068509b80, ring name:gfx, num_ibs:1
> gfx-172 [002] .... 14927.891809: amdgpu_vm_grab_id: vmid=5, ring=0
>
> In this particular case, compiz submitted a CS (command stream), which
> was then asynchronously sent and processed on the gfx ring with vmid=5.
>
> The idea is to correlate the timestamps with those of the VM fault to
> see which process is at fault. If you do this, please send a bit more
> log context in attachments, because asynchronous execution can
> occasionally make the logs difficult to interpret.
>
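One way to do the correlation described above is to pull the fault timestamp and VM ID out of dmesg, then list the trace events where that VM ID was grabbed, together with the task that submitted the job. What follows is only a rough sketch. The function names vm_faults and vmid_owners are made up for illustration. It assumes each trace event sits on a single line (as trace_pipe writes them), that task names contain no spaces, and that an amdgpu_vm_grab_id event belongs to the most recent amdgpu_sched_run_job seen on the same kernel thread. That last point is a heuristic, not something the trace format guarantees.

```shell
#!/bin/bash

# Hypothetical helper: print "<timestamp> <vmid>" for each "VM fault" line
# in a dmesg dump (read from the given files, or from stdin if none).
vm_faults() {
    sed -n 's/^\[ *\([0-9.]*\)].*VM fault (0x[0-9a-f]*, vmid \([0-9]*\)).*/\1 \2/p' "$@"
}

# Hypothetical helper: for every amdgpu_vm_grab_id event with the given
# vmid, print the event timestamp, the job last started on the same
# kernel thread, and the task that submitted that job.
# Usage: vmid_owners <vmid> < carrizo.log
vmid_owners() {
    awk -v want="$1" '
    # Return the pointer that follows "sched_job=" on the line, or "".
    function job_of(line) {
        if (match(line, /sched_job=[0-9a-f]+/))
            return substr(line, RSTART + 10, RLENGTH - 10)
        return ""
    }
    # Remember which task submitted each job.
    / amdgpu_cs_ioctl: /      { submitter[job_of($0)] = $1 }
    # Remember the job most recently started by each scheduler thread.
    / amdgpu_sched_run_job: / { running[$1] = job_of($0) }
    # Assumption: a grab_id event belongs to that most recent job.
    / amdgpu_vm_grab_id: / {
        if (match($0, /vmid=[0-9]+/) &&
            substr($0, RSTART + 5, RLENGTH - 5) == want) {
            ts = $4; sub(/:$/, "", ts)
            job = running[$1]
            who = (job in submitter) ? submitter[job] : "unknown"
            printf "%s vmid=%s job=%s submitted_by=%s\n", ts, want, job, who
        }
    }'
}
```

Fed the raw "VM fault" dmesg line quoted above, vm_faults prints "78.873595 6". Running vmid_owners 6 on the trace log would then list candidate submitters, and the entries closest before the fault timestamp are the ones worth looking at first.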
I made this script:
> #!/bin/bash
> echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_cs_ioctl/enable
> echo 1 > /sys/kernel/debug/tracing/events/gpu_sched/amd_sched_job/enable
> echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_sched_run_job/enable
> echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_vm_grab_id/enable
> cat /sys/kernel/debug/tracing/trace_pipe >> carrizo.log &
> catpid=$!
> sudo -u htpc XAUTHORITY=/home/htpc/.Xauthority DISPLAY=:0 dolphin &
> dolphinpid=$!
> sleep 3
> echo 0 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_cs_ioctl/enable
> echo 0 > /sys/kernel/debug/tracing/events/gpu_sched/amd_sched_job/enable
> echo 0 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_sched_run_job/enable
> echo 0 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_vm_grab_id/enable
> kill $catpid
> kill $dolphinpid
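The eight echo commands in the script could also be folded into one small helper. This is just a sketch, not a tested replacement for the script. The function name set_gpu_trace_events and its optional second argument (an alternative tracing root, handy for trying it out without root) are made up here. The event names are the four used in the commands above.

```shell
#!/bin/bash

# Hypothetical helper: write 0 or 1 to the "enable" file of each of the
# four trace events used in this thread.
# Usage: set_gpu_trace_events <0|1> [tracing-root]
set_gpu_trace_events() {
    local value=$1
    # Assumption: debugfs is mounted at the usual location unless overridden.
    local base=${2:-/sys/kernel/debug/tracing}
    local ev
    for ev in amdgpu/amdgpu_cs_ioctl \
              gpu_sched/amd_sched_job \
              amdgpu/amdgpu_sched_run_job \
              amdgpu/amdgpu_vm_grab_id; do
        # Stop at the first event that cannot be written (e.g. not root).
        echo "$value" > "$base/events/$ev/enable" || return 1
    done
}
```

Calling set_gpu_trace_events 1 before starting dolphin and set_gpu_trace_events 0 afterwards would do the same as the echo lines. Adding trap 'set_gpu_trace_events 0' EXIT near the top of the script would also switch the events off again if the script is interrupted.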
Attaching the trace log and dmesg; hope you can make sense of them :)
- Mads
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: carrizo.dmesg
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20160620/7043a68a/attachment-0002.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: carrizo.log
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20160620/7043a68a/attachment-0003.ksh>