[amd-gfx] AMD Carrizo - GPU fault detected: 146 0x0842b714

Mads mads at ab3.no
Mon Jun 20 08:24:57 UTC 2016


On 2016-06-18 14:30, Nicolai Hähnle wrote:

> The second approach is to correlate the VM ID in
> 
>> dmesg:
>> [   78.873577] amdgpu 0000:00:01.0: GPU fault detected: 146 0x08e2b714
>> [   78.873590] amdgpu 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0010151C
>> [   78.873592] amdgpu 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0D0B7014
>> [   78.873595] VM fault (0x14, vmid 6) at page 1053980, write from 'SDM0' (0x53444d30) (183)
> 
> with the running processes. This can be done via tracing. As root:
> 
> echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_cs_ioctl/enable
> echo 1 > /sys/kernel/debug/tracing/events/gpu_sched/amd_sched_job/enable
> echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_sched_run_job/enable
> echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_vm_grab_id/enable
> cat /sys/kernel/debug/tracing/trace_pipe
> 
> You'll get *lots* of output of the form
> 
>           compiz-2065  [000] .... 14927.891778: amdgpu_cs_ioctl: adev=ffff88022fe70000, sched_job=ffff880110dab2a0, first ib=ffff8800923e0200, sched fence=ffff880068509b80, ring name:gfx, num_ibs:1
>           compiz-2065  [000] .... 14927.891782: amd_sched_job: entity=ffff88023258f030, sched job=ffff880110dab2a0, fence=ffff880068509b80, ring=gfx, job count:0, hw job count:0
>              gfx-172   [002] .... 14927.891802: amdgpu_sched_run_job: adev=ffff88022fe70000, sched_job=ffff880110dab2a0, first ib=ffff8800923e0200, sched fence=ffff880068509b80, ring name:gfx, num_ibs:1
>              gfx-172   [002] .... 14927.891809: amdgpu_vm_grab_id: vmid=5, ring=0
> 
> In this particular case, compiz submitted a CS (command stream), which 
> was then asynchronously sent and processed on the gfx ring with vmid=5.
> 
> The idea is to correlate the timestamps with those of the VM fault to 
> see which process is at fault. If you do this, please send a bit more 
> log context in attachments, because asynchronous execution can 
> occasionally make the logs difficult to interpret.
> 
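
To double-check which VM ID a fault belongs to, the status register value from dmesg can be decoded by hand. The following is only a sketch: the function name decode_fault_status is made up for illustration, and the bit positions are an assumption inferred from the quoted fault, where status 0x0D0B7014 is reported as protections 0x14, vmid 6, a write, and client 183. Other hardware generations may lay the register out differently.

```shell
#!/bin/bash
# Hypothetical helper: split a VM_CONTEXT1_PROTECTION_FAULT_STATUS value
# into its fields. The bit positions are assumptions inferred from the
# quoted dmesg line, not taken from hardware documentation.
decode_fault_status() {
    local status=$(( $1 ))
    local protections=$(( status & 0xff ))        # bits 0-7
    local client_id=$(( (status >> 12) & 0xff ))  # bits 12-19
    local rw=$(( (status >> 24) & 0x1 ))          # bit 24, 1 = write
    local vmid=$(( (status >> 25) & 0xf ))        # bits 25-28
    local access=read
    if [ "$rw" -eq 1 ]; then access=write; fi
    printf 'protections=0x%02x vmid=%d access=%s client_id=%d\n' \
        "$protections" "$vmid" "$access" "$client_id"
}

decode_fault_status 0x0D0B7014
```

For the value above this prints protections=0x14 vmid=6 access=write client_id=183, which agrees with the "VM fault" line in the quoted dmesg output.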

I made this script:

> #!/bin/bash
> # Must be run as root. Enable the four trace events.
> echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_cs_ioctl/enable
> echo 1 > /sys/kernel/debug/tracing/events/gpu_sched/amd_sched_job/enable
> echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_sched_run_job/enable
> echo 1 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_vm_grab_id/enable
> # Collect the trace output in the background.
> cat /sys/kernel/debug/tracing/trace_pipe >> carrizo.log &
> catpid=$!
> # Start dolphin as the desktop user and let it run for 3 seconds.
> sudo -u htpc XAUTHORITY=/home/htpc/.Xauthority DISPLAY=:0 dolphin &
> dolphinpid=$!
> sleep 3
> # Disable the trace events again and stop both background processes.
> echo 0 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_cs_ioctl/enable
> echo 0 > /sys/kernel/debug/tracing/events/gpu_sched/amd_sched_job/enable
> echo 0 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_sched_run_job/enable
> echo 0 > /sys/kernel/debug/tracing/events/amdgpu/amdgpu_vm_grab_id/enable
> kill "$catpid"
> kill "$dolphinpid"

Attaching the tracelog and dmesg, hope you can make sense of it :)
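
In case it helps with reading the log, here is a rough sketch of the correlation step for a single VM ID. The function name who_used_vmid is made up for illustration. It assumes one event per line in the format quoted above, that amdgpu_vm_grab_id appears after the amdgpu_sched_run_job of the same job on the same scheduler thread (as in the quoted sample), and that task names contain no spaces. Because execution is asynchronous, the result is a hint and not proof.

```shell
#!/bin/bash
# Hypothetical helper: list every grab of a given VM ID together with the
# task that submitted the job. Usage: who_used_vmid <vmid> <logfile>
who_used_vmid() {
    awk -v want="$1" '
        # Remember which task submitted each scheduler job.
        $5 == "amdgpu_cs_ioctl:" && match($0, /sched_job=[0-9a-f]+/) {
            submitter[substr($0, RSTART + 10, RLENGTH - 10)] = $1
        }
        # Remember the job most recently started on each scheduler thread.
        $5 == "amdgpu_sched_run_job:" && match($0, /sched_job=[0-9a-f]+/) {
            running[$1] = substr($0, RSTART + 10, RLENGTH - 10)
        }
        # Attribute a VM ID grab to the job that thread started last.
        $5 == "amdgpu_vm_grab_id:" && match($0, /vmid=[0-9]+/) {
            if (substr($0, RSTART + 5, RLENGTH - 5) != want) next
            ts = $4; sub(/:$/, "", ts)
            job = ($1 in running) ? running[$1] : "unknown"
            who = (job in submitter) ? submitter[job] : "unknown"
            printf "%s vmid=%s job=%s submitted_by=%s\n", ts, want, job, who
        }
    ' "$2"
}
```

Running something like `who_used_vmid 6 carrizo.log` would then give the timestamps to compare against the VM fault in dmesg.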

- Mads
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: carrizo.dmesg
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20160620/7043a68a/attachment-0002.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: carrizo.log
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20160620/7043a68a/attachment-0003.ksh>

