[Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!

bugzilla-daemon at freedesktop.org
Tue Aug 21 08:41:52 UTC 2018


https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #54 from dwagner <jb5sgc1n.nya at 20mm.eu> ---
(In reply to Andrey Grodzovsky from comment #53)
> Created attachment 141198 [details] [review]
> add_debug_info2.patch
> 
> Try this patch instead, I might be missing some prints in the first one.

I can try that this evening.

> In the last log you attached I haven't seen any UMR dumps or GPU fault
> prints in dmesg. The GPU fault has to be in the log to compare the faulty
> address against the debug prints in the patch.

The file attached above, "xz-compressed output of gpu_debug3.sh", does contain
umr output from the time of the crash (238 seconds after the reboot):

----------------------------------------------
...
          mpv/vo-897   [005] ....   235.191542: dma_fence_wait_start:
driver=drm_sched timeline=gfx context=162 seqno=87
          mpv/vo-897   [005] d...   235.191548: dma_fence_enable_signal:
driver=drm_sched timeline=gfx context=162 seqno=87
     kworker/0:2-92    [000] ....   238.275988: dma_fence_signaled:
driver=amdgpu timeline=sdma1 context=11 seqno=210
     kworker/0:2-92    [000] ....   238.276004: dma_fence_signaled:
driver=amdgpu timeline=sdma1 context=11 seqno=211
[  238.180634] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout,
signaled seq=32624, emitted seq=32626
[  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!
[  238.180641] amdgpu 0000:0a:00.0: GPU reset begin!

crash detected!

executing umr -O halt_waves -wa
No active waves!


executing umr -O verbose -R gfx[.]

polaris11.gfx.rptr == 1792
polaris11.gfx.wptr == 1792
polaris11.gfx.drv_wptr == 1792
polaris11.gfx.ring[1761] == 0xffff1000    ... 
polaris11.gfx.ring[1762] == 0xffff1000    ... 
polaris11.gfx.ring[1763] == 0xffff1000    ... 
polaris11.gfx.ring[1764] == 0xffff1000    ... 
polaris11.gfx.ring[1765] == 0xffff1000    ... 
polaris11.gfx.ring[1766] == 0xffff1000    ... 
polaris11.gfx.ring[1767] == 0xffff1000    ... 
polaris11.gfx.ring[1768] == 0xffff1000    ... 
polaris11.gfx.ring[1769] == 0xffff1000    ... 
polaris11.gfx.ring[1770] == 0xffff1000    ... 
polaris11.gfx.ring[1771] == 0xffff1000    ... 
polaris11.gfx.ring[1772] == 0xffff1000    ... 
polaris11.gfx.ring[1773] == 0xffff1000    ... 
polaris11.gfx.ring[1774] == 0xffff1000    ... 
polaris11.gfx.ring[1775] == 0xffff1000    ... 
polaris11.gfx.ring[1776] == 0xffff1000    ... 
polaris11.gfx.ring[1777] == 0xffff1000    ... 
polaris11.gfx.ring[1778] == 0xffff1000    ... 
polaris11.gfx.ring[1779] == 0xffff1000    ... 
polaris11.gfx.ring[1780] == 0xffff1000    ... 
polaris11.gfx.ring[1781] == 0xffff1000    ... 
polaris11.gfx.ring[1782] == 0xffff1000    ... 
polaris11.gfx.ring[1783] == 0xffff1000    ... 
polaris11.gfx.ring[1784] == 0xffff1000    ... 
polaris11.gfx.ring[1785] == 0xffff1000    ... 
polaris11.gfx.ring[1786] == 0xffff1000    ... 
polaris11.gfx.ring[1787] == 0xffff1000    ... 
polaris11.gfx.ring[1788] == 0xffff1000    ... 
polaris11.gfx.ring[1789] == 0xffff1000    ... 
polaris11.gfx.ring[1790] == 0xffff1000    ... 
polaris11.gfx.ring[1791] == 0xffff1000    ... 
polaris11.gfx.ring[1792] == 0xc0032200    rwD 

trying to get ADR from dmesg output for 'umr -O verbose -vm ...'
trying to get VMID from dmesg output for 'umr -O verbose -vm ...'

done after crash, flashing NUMLOCK LED.
     amdgpu_cs:0-799   [001] ....   286.852838: amdgpu_bo_list_set:
list=0000000099c16b5c, bo=000000001771c26f, bo_size=131072
     amdgpu_cs:0-799   [001] ....   286.852846: amdgpu_bo_list_set:
list=0000000099c16b5c, bo=0000000046bfd439, bo_size=131072
...
----------------------------------------------

But it is true that there were no "VM_CONTEXT1_PROTECTION_FAULT_ADDR" error
messages this time. Sometimes they are emitted, sometimes not.
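
To make it quicker to tell whether a given log contains both pieces of
information (the ring timeout and a fault address), a check along the following
lines could be run over the captured text. This is only a minimal sketch and
not what gpu_debug3.sh does. The timeout pattern follows the line shown in the
log above. The fault-address pattern is an assumption: it expects the register
name followed by a hex value, and the function and variable names are made up
for illustration.

```python
import re

# Matches the timeout line as it appears in the log above. \s* after the comma
# tolerates the line wrap introduced by the mail client.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout,\s*"
    r"signaled seq=(?P<signaled>\d+),\s*emitted seq=(?P<emitted>\d+)"
)

# ASSUMPTION: the fault address is printed as the register name followed by
# whitespace and a hex value. Adjust if the kernel in use prints it differently.
FAULT_ADDR_RE = re.compile(r"VM_CONTEXT1_PROTECTION_FAULT_ADDR\s+(0x[0-9A-Fa-f]+)")


def summarize_log(log_text):
    """Return the ring timeouts and fault addresses found in log_text."""
    timeouts = []
    for m in TIMEOUT_RE.finditer(log_text):
        signaled = int(m.group("signaled"))
        emitted = int(m.group("emitted"))
        # emitted - signaled = number of fences still outstanding on the ring
        timeouts.append((m.group("ring"), signaled, emitted, emitted - signaled))
    fault_addrs = FAULT_ADDR_RE.findall(log_text)
    return {"timeouts": timeouts, "fault_addrs": fault_addrs}


if __name__ == "__main__":
    import sys

    result = summarize_log(sys.stdin.read())
    for ring, signaled, emitted, pending in result["timeouts"]:
        print(f"ring {ring}: signaled={signaled} emitted={emitted} pending={pending}")
    if result["fault_addrs"]:
        print("fault addresses:", ", ".join(result["fault_addrs"]))
    else:
        print("no VM_CONTEXT1_PROTECTION_FAULT_ADDR lines found")
```

If the script is saved under some name of your choosing, the decompressed log
can be piped into it on stdin. For the excerpt above it would report the sdma0
timeout with two outstanding fences and no fault address.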
