[Bug 102322] System crashes after "[drm] IP block:gmc_v8_0 is hung!" / [drm] IP block:sdma_v3_0 is hung!

bugzilla-daemon at freedesktop.org
Thu Jun 28 21:09:09 UTC 2018


https://bugs.freedesktop.org/show_bug.cgi?id=102322

--- Comment #19 from Andrey Grodzovsky <andrey.grodzovsky at amd.com> ---
Can you use addr2line, or gdb with the 'list' command, to give the source line
number matching the crash address?

(In reply to dwagner from comment #18)
> The good news: So far no crashes during normal uptime with
> amdgpu.vm_update_mode=3
> 
> The bad news: System crashes immediately upon S3 resume (with messages quite
> different from the ones I saw with earlier S3-resume crashes) - I filed bug
> report https://bugs.freedesktop.org/show_bug.cgi?id=107065 on this.
> 
> (In reply to Andrey Grodzovsky from comment #17)
> > dwagner, this is obviously just a work around and not a fix. It points to
> > some problem with SDMA packets, if you want to continue exploring we can try
> > to dump some fence traces and SDMA HW ring content to examine the latest
> > packets before the hang happened.
> 
> If you can include some debug output into "amd-staging-drm-next" that helps
> finding the root cause, I might be able to provide some output - if the
> kernel survives long enough after the crash to write the system journal -
> this has not always been the case.

No need to recompile; I just need to see the contents of the SDMA ring buffer
when the hang occurs.

Clone and build our register analyzer, umr, from
https://cgit.freedesktop.org/amd/umr/ and, once the hang happens, run:

sudo umr -lb
sudo umr -R gfx[.]
sudo umr -R sdma0[.]
sudo umr -R sdma1[.]

I will probably need more info later, but let's try this first.
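Because the kernel does not always survive long enough after the hang to write the journal, it may help to capture the output of the four umr commands above straight into files. This is a minimal sketch assuming a POSIX shell, that umr has been built and is on PATH, and that the script is run as root. The script name, the output directory, and the per-command file names are hypothetical and are not required by umr itself.

```shell
#!/bin/sh
# Sketch: save the umr output requested for this bug into files right after
# the hang. Run as root, e.g. "sudo sh dump-umr.sh" (hypothetical file name).
# Assumes umr is on PATH; directory and file names are illustrative only.

OUTDIR="${OUTDIR:-umr-dump-$(date +%Y%m%d-%H%M%S)}"
mkdir -p "$OUTDIR" || exit 1

# Run one command, sending both stdout and stderr to "$OUTDIR/<name>.txt".
dump() {
    name=$1
    shift
    "$@" > "$OUTDIR/$name.txt" 2>&1
}

dump lb    umr -lb
dump gfx   umr -R 'gfx[.]'     # quoted so the shell cannot glob "[.]"
dump sdma0 umr -R 'sdma0[.]'
dump sdma1 umr -R 'sdma1[.]'

# Ask the kernel to flush the files to disk in case the system goes down soon.
sync
echo "umr output saved in $OUTDIR"
```

The four resulting text files could then be attached to the bug. Writing each command to its own file keeps the ring dumps separate, and the final sync only improves the odds that the data reaches the disk; it cannot guarantee it if the machine locks up immediately.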



More information about the dri-devel mailing list