[Bug 112242] amdgpu [RX Vega 56]: ring sdma0 timeout

bugzilla-daemon at freedesktop.org
Mon Nov 11 09:33:46 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=112242

            Bug ID: 112242
           Summary: amdgpu [RX Vega 56]: ring sdma0 timeout
           Product: DRI
           Version: unspecified
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: major
          Priority: not set
         Component: DRM/AMDgpu
          Assignee: dri-devel at lists.freedesktop.org
          Reporter: mh at familie-heinz.name

Hi,

I've reported this over at bugzilla.kernel.org but didn't get any help there.
Maybe because nobody is expecting bug reports about the amdgpu driver on the
kernel's bug tracker?

So this started a while ago, when I updated from 5.0.0 to a newer kernel. I'm
currently at 5.3.0 and for almost any game I play I run into this problem:

Aug 24 11:13:33 egalite kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=368056, emitted seq=368057
Aug 24 11:13:33 egalite kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
Aug 24 11:13:33 egalite kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process 7DaysToDie.x86_ pid 8108 thread 7DaysToDie:cs0
Aug 24 11:13:33 egalite kernel: amdgpu 0000:0c:00.0: GPU reset begin!
Aug 24 11:13:33 egalite kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
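
To pick timeouts like these out of a longer journal, a small Python sketch
along the following lines might help. The regular expression and the function
name are assumptions derived only from the log lines quoted above, not an
official amdgpu log format, and reading the gap between the two sequence
numbers as "jobs still outstanding" is my interpretation.

```python
import re

# Pattern assumed from the quoted log lines; not an official format.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def find_ring_timeouts(log_text):
    """Return (ring, signaled, emitted) for each hard ring timeout."""
    # Collapse all whitespace first so that entries wrapped by a mail
    # client (as in this report) still match as one line.
    flat = " ".join(log_text.split())
    return [
        (m.group("ring"), int(m.group("signaled")), int(m.group("emitted")))
        for m in TIMEOUT_RE.finditer(flat)
    ]


# Two entries copied from this report, including the mail line wrapping.
sample = """\
Aug 24 11:13:33 egalite kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
sdma0 timeout, signaled seq=368056, emitted seq=368057
Aug 24 11:13:33 egalite kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
gfx timeout, but soft recovered
"""

for ring, signaled, emitted in find_ring_timeouts(sample):
    print(f"{ring}: {emitted - signaled} job(s) emitted but not signaled")
```

The "soft recovered" gfx entry carries no sequence numbers, so this pattern
skips it on purpose and reports only the sdma0 timeout.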

I could only recover from that with a hard reset.

I did some kernel traces which I will copy over to this report, if necessary,
but for now you can download them here:
https://bugzilla.kernel.org/show_bug.cgi?id=204683

It also looks a bit like this bug:
https://bugzilla.kernel.org/show_bug.cgi?id=201957, because I also get the
"ring gfx timeout". And there are lots and lots of people having this issue.

I tried bisecting it but failed. Either I missed the commit that causes this,
or there are multiple reasons why this happens, or this really goes way back
to the time when 4.18 was the base for drm-next (which doesn't compile with
modern compilers anymore). Also, Steam doesn't want to run on those old
kernels, so even when I was able to compile an older kernel, there was no way
to test it.

I even tried debugging it over ethernet (KGDBoE is a nice thing if you need
performance), but somehow this slowed everything down enough to not trigger the
bug.

I also tried the suggestions from
https://bugs.freedesktop.org/show_bug.cgi?id=109955, but forbidding the lowest
clock mode doesn't help either (though it does fix my Rocket League problems).
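
For reference, forbidding the lowest clock level can be sketched roughly as
below. This assumes the amdgpu sysfs files power_dpm_force_performance_level
and pp_dpm_sclk (or pp_dpm_mclk) under /sys/class/drm/card0/device. The card
index, the choice of clock, the helper name and the sample levels are all
hypothetical, not taken from my system.

```python
def levels_without_lowest(pp_dpm_text):
    """Build the space-separated level list that skips the lowest level.

    pp_dpm_text is the content of a pp_dpm_* file, one "index: clock" entry
    per line, with "*" marking the currently active level.
    """
    levels = []
    for line in pp_dpm_text.splitlines():
        index, sep, _ = line.partition(":")
        if sep and index.strip().isdigit():
            levels.append(int(index))
    lowest = min(levels, default=None)
    return " ".join(str(i) for i in sorted(levels) if i != lowest)


# Made-up file content for illustration only; real values differ per card.
example = "0: 300Mhz *\n1: 600Mhz\n2: 900Mhz\n"
print(levels_without_lowest(example))  # prints "1 2"
```

The resulting string would be written as root to pp_dpm_sclk (or
pp_dpm_mclk) after writing "manual" to power_dpm_force_performance_level.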

Please advise what I should try next.

Best regards
Matthias

-- 
You are receiving this mail because:
You are the assignee for the bug.