[Bug 111807] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout cause process into Disk sleep state

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Wed Sep 25 02:38:57 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=111807

            Bug ID: 111807
           Summary: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
                    timeout  cause process into Disk sleep state
           Product: DRI
           Version: DRI git
          Hardware: ARM
                OS: Linux (All)
            Status: NEW
          Severity: major
          Priority: not set
         Component: DRM/AMDgpu
          Assignee: dri-devel at lists.freedesktop.org
          Reporter: liansz at fzcyjh.com

Created attachment 145506
  --> https://bugs.freedesktop.org/attachment.cgi?id=145506&action=edit
timeoutlog

We are seeing gfx ring timeouts on our servers.
We currently run kernel 4.19.36 with some GPU-related patches merged from the
community. Each server has multiple GPUs, and each GPU runs some rendering
programs. So far we have seen two different failure cases.
In the first case, one graphics card in a server fails, but the rendering
program does not enter D state (uninterruptible sleep). A test via
/sys/kernel/debug/dri/1/amdgpu_test_ib reports error code 110 (ETIMEDOUT on
Linux), and a second test then passes.
See tmp-618-2.zip for details.
In the second case, one graphics card in a server fails, and the whole rendering
program running on the server fails and enters D state. It fails at drm_release.
See tmp-619.zip for details.
Could you please help us out?
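
For reference, the D state mentioned above can be checked on an affected server
with something like the sketch below. This is only a minimal illustration, not
the tooling used to collect the attached logs: it assumes a Linux /proc
filesystem, and the function names parse_stat_state and find_d_state_processes
are made up for this example.

```python
import os


def parse_stat_state(stat_line):
    """Return (pid, comm, state) parsed from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    head, _, tail = stat_line.rpartition(")")
    pid_str, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_str), comm, state


def find_d_state_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_state(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were scanning
        if state == "D":
            blocked.append((pid, comm))
    return blocked


if __name__ == "__main__":
    for pid, comm in find_d_state_processes():
        print(pid, comm)
```

For a PID reported this way, reading /proc/<pid>/stack as root (where the
kernel exposes it) should show where the task is blocked, for example whether
drm_release is on the stack.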
