[Bug 111807] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout cause process into Disk sleep state
bugzilla-daemon at freedesktop.org
Wed Sep 25 02:38:57 UTC 2019
https://bugs.freedesktop.org/show_bug.cgi?id=111807
Bug ID: 111807
Summary: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx
timeout cause process into Disk sleep state
Product: DRI
Version: DRI git
Hardware: ARM
OS: Linux (All)
Status: NEW
Severity: major
Priority: not set
Component: DRM/AMDgpu
Assignee: dri-devel at lists.freedesktop.org
Reporter: liansz at fzcyjh.com
Created attachment 145506
--> https://bugs.freedesktop.org/attachment.cgi?id=145506&action=edit
timeoutlog
We have run into some gfx ring timeout problems.
We currently use kernel 4.19.36, with some GPU-related patches merged from the
community. Each server has multiple GPUs, and each GPU is running some
rendering programs. So far we see two different failure cases.
In the first case, one graphics card in a server fails, but the rendering
program does not enter D state. A test through
/sys/kernel/debug/dri/1/amdgpu_test_ib reports error code 110, and a second
run of the same test then passes. See tmp-618-2.zip for details.
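
The check described for the first case can be scripted roughly as follows. This is only a sketch: it assumes the debugfs file prints a failure code in parentheses somewhere in its output (the exact text may differ between kernel versions), and the helper names parse_failure_code and describe are made up for illustration. The path is the one quoted in this report; the DRM minor index (1) will differ per GPU. On Linux, errno 110 normally maps to ETIMEDOUT.

```python
import errno
import re
from pathlib import Path

# Path quoted in the report; the index after "dri/" depends on the GPU.
TEST_IB_PATH = Path("/sys/kernel/debug/dri/1/amdgpu_test_ib")


def parse_failure_code(output):
    """Return the first parenthesised integer as a positive errno, or None.

    Assumption: a failing run prints its code like "(-110)" or "(110)".
    """
    match = re.search(r"\((-?\d+)\)", output)
    return abs(int(match.group(1))) if match else None


def describe(output):
    """Turn the raw test output into a short human-readable summary."""
    code = parse_failure_code(output)
    if code is None:
        return "no error code found"
    # errno.errorcode maps numbers to symbolic names on the running platform.
    name = errno.errorcode.get(code, "unknown errno")
    return f"error {code} ({name})"


if __name__ == "__main__":
    try:
        # Reading the file triggers the IB test; needs root and mounted debugfs.
        print(describe(TEST_IB_PATH.read_text()))
    except OSError as exc:
        print(f"cannot read {TEST_IB_PATH}: {exc}")
```

Running it twice in a row would show whether the failure persists or clears on the second test, as observed above.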
In the second case, one graphics card in a server fails, and all rendering
programs running on that server fail and enter D state. The failure occurs in
drm_release. See tmp-619.zip for details.
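
For reference, processes in D state can be listed with a small script like the one below. It is a minimal sketch that only relies on the "State:" and "Name:" lines of /proc/<pid>/status; the function names status_field and d_state_processes are hypothetical and not taken from any existing tool.

```python
from pathlib import Path


def status_field(status_text, key):
    """Return the value of a 'Key:<tab>value' line from status text, or None."""
    for line in status_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name == key:
            return value.strip()
    return None


def d_state_processes(proc_root="/proc"):
    """Return (pid, name) pairs for processes whose state letter is 'D'."""
    found = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a process directory
        try:
            text = (entry / "status").read_text()
        except OSError:
            continue  # process exited or is not readable
        state = status_field(text, "State")
        # The State value looks like "D (disk sleep)"; compare the first letter.
        if state and state.split()[0] == "D":
            found.append((int(entry.name), status_field(text, "Name")))
    return found


if __name__ == "__main__":
    if Path("/proc").is_dir():
        for pid, name in sorted(d_state_processes()):
            print(pid, name)
```

Where the kernel exposes it, /proc/<pid>/stack (readable by root) may then show where such a process is blocked.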
Could you please help us out?