[Bug 107762] [Intel GFX CI] *ERROR* ring sdma0 timeout, signaled seq=137, emitted seq=137
bugzilla-daemon at freedesktop.org
Thu Sep 6 15:16:07 UTC 2018
https://bugs.freedesktop.org/show_bug.cgi?id=107762
Michel Dänzer <michel at daenzer.net> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |ckoenig.leichtzumerken at gmail.com,
| |dev at lynxeye.de
--- Comment #2 from Michel Dänzer <michel at daenzer.net> ---
(In reply to Martin Peres from comment #0)
> [ 358.292609] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=137, emitted seq=137
> [ 358.292635] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=145, emitted seq=145
(In reply to Martin Peres from comment #1)
> [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=137, emitted seq=137
> [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=147, emitted seq=147
Hmm, the signalled and emitted sequence numbers are always the same, which
would mean the hardware completed every emitted job and hasn't actually timed
out?
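
As a rough illustration of that check, here is a minimal sketch that parses the quoted log lines and flags timeouts where the signalled and emitted sequence numbers match. The function name and the regular expression are assumptions made for this example, derived only from the message format quoted above; they are not part of amdgpu or any existing tool.

```python
import re

# Assumed pattern, based solely on the amdgpu_job_timedout lines quoted above.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def classify_timeout(line):
    """Hypothetical helper: return (ring, signaled, emitted, caught_up).

    caught_up is True when signaled == emitted, i.e. the ring reported
    completion of everything that was emitted. Returns None if the line
    is not a ring timeout message.
    """
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    signaled = int(m.group("signaled"))
    emitted = int(m.group("emitted"))
    return m.group("ring"), signaled, emitted, signaled == emitted
```

Feeding it each line of dmesg output and counting the entries where the last element is True would show whether every reported timeout is of this "nothing outstanding" kind.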
I can think of two possibilities:
* A GPU scheduler bug causing the job timeout handling to be triggered
spuriously. (Could something be stalling the system work queue, so the items
scheduled by drm_sched_job_finish_cb can't call drm_sched_job_finish in time?)
* A problem with the handling of the GPU's interrupts. Do the numbers on the
amdgpu line in /proc/interrupts still increase after these messages appear,
or at least in the ten seconds before they appear?
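
One way to answer the /proc/interrupts question is to sample the file periodically and print the change in the amdgpu interrupt count. The sketch below is only an illustration: the function names, the one-second interval, and the assumption that the relevant line contains the string "amdgpu" are choices made for this example, not something taken from the bug report.

```python
import time


def amdgpu_irq_total(text, name="amdgpu"):
    """Sum the per-CPU counters on every /proc/interrupts line mentioning name.

    Returns None if no such line exists. Assumes the usual layout: an
    "IRQ:" label, one numeric column per CPU, then descriptive fields.
    """
    total = 0
    found = False
    for line in text.splitlines():
        if name not in line:
            continue
        found = True
        for field in line.split()[1:]:
            if not field.isdigit():
                break  # reached the controller/device description
            total += int(field)
    return total if found else None


def watch(interval=1.0, samples=10, path="/proc/interrupts"):
    """Hypothetical helper: print the running total and the delta per sample."""
    prev = None
    for _ in range(samples):
        with open(path) as f:
            cur = amdgpu_irq_total(f.read())
        delta = None if prev is None or cur is None else cur - prev
        print(time.strftime("%H:%M:%S"), cur, delta)
        prev = cur
        time.sleep(interval)
```

Running watch() in one terminal while reproducing the timeout should show whether the delta drops to zero around the time the messages are logged.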