<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body><span class="vcard"><a class="email" href="mailto:michel@daenzer.net" title="Michel Dänzer <michel@daenzer.net>"> <span class="fn">Michel Dänzer</span></a>
</span> changed
<a class="bz_bug_link
bz_status_NEW "
title="NEW - [Intel GFX CI] *ERROR* ring sdma0 timeout, signaled seq=137, emitted seq=137"
href="https://bugs.freedesktop.org/show_bug.cgi?id=107762">bug 107762</a>
<br>
<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>What</th>
<th>Removed</th>
<th>Added</th>
</tr>
<tr>
<td style="text-align:right;">CC</td>
<td>
</td>
<td>ckoenig.leichtzumerken@gmail.com, dev@lynxeye.de
</td>
</tr></table>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - [Intel GFX CI] *ERROR* ring sdma0 timeout, signaled seq=137, emitted seq=137"
href="https://bugs.freedesktop.org/show_bug.cgi?id=107762#c2">Comment # 2</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - [Intel GFX CI] *ERROR* ring sdma0 timeout, signaled seq=137, emitted seq=137"
href="https://bugs.freedesktop.org/show_bug.cgi?id=107762">bug 107762</a>
from <span class="vcard"><a class="email" href="mailto:michel@daenzer.net" title="Michel Dänzer <michel@daenzer.net>"> <span class="fn">Michel Dänzer</span></a>
</span></b>
<pre>(In reply to Martin Peres from <a href="show_bug.cgi?id=107762#c0">comment #0</a>)
<span class="quote">> [ 358.292609] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=137, emitted seq=137
> [ 358.292635] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=145, emitted seq=145</span >
(In reply to Martin Peres from <a href="show_bug.cgi?id=107762#c1">comment #1</a>)
<span class="quote">> [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=137, emitted seq=137
> [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=147, emitted seq=147</span >
Hmm, the signalled and emitted sequence numbers are always the same, meaning
every emitted fence has signalled and the hardware hasn't actually timed out?
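
To double-check that reading against a full dmesg, here is a minimal Python
sketch. It is not part of the driver, and the helper name pending_fences is
made up for illustration. It parses timeout lines in the format quoted above
and reports emitted minus signaled per ring; a difference of 0 means the last
emitted fence had already signalled when the message was printed.

```python
import re

# Matches the amdgpu_job_timedout message format quoted above.
TIMEOUT_RE = re.compile(
    r"ring (\S+) timeout, signaled seq=(\d+), emitted seq=(\d+)"
)


def pending_fences(log_text):
    """Return {ring: [emitted - signaled, ...]} for each timeout line found."""
    result = {}
    for ring, signaled, emitted in TIMEOUT_RE.findall(log_text):
        result.setdefault(ring, []).append(int(emitted) - int(signaled))
    return result


# The two lines quoted from comment #0.
sample = """\
[ 358.292609] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=137, emitted seq=137
[ 358.292635] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=145, emitted seq=145
"""
print(pending_fences(sample))  # {'sdma0': [0], 'sdma1': [0]}
```
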
I can think of two possibilities:
* A GPU scheduler bug causing the job timeout handling to be triggered
spuriously. (Could something be stalling the system work queue, so the items
scheduled by drm_sched_job_finish_cb can't call drm_sched_job_finish in time?)
* A problem with the handling of the GPU's interrupts. Do the numbers on the
amdgpu line in /proc/interrupts still increase after these messages appear,
or at least during the ten seconds before they appear?</pre>
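<pre>One way to run the /proc/interrupts check is sketched below in Python. This is
only an illustration: the function names are made up, and it assumes the usual
/proc/interrupts layout (an "NN:" label, one count per CPU, then the
controller and device names) with "amdgpu" appearing in the device name.

```python
import time


def amdgpu_irq_total(text):
    """Sum the per-CPU counts on every /proc/interrupts line mentioning amdgpu."""
    total = 0
    for line in text.splitlines():
        if "amdgpu" not in line:
            continue
        # Fields after the "NN:" label are per-CPU counts, up to the first
        # field that is not a plain number (the interrupt controller name).
        for field in line.split()[1:]:
            if not field.isdigit():
                break
            total += int(field)
    return total


def irq_delta(interval=10.0, path="/proc/interrupts"):
    """Return how many amdgpu interrupts arrived during `interval` seconds."""
    with open(path) as f:
        before = amdgpu_irq_total(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = amdgpu_irq_total(f.read())
    return after - before
```

Calling irq_delta() once the timeout messages have shown up and getting 0
would suggest the interrupts have stopped arriving; a positive value would
point back at the scheduler side.</pre>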
</div>
</p>
</body>
</html>