<html>
    <head>
      <base href="https://bugs.freedesktop.org/">
    </head>
    <body><span class="vcard"><a class="email" href="mailto:michel@daenzer.net" title="Michel Dänzer <michel@daenzer.net>"> <span class="fn">Michel Dänzer</span></a>
</span> changed
          <a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - [Intel GFX CI] *ERROR* ring sdma0 timeout, signaled seq=137, emitted seq=137"
   href="https://bugs.freedesktop.org/show_bug.cgi?id=107762">bug 107762</a>
          <br>
             <table border="1" cellspacing="0" cellpadding="8">
          <tr>
            <th>What</th>
            <th>Removed</th>
            <th>Added</th>
          </tr>

         <tr>
           <td style="text-align:right;">CC</td>
           <td>
                
           </td>
           <td>ckoenig.leichtzumerken@gmail.com, dev@lynxeye.de
           </td>
         </tr></table>
      <p>
        <div>
            <b><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - [Intel GFX CI] *ERROR* ring sdma0 timeout, signaled seq=137, emitted seq=137"
   href="https://bugs.freedesktop.org/show_bug.cgi?id=107762#c2">Comment # 2</a>
              on <a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - [Intel GFX CI] *ERROR* ring sdma0 timeout, signaled seq=137, emitted seq=137"
   href="https://bugs.freedesktop.org/show_bug.cgi?id=107762">bug 107762</a>
              from <span class="vcard"><a class="email" href="mailto:michel@daenzer.net" title="Michel Dänzer <michel@daenzer.net>"> <span class="fn">Michel Dänzer</span></a>
</span></b>
        <pre>(In reply to Martin Peres from <a href="show_bug.cgi?id=107762#c0">comment #0</a>)
<span class="quote">> [  358.292609] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=137, emitted seq=137
> [  358.292635] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=145, emitted seq=145</span>

(In reply to Martin Peres from <a href="show_bug.cgi?id=107762#c1">comment #1</a>)
<span class="quote">> [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=137, emitted seq=137
> [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=147, emitted seq=147</span>

Hmm, the signalled and emitted sequence numbers are always the same, i.e. every
fence that was emitted has already signalled, which would mean the hardware
hasn't actually timed out?
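
As a rough illustration of that comparison (not part of the driver: the
function name and the regular expression are made up for this sketch and
assume the message format quoted above), a saved kernel log could be
filtered for timeouts whose two sequence numbers match:

```python
import re

# Assumed message format, taken from the log lines quoted in this bug.
TIMEOUT_RE = re.compile(
    r"ring (\S+) timeout, signaled seq=(\d+), emitted seq=(\d+)"
)


def caught_up_timeouts(log_text: str) -> list[tuple[str, int]]:
    """Return (ring, seq) for each timeout where signaled == emitted."""
    hits = []
    for ring, signaled, emitted in TIMEOUT_RE.findall(log_text):
        # Equal numbers mean no emitted fence was still pending.
        if int(signaled) == int(emitted):
            hits.append((ring, int(signaled)))
    return hits
```

Feeding it the output of dmesg would list every timeout report where
nothing was still outstanding on the ring.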

I can think of two possibilities:

* A GPU scheduler bug causing the job timeout handling to be triggered
spuriously. (Could something be stalling the system work queue, so the items
scheduled by drm_sched_job_finish_cb can't call drm_sched_job_finish in time?)

* A problem with the handling of the GPU's interrupts. Do the numbers on the
amdgpu line in /proc/interrupts still increase after these messages appear,
or at least during the ten seconds before they appear?</pre>
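<pre>A minimal sketch of the /proc/interrupts check, assuming the usual layout
(an "NN:" label, one counter per CPU, then the controller and device names).
The helper names irq_total and irq_delta are hypothetical, and matching on
the substring "amdgpu" is an assumption about how the line is labelled:

```python
import time


def irq_total(interrupts_text: str, name: str = "amdgpu") -> int:
    """Sum the per-CPU counters on every line that mentions `name`."""
    total = 0
    for line in interrupts_text.splitlines():
        if name not in line:
            continue
        # Skip the "NN:" label, then add counters until the first
        # non-numeric field (the interrupt controller name).
        for field in line.split()[1:]:
            if not field.isdigit():
                break
            total += int(field)
    return total


def irq_delta(interval: float = 10.0, path: str = "/proc/interrupts") -> int:
    """Return how much the matching counters grew over `interval` seconds."""
    def snapshot() -> int:
        with open(path) as f:
            return irq_total(f.read())

    before = snapshot()
    time.sleep(interval)
    return snapshot() - before
```

Calling irq_delta() while the GPU is in use and getting zero back would be
consistent with the interrupt-handling possibility; a growing count would
point back toward the scheduler.</pre>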
        </div>
      </p>


    </body>
</html>