<html>
    <head>
      <base href="https://bugs.freedesktop.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - AMD Radeon 5700 / Navi: amdgpu.gpu_recovery not working"
   href="https://bugs.freedesktop.org/show_bug.cgi?id=112174">112174</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>AMD Radeon 5700 / Navi: amdgpu.gpu_recovery not working
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>DRI
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>DRI git
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>x86-64 (AMD64)
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux (All)
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>major
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>not set
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>DRM/AMDgpu
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>dri-devel@lists.freedesktop.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>temp201602@kaffeeschluerfer.com
          </td>
        </tr></table>
      <p>
        <div>
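<pre>I have set "amdgpu.gpu_recovery=1" in my kernel boot parameters. When the GPU
hangs, recovery does not work: the reset itself reports success, but resuming
the gfx_v10_0 IP block fails with -22 and another reset is started.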

Syslog:
[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=1935, emitted seq=1937
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1861 thread Xorg:cs0 pid 1864
amdgpu 0000:45:00.0: GPU reset begin!
[drm] ring test on 10 succeeded in 22 usecs
[drm] ring test on 10 succeeded in 29 usecs
amdgpu 0000:45:00.0: GPU reset succeeded, trying to resume
[drm] PCIE GART of 512M enabled (table at 0x00000080001E8000).
[drm] PSP is resuming...
[drm] reserve 0x7200000 from 0x81f7c00000 for PSP TMR
amdgpu: [powerplay] SMU is resuming...
amdgpu: [powerplay] SMU is resumed successfully!
[drm] kiq ring mec 2 pipe 1 q 0
[drm] ring test on 10 succeeded in 33 usecs
[drm] ring test on 10 succeeded in 8 usecs
[drm] gfx 0 ring me 0 pipe 0 q 0
[drm:gfx_v10_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 0 test failed (scratch(0xC040)=0xCAFEDEAD)
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block &lt;gfx_v10_0&gt; failed -22
amdgpu 0000:45:00.0: GPU reset(1) failed
amdgpu 0000:45:00.0: GPU reset end with ret = -22
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=1937, emitted seq=1937
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1861 thread Xorg:cs0 pid 1864
amdgpu 0000:45:00.0: GPU reset begin!


GPU recovery is really important, especially given the current Navi stability
issues.
Please fix recovery and enable it by default.</pre>
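<pre>A minimal sketch for spotting this failure in a saved log, assuming the
"GPU reset end with ret = N" message format quoted in the syslog above. The
function name failed_resets and the sample text are illustrative only and are
not part of amdgpu.

```python
import re

# Assumption: the end-of-reset message has the form quoted in this report,
# e.g. "amdgpu 0000:45:00.0: GPU reset end with ret = -22".
RESET_END = re.compile(r"amdgpu (\S+): GPU reset end with ret = (-?\d+)")


def failed_resets(log_text):
    """Return (pci_address, return_code) for every reset that ended non-zero."""
    results = []
    for match in RESET_END.finditer(log_text):
        device, code = match.group(1), int(match.group(2))
        if code != 0:
            results.append((device, code))
    return results


# Sample lines copied from the syslog excerpt in this report.
sample = """\
amdgpu 0000:45:00.0: GPU reset begin!
amdgpu 0000:45:00.0: GPU reset(1) failed
amdgpu 0000:45:00.0: GPU reset end with ret = -22
"""
print(failed_resets(sample))
```

A non-empty result means at least one reset attempt ended with an error code.
For the sample above, the reported code is -22, matching the log in this
report.</pre>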
        </div>
      </p>


    </body>
</html>