[Bug 112174] AMD Radeon 5700 / Navi: amdgpu.gpu_recovery not working

bugzilla-daemon at freedesktop.org
Wed Oct 30 07:57:41 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=112174

            Bug ID: 112174
           Summary: AMD Radeon 5700 / Navi: amdgpu.gpu_recovery not
                    working
           Product: DRI
           Version: DRI git
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: major
          Priority: not set
         Component: DRM/AMDgpu
          Assignee: dri-devel at lists.freedesktop.org
          Reporter: temp201602 at kaffeeschluerfer.com

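I have set "amdgpu.gpu_recovery=1" in my kernel boot parameters, but when
the GPU hangs, recovery fails. The reset itself is reported as successful,
but resuming the gfx_v10_0 IP block then fails with -22 and the driver
starts another reset.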
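
Before digging into the log, it can help to confirm that the parameter really
reached the driver. The following is a minimal Python sketch, not part of the
original report: the helper names are hypothetical, and it assumes a Linux
system where /proc/cmdline is readable and the amdgpu module exposes its
parameters under /sys/module/amdgpu/parameters/.

```python
from pathlib import Path
from typing import Optional


def param_from_cmdline(cmdline: str,
                       name: str = "amdgpu.gpu_recovery") -> Optional[str]:
    """Return the last value given for `name` on a kernel command line.

    Returns None if the parameter is absent. Hypothetical helper for
    illustration; it does not treat '-' and '_' as equivalent.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            value = val  # a later occurrence overrides an earlier one
    return value


def effective_gpu_recovery() -> Optional[str]:
    """Read the value the loaded module reports, or None if unavailable."""
    # Assumption: the module is loaded and the parameter is world-readable.
    path = Path("/sys/module/amdgpu/parameters/gpu_recovery")
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    try:
        boot = Path("/proc/cmdline").read_text()
    except OSError:
        boot = ""
    print("requested at boot:", param_from_cmdline(boot))
    print("reported by module:", effective_gpu_recovery())
```

If both values are "1" and the reset still fails as in the log, the parameter
is being honoured and the problem lies in the recovery path itself.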

Syslog:
[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=1935, emitted seq=1937
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1861 thread Xorg:cs0 pid 1864
amdgpu 0000:45:00.0: GPU reset begin!
[drm] ring test on 10 succeeded in 22 usecs
[drm] ring test on 10 succeeded in 29 usecs
amdgpu 0000:45:00.0: GPU reset succeeded, trying to resume
[drm] PCIE GART of 512M enabled (table at 0x00000080001E8000).
[drm] PSP is resuming...
[drm] reserve 0x7200000 from 0x81f7c00000 for PSP TMR
amdgpu: [powerplay] SMU is resuming...
amdgpu: [powerplay] SMU is resumed successfully!
[drm] kiq ring mec 2 pipe 1 q 0
[drm] ring test on 10 succeeded in 33 usecs
[drm] ring test on 10 succeeded in 8 usecs
[drm] gfx 0 ring me 0 pipe 0 q 0
[drm:gfx_v10_0_ring_test_ring [amdgpu]] *ERROR* amdgpu: ring 0 test failed (scratch(0xC040)=0xCAFEDEAD)
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v10_0> failed -22
amdgpu 0000:45:00.0: GPU reset(1) failed
amdgpu 0000:45:00.0: GPU reset end with ret = -22
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=1937, emitted seq=1937
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1861 thread Xorg:cs0 pid 1864
amdgpu 0000:45:00.0: GPU reset begin!
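
The log above shows one reset attempt that ends with ret = -22 (the value of
EINVAL on Linux) after the gfx_v10_0 block fails to resume, followed by the
start of a second attempt. As a rough aid for reading longer logs of this
kind, here is a small Python sketch that summarizes each attempt. The function
and type names are hypothetical, and the patterns are matched only against the
message formats visible in this report, so other kernel versions may need
different patterns.

```python
import errno
import re
from typing import List, NamedTuple, Optional


class ResetAttempt(NamedTuple):
    ret: Optional[int]           # value from "GPU reset end with ret = N"
    failed_block: Optional[str]  # IP block named in "resume of IP block"


BEGIN = re.compile(r"GPU reset begin!")
END = re.compile(r"GPU reset end with ret = (-?\d+)")
# \s* tolerates the line wrap between "IP block" and "<gfx_v10_0>".
BLOCK = re.compile(r"resume of IP block\s*<(\w+)> failed (-?\d+)")


def summarize_resets(log: str) -> List[ResetAttempt]:
    """Return one entry per 'GPU reset begin!' found in the log text."""
    attempts = []
    # Everything before the first begin marker belongs to no attempt.
    for chunk in BEGIN.split(log)[1:]:
        end = END.search(chunk)
        block = BLOCK.search(chunk)
        attempts.append(ResetAttempt(
            ret=int(end.group(1)) if end else None,
            failed_block=block.group(1) if block else None,
        ))
    return attempts


def errno_name(ret: int) -> Optional[str]:
    """Map a negative kernel return code to its errno name, if known."""
    return errno.errorcode.get(-ret)


# Excerpt of the log from this report, with the original line wrap kept.
SAMPLE = """\
amdgpu 0000:45:00.0: GPU reset begin!
amdgpu 0000:45:00.0: GPU reset succeeded, trying to resume
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block
<gfx_v10_0> failed -22
amdgpu 0000:45:00.0: GPU reset(1) failed
amdgpu 0000:45:00.0: GPU reset end with ret = -22
amdgpu 0000:45:00.0: GPU reset begin!
"""

if __name__ == "__main__":
    for i, attempt in enumerate(summarize_resets(SAMPLE), start=1):
        print(i, attempt)
```

An attempt whose ret is None never logged an end line, which is the state the
log excerpt is left in after the second "GPU reset begin!".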


GPU recovery is especially important right now, given the current stability
issues on Navi.
Please fix recovery on Navi and enable it by default.
