[Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Thu Nov 22 18:47:58 UTC 2018


https://bugs.freedesktop.org/show_bug.cgi?id=105733

--- Comment #51 from Allan <allan4229 at gmail.com> ---
Tried to install the RX480 in the other PC: the card is so big that it
touches the RAM slot's tabs, so I can't install it.

Meanwhile, it seems the errors take a little longer to appear when setting
randomize_va_space=0. I was testing it for the CPU and noticed that amdgpu
took longer to fail, but it still failed.
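
For reference, a minimal Python sketch of how that setting can be checked. The
path /proc/sys/kernel/randomize_va_space and the meaning of the values 0, 1
and 2 are the standard Linux ones; the function names are only illustrative.

```python
from pathlib import Path

# Standard Linux sysctl file for ASLR; the helper names are illustrative only.
ASLR_PATH = Path("/proc/sys/kernel/randomize_va_space")

ASLR_MODES = {
    0: "disabled",
    1: "stack, mmap base and vDSO randomized",
    2: "same as 1, plus heap (brk) randomized",
}

def describe_aslr(raw: str) -> str:
    """Turn the file's contents (for example '0') into a readable description."""
    value = int(raw.strip())
    return ASLR_MODES.get(value, f"unknown value {value}")

def current_aslr(path: Path = ASLR_PATH) -> str:
    """Read the live setting; returns 'unavailable' if the file cannot be read."""
    try:
        return describe_aslr(path.read_text())
    except (OSError, ValueError):
        return "unavailable"

if __name__ == "__main__":
    print("randomize_va_space:", current_aslr())
```

With randomize_va_space=0 this should print "disabled".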

What happened:
- the screen got grainy with pinkish colors, as usual
 - the effect extended to the desktop
- but I could still operate the system
- the tty was black and white (normal)
- I could restart the X server
- colors went back to normal after the restart
- I tried the same application again
- it crashed and froze the system

Main difference:
- now I can sometimes kill the tasks or restart the X server

I recorded the time of each event; the logs follow:

(Firefox was open in the background while I tried to play Left 4 Dead 2 through
Steam)

1. Recoverable delay with grainy colors (l4d2 started at 11:48; it occurred at
11:50, after some delay while loading the game menu)
> [Thu Nov 22 11:48:03 2018] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=11477, emitted seq=11480
> [Thu Nov 22 11:48:03 2018] amdgpu 0000:09:00.0: GPU reset begin!
> [Thu Nov 22 11:48:03 2018] amdgpu 0000:09:00.0: GPU pci config reset
> [Thu Nov 22 11:48:03 2018] amdgpu 0000:09:00.0: GPU reset succeeded, trying to resume
> [Thu Nov 22 11:48:03 2018] [drm] PCIE GART of 256M enabled (table at 0x000000F400300000).
> [Thu Nov 22 11:48:03 2018] [drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
> [Thu Nov 22 11:48:04 2018] amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.3.1 test failed (-110)
> [Thu Nov 22 11:48:04 2018] [drm] UVD and UVD ENC initialized successfully.
> [Thu Nov 22 11:48:04 2018] [drm] VCE initialized successfully.
> [Thu Nov 22 11:48:04 2018] [drm] recover vram bo from shadow start
> [Thu Nov 22 11:48:04 2018] [drm] recover vram bo from shadow done
> [Thu Nov 22 11:48:04 2018] [drm] Skip scheduling IBs!
> [Thu Nov 22 11:48:04 2018] [drm] Skip scheduling IBs!
> [Thu Nov 22 11:48:04 2018] amdgpu 0000:09:00.0: GPU reset(1) succeeded!
> [Thu Nov 22 11:48:04 2018] [drm] Skip scheduling IBs!
> [Thu Nov 22 11:48:04 2018] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
> [Thu Nov 22 11:48:04 2018] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
> [Thu Nov 22 11:48:04 2018] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
> [Thu Nov 22 11:48:04 2018] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
> [Thu Nov 22 11:48:06 2018] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
> [Thu Nov 22 11:50:46 2018] show_signal_msg: 9 callbacks suppressed
> [Thu Nov 22 11:50:46 2018] Chrome_~dThread[1734]: segfault at 0 ip 00007f7926c4c181 sp 00007f792493aad0 error 6 in libxul.so[7f7926c38000+3a2c000]
> [Thu Nov 22 11:50:46 2018] Code: 15 dc f2 5f 04 48 89 10 c7 04 25 00 00 00 00 7c 09 00 00 e8 21 60 ff ff 90 48 8b 05 f9 2a 9b 05 48 8d 0d 22 f3 5f 04 48 89 08 <c7> 04 25 00 00 00 00 02 0a 00 00 e8 ff 5f ff ff e8 0a f3 ff ff 48
> [Thu Nov 22 11:50:46 2018] Chrome_~dThread[1885]: segfault at 0 ip 00007f7fa150a181 sp 00007f7f9f1f8ad0 error 6 in libxul.so[7f7fa14f6000+3a2c000]
> [Thu Nov 22 11:50:46 2018] Chrome_~dThread[8072]: segfault at 0 ip 00007fffededa181 sp 00007fffebbc8ad0 error 6
> [Thu Nov 22 11:50:46 2018] Code: 15 dc f2 5f 04 48 89 10 c7 04 25 00 00 00 00 7c 09 00 00 e8 21 60 ff ff 90 48 8b 05 f9 2a 9b 05 48 8d 0d 22 f3 5f 04 48 89 08 <c7> 04 25 00 00 00 00 02 0a 00 00 e8 ff 5f ff ff e8 0a f3 ff ff 48
> [Thu Nov 22 11:50:46 2018]  in libxul.so[7fffedec6000+3a2c000]
> [Thu Nov 22 11:50:46 2018] Code: 15 dc f2 5f 04 48 89 10 c7 04 25 00 00 00 00 7c 09 00 00 e8 21 60 ff ff 90 48 8b 05 f9 2a 9b 05 48 8d 0d 22 f3 5f 04 48 89 08 <c7> 04 25 00 00 00 00 02 0a 00 00 e8 ff 5f ff ff e8 0a f3 ff ff 48
> [Thu Nov 22 11:50:46 2018] Chrome_~dThread[1931]: segfault at 0 ip 00007f8dc581f181 sp 00007f8dc350dad0 error 6 in libxul.so[7f8dc580b000+3a2c000]
> [Thu Nov 22 11:50:46 2018] Code: 15 dc f2 5f 04 48 89 10 c7 04 25 00 00 00 00 7c 09 00 00 e8 21 60 ff ff 90 48 8b 05 f9 2a 9b 05 48 8d 0d 22 f3 5f 04 48 89 08 <c7> 04 25 00 00 00 00 02 0a 00 00 e8 ff 5f ff ff e8 0a f3 ff ff 48
kern.log shows the same as dmesg.
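
The negative numbers in the log above are Linux error codes. Below is a small
Python sketch for decoding them. It assumes the generic Linux errno numbering
(110 = ETIMEDOUT, 125 = ECANCELED). The table is hardcoded so the result does
not depend on the machine running the script. The regular expression and the
function name are my own, and only cover the two message shapes seen above.

```python
import re

# Generic Linux errno values (asm-generic); hardcoded on purpose.
LINUX_ERRNO = {110: "ETIMEDOUT", 125: "ECANCELED"}

# Matches "test failed (-110)" and "Failed to initialize parser -125!".
CODE_RE = re.compile(r"(?:test failed \(|initialize parser )-(\d+)")

def decode_errors(log_text):
    """Return {errno name: number of occurrences} for codes found in the log."""
    counts = {}
    for match in CODE_RE.finditer(log_text):
        number = int(match.group(1))
        name = LINUX_ERRNO.get(number, f"errno {number}")
        counts[name] = counts.get(name, 0) + 1
    return counts

# Three lines copied from the log above.
ERRNO_SAMPLE = """\
[Thu Nov 22 11:48:04 2018] amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.3.1 test failed (-110)
[Thu Nov 22 11:48:04 2018] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[Thu Nov 22 11:48:06 2018] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
"""

if __name__ == "__main__":
    for name, count in decode_errors(ERRNO_SAMPLE).items():
        print(f"{name}: {count}")
```

So the ring test timed out, and the later command submissions were rejected as
canceled.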

2. Unrecoverable crash (l4d2 started at 12:00 and ran well until 12:55, when
everything crashed)
dmesg:
> [Thu Nov 22 12:55:04 2018] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1688198, emitted seq=1688200
> [Thu Nov 22 12:55:04 2018] amdgpu 0000:09:00.0: GPU reset begin!
> [Thu Nov 22 12:55:14 2018] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:46:crtc-0] hw_done or flip_done timed out
kern.log shows the same as dmesg.
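
To compare the two events, here is a minimal Python sketch that pulls the ring
timeout lines out of dmesg output. The regular expression and the function
name are my own, not part of amdgpu or any tool. I am assuming that the
difference between the emitted and signaled sequence numbers is the number of
fences still pending when the timeout fired.

```python
import re

# Matches the amdgpu_job_timedout message shape seen in the logs above.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)

def ring_timeouts(log_text):
    """Return a list of (ring, signaled, emitted, pending) tuples."""
    results = []
    for match in TIMEOUT_RE.finditer(log_text):
        signaled = int(match["signaled"])
        emitted = int(match["emitted"])
        results.append((match["ring"], signaled, emitted, emitted - signaled))
    return results

# The two timeout lines from events 1 and 2.
SAMPLE = """\
[Thu Nov 22 11:48:03 2018] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=11477, emitted seq=11480
[Thu Nov 22 12:55:04 2018] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1688198, emitted seq=1688200
"""

if __name__ == "__main__":
    for ring, signaled, emitted, pending in ring_timeouts(SAMPLE):
        print(f"{ring}: {pending} fence(s) emitted but not signaled")
```

The recoverable event timed out on the sdma0 ring and the unrecoverable one on
the gfx ring.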

The Xorg log does not report anything useful.


(In reply to russianneuromancer from comment #50)
> Can't tell you about RX480, but I know for sure that at least Vega 64 is
> totally fine with 1800X PCI-controller, no single not-solvable
> graphics-related issue for a year (so far all issues I had was solved by
> upgrading kernel and/or Mesa).

I wish I had this luck.
