[Mesa-dev] [Bug 109955] amdgpu [RX Vega 64] system freeze while gaming

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Mon Mar 11 07:05:19 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=109955

            Bug ID: 109955
           Summary: amdgpu [RX Vega 64] system freeze while gaming
           Product: Mesa
           Version: 18.3
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: Mesa core
          Assignee: mesa-dev at lists.freedesktop.org
          Reporter: ilvipero at gmx.com
        QA Contact: mesa-dev at lists.freedesktop.org

Symptoms:
During gaming sessions, the system locks up and freezes completely. Audio seems
to keep working for a few more seconds, but the whole desktop is frozen and
there is no response to mouse or keyboard input. A hard reset is the only
possible action on the local PC. I have not tried to SSH into the PC from
another machine.
Sometimes I can play for 20 minutes, sometimes for a few hours. The freezes
seem unrelated to any in-game activity. All system temperatures are under
control.
Outside of 3D gaming the system is very stable, including playing videos,
encoding videos, and regular desktop usage.

Further testing done:
1. Installed Windows 10 on the same hardware with the same BIOS settings.
Running the same games causes no issues at all. No hangs, no problems.
2. Ran the same games on my NVIDIA+Intel based laptop, using the same
distributions and kernels. No issues at all. No hangs, no problems.

Additional information:
This issue has been going on for a while now. It comes and goes with Mesa
versions (or Mesa+kernel combinations). Sometimes an update arrives and I have
no freezes for weeks. Then the next update gets installed and the issue comes
back.
I have tested this mainly on openSUSE Tumbleweed, Ubuntu 18.04 and Ubuntu
18.10.

-- Ubuntu testing:
Ubuntu 18.04 was running well for months, then the latest Mesa updates, which
landed 2 weeks ago, reintroduced the issue and the system started freezing
again. I tried updating to 18.10 but had the same issue. I enabled the oibaf
PPA for video drivers and the issue disappeared. Then after a few days a new
Mesa came in and the issue came back. I am now running the Padoka unstable PPA
with Mesa 19 and LLVM 9. The issue still happens.

-- Tumbleweed testing:
I am including the bug report I previously filed for Tumbleweed, with system
logs from a couple of occurrences. I will post more as I collect them.

OS: openSUSE Tumbleweed x86_64, updated 2018-04-21
Kernel: 4.16.2-1-default
Desktop Environment: KDE Plasma (x11)
OpenGL version string: 3.0 Mesa 18.0.0
GPU: AMD Radeon RX Vega 64 8GB

System Logs:

Apr 21 17:08:34 STUDIO kernel: [drm:gfx_v9_0_priv_reg_irq [amdgpu]] *ERROR* Illegal register access in command stream
Apr 21 17:08:34 STUDIO kernel: [drm] No hardware hang detected. Did some blocks stall?
Apr 21 17:08:44 STUDIO kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=128859, last emitted seq=128861
Apr 21 17:08:44 STUDIO kernel: [drm] No hardware hang detected. Did some blocks stall?
-- Reboot --
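
For anyone comparing several freeze logs, the "ring gfx timeout" lines can be
picked apart mechanically. The following is only a rough sketch in Python, not
part of any amdgpu or Mesa tooling: the function name parse_ring_timeout and
the regular expression are made up for illustration and are based solely on the
message format seen in the log above. As far as I understand, the gap between
the two sequence numbers is the number of submitted fences that had not yet
signaled when the timeout fired.

```python
import re

# Pattern for the amdgpu job-timeout message as it appears in the log above.
# This is an assumption based on that one message format, not a kernel API.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, last signaled seq=(?P<signaled>\d+), "
    r"last emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(line):
    """Return (ring, signaled, emitted, pending), or None if the line does not match."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    signaled = int(match.group("signaled"))
    emitted = int(match.group("emitted"))
    # pending = fences emitted but not yet signaled at the time of the timeout
    return match.group("ring"), signaled, emitted, emitted - signaled


# Log line copied from the Apr 21 freeze, with the mail line-wrap removed.
sample = (
    "Apr 21 17:08:44 STUDIO kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* "
    "ring gfx timeout, last signaled seq=128859, last emitted seq=128861"
)

print(parse_ring_timeout(sample))  # ('gfx', 128859, 128861, 2)
```

Lines have to be un-wrapped first (one kernel message per line), otherwise the
pattern will not match across the break.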


Dmesg lines related to amdgpu:

[    3.407020] [drm] amdgpu kernel modesetting enabled.
[    3.411462] fb: switching to amdgpudrmfb from VESA VGA
[    3.426163] amdgpu 0000:04:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
[    3.426261] amdgpu 0000:04:00.0: VRAM: 8176M 0x000000F400000000 - 0x000000F5FEFFFFFF (8176M used)
[    3.426263] amdgpu 0000:04:00.0: GTT: 256M 0x000000F600000000 - 0x000000F60FFFFFFF
[    3.426371] [drm] amdgpu: 8176M of VRAM memory ready
[    3.426372] [drm] amdgpu: 8176M of GTT memory ready.
[    4.031665] fbcon: amdgpudrmfb (fb0) is primary device
[    4.083803] amdgpu 0000:04:00.0: fb0: amdgpudrmfb frame buffer device
[    4.096086] amdgpu 0000:04:00.0: ring 0(gfx) uses VM inv eng 4 on hub 0
[    4.096088] amdgpu 0000:04:00.0: ring 1(comp_1.0.0) uses VM inv eng 5 on hub 0
[    4.096089] amdgpu 0000:04:00.0: ring 2(comp_1.1.0) uses VM inv eng 6 on hub 0
[    4.096090] amdgpu 0000:04:00.0: ring 3(comp_1.2.0) uses VM inv eng 7 on hub 0
[    4.096091] amdgpu 0000:04:00.0: ring 4(comp_1.3.0) uses VM inv eng 8 on hub 0
[    4.096093] amdgpu 0000:04:00.0: ring 5(comp_1.0.1) uses VM inv eng 9 on hub 0
[    4.096094] amdgpu 0000:04:00.0: ring 6(comp_1.1.1) uses VM inv eng 10 on hub 0
[    4.096095] amdgpu 0000:04:00.0: ring 7(comp_1.2.1) uses VM inv eng 11 on hub 0
[    4.096096] amdgpu 0000:04:00.0: ring 8(comp_1.3.1) uses VM inv eng 12 on hub 0
[    4.096098] amdgpu 0000:04:00.0: ring 9(kiq_2.1.0) uses VM inv eng 13 on hub 0
[    4.096099] amdgpu 0000:04:00.0: ring 10(sdma0) uses VM inv eng 4 on hub 1
[    4.096100] amdgpu 0000:04:00.0: ring 11(sdma1) uses VM inv eng 5 on hub 1
[    4.096101] amdgpu 0000:04:00.0: ring 12(uvd) uses VM inv eng 6 on hub 1
[    4.096103] amdgpu 0000:04:00.0: ring 13(uvd_enc0) uses VM inv eng 7 on hub 1
[    4.096104] amdgpu 0000:04:00.0: ring 14(uvd_enc1) uses VM inv eng 8 on hub 1
[    4.096105] amdgpu 0000:04:00.0: ring 15(vce0) uses VM inv eng 9 on hub 1
[    4.096107] amdgpu 0000:04:00.0: ring 16(vce1) uses VM inv eng 10 on hub 1
[    4.096108] amdgpu 0000:04:00.0: ring 17(vce2) uses VM inv eng 11 on hub 1
[    4.096662] [drm] Initialized amdgpu 3.23.0 20150101 for 0000:04:00.0 on minor 0



The issue was later identified in
https://bugs.freedesktop.org/show_bug.cgi?id=105317 and fixed with Mesa 18.0.1.



The issue then reappeared a few months later:
OS: openSUSE Tumbleweed x86_64, updated 2018-08-10
Kernel: 4.17.2-1-default
Desktop Environment: KDE Plasma (x11)
OpenGL version string: 3.1 Mesa 18.1.5
GPU: AMD Radeon RX Vega 64 8GB


Relevant log lines I found from the time of the freeze:

2018-08-09T23:16:53.103775+08:00 MGDT-Tumbleweed kernel: [ 6305.852703] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=1745163, last emitted seq=1745165
2018-08-09T23:16:53.103795+08:00 MGDT-Tumbleweed kernel: [ 6305.852704] [drm] No hardware hang detected. Did some blocks stall?


Dmesg lines related to amdgpu:

[    3.130759] [drm] amdgpu kernel modesetting enabled.
[    3.135770] fb: switching to amdgpudrmfb from EFI VGA
[    3.136106] amdgpu 0000:03:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
[    3.136171] amdgpu 0000:03:00.0: VRAM: 8176M 0x000000F400000000 - 0x000000F5FEFFFFFF (8176M used)
[    3.136173] amdgpu 0000:03:00.0: GTT: 512M 0x000000F600000000 - 0x000000F61FFFFFFF
[    3.136494] [drm] amdgpu: 8176M of VRAM memory ready
[    3.136495] [drm] amdgpu: 8176M of GTT memory ready.
[    4.114469] fbcon: amdgpudrmfb (fb0) is primary device
[    4.141179] amdgpu 0000:03:00.0: fb0: amdgpudrmfb frame buffer device
[    4.164072] amdgpu 0000:03:00.0: ring 0(gfx) uses VM inv eng 4 on hub 0
[    4.164074] amdgpu 0000:03:00.0: ring 1(comp_1.0.0) uses VM inv eng 5 on hub 0
[    4.164075] amdgpu 0000:03:00.0: ring 2(comp_1.1.0) uses VM inv eng 6 on hub 0
[    4.164075] amdgpu 0000:03:00.0: ring 3(comp_1.2.0) uses VM inv eng 7 on hub 0
[    4.164076] amdgpu 0000:03:00.0: ring 4(comp_1.3.0) uses VM inv eng 8 on hub 0
[    4.164077] amdgpu 0000:03:00.0: ring 5(comp_1.0.1) uses VM inv eng 9 on hub 0
[    4.164078] amdgpu 0000:03:00.0: ring 6(comp_1.1.1) uses VM inv eng 10 on hub 0
[    4.164079] amdgpu 0000:03:00.0: ring 7(comp_1.2.1) uses VM inv eng 11 on hub 0
[    4.164079] amdgpu 0000:03:00.0: ring 8(comp_1.3.1) uses VM inv eng 12 on hub 0
[    4.164080] amdgpu 0000:03:00.0: ring 9(kiq_2.1.0) uses VM inv eng 13 on hub 0
[    4.164081] amdgpu 0000:03:00.0: ring 10(sdma0) uses VM inv eng 4 on hub 1
[    4.164082] amdgpu 0000:03:00.0: ring 11(sdma1) uses VM inv eng 5 on hub 1
[    4.164083] amdgpu 0000:03:00.0: ring 12(uvd) uses VM inv eng 6 on hub 1
[    4.164084] amdgpu 0000:03:00.0: ring 13(uvd_enc0) uses VM inv eng 7 on hub 1
[    4.164085] amdgpu 0000:03:00.0: ring 14(uvd_enc1) uses VM inv eng 8 on hub 1
[    4.164085] amdgpu 0000:03:00.0: ring 15(vce0) uses VM inv eng 9 on hub 1
[    4.164086] amdgpu 0000:03:00.0: ring 16(vce1) uses VM inv eng 10 on hub 1
[    4.164087] amdgpu 0000:03:00.0: ring 17(vce2) uses VM inv eng 11 on hub 1
[    4.164553] [drm] Initialized amdgpu 3.25.0 20150101 for 0000:03:00.0 on minor 0
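
As a sanity check on the VRAM and GTT address ranges printed in the two dmesg
excerpts, the sizes can be recomputed from the start and end addresses. This is
just an illustrative Python sketch; the helper name aperture_mib is
hypothetical, and the only inputs are the addresses copied from the logs above,
treated as inclusive ranges.

```python
def aperture_mib(start, end):
    """Size in MiB of an inclusive [start, end] address range."""
    return (end - start + 1) // (1024 * 1024)


# Ranges copied from the dmesg lines above.
vram = aperture_mib(0x000000F400000000, 0x000000F5FEFFFFFF)
gtt_first = aperture_mib(0x000000F600000000, 0x000000F60FFFFFFF)   # kernel 4.16.2 boot
gtt_second = aperture_mib(0x000000F600000000, 0x000000F61FFFFFFF)  # kernel 4.17.2 boot

print(vram, gtt_first, gtt_second)  # 8176 256 512
```

The computed values match the "VRAM: 8176M", "GTT: 256M" and "GTT: 512M"
figures on the corresponding log lines, so the only difference between the two
boots in these ranges is the GTT size.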
