[Bug 111481] AMD Navi GPU frequent freezes on both Manjaro/Ubuntu with kernel 5.3 and mesa 19.2 -git/llvm9

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Sat Nov 9 17:57:57 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=111481

--- Comment #223 from lptech1024 at gmail.com ---
Followup to #216:

Fedora 31: Kernel 5.3.9, GNOME 3.34, Mesa 19.2.2, linux-firmware 20190923, LLVM
9.0.0

The hang is 100% reproducible.

It occurs running the Linux-native (Vulkan) version of Shadow of the Tomb
Raider (SotTR). I have never run SotTR under Proton/Wine, so that isn't a
confounding variable.

The hang occurs during the (unskippable) Amazon River cutscene in Peru,
anywhere from 15 seconds before the pilot is struck up to the moment the pilot
is struck. Even when the video hangs, you can usually hear fragments (sound
effects) of the game for a few seconds afterwards.

I ran SotTR with vktrace and activated the GNOME (Wayland) overview to see if
I could catch any relevant terminal output (none that I saw). The game still
had focus, so it continued playing. After the hang (when I rebooted), there
wasn't a vktrace file. I assume either vktrace couldn't write it out due to
the hang or it had no content to write.

However, with it running visible in the overview (and a manual kernel update),
I got both ring gfx and sdma errors:

Nov 07 [SNIP]:24 [SNIP] kernel: [drm] GPU recovery disabled.
Nov 07 [SNIP]:24 [SNIP] kernel: [drm] GPU recovery disabled.
Nov 07 [SNIP]:24 [SNIP] kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Nov 07 [SNIP]:24 [SNIP] kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 1722 thread gnome-shel:cs0 pid 1768
Nov 07 [SNIP]:24 [SNIP] kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=1049, emitted seq=1053
Nov 07 [SNIP]:24 [SNIP] kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=30017, emitted seq=30020
Nov 07 [SNIP]:19 [SNIP] kernel: [drm] GPU recovery disabled.
Nov 07 [SNIP]:19 [SNIP] kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process ShadowOfTheTomb pid 3890 thread WebViewRenderer pid 4981
Nov 07 [SNIP]:19 [SNIP] kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=75610, emitted seq=75612
Nov 07 [SNIP]:19 [SNIP] kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
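
For anyone comparing logs across reports, here is a minimal Python sketch
that pulls the ring name and the two sequence numbers out of the timeout
messages above. The helper name pending_jobs is hypothetical, and reading
"emitted minus signaled" as the number of fences still outstanding on a ring
is my assumption about these counters, not something confirmed by the driver
documentation.

```python
import re

# Matches amdgpu ring-timeout messages. \s+ is used between tokens so the
# pattern still matches when a mail client has wrapped the log line.
RING_TIMEOUT = re.compile(
    r"ring\s+(?P<ring>\S+)\s+timeout,\s+signaled\s+seq=(?P<signaled>\d+),"
    r"\s+emitted\s+seq=(?P<emitted>\d+)"
)


def pending_jobs(log_text):
    """Return {ring: emitted - signaled} for every ring timeout found."""
    result = {}
    for m in RING_TIMEOUT.finditer(log_text):
        emitted = int(m.group("emitted"))
        signaled = int(m.group("signaled"))
        result[m.group("ring")] = emitted - signaled
    return result


# Message bodies copied from the log above (timestamps omitted).
sample = """\
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
sdma1 timeout, signaled seq=1049, emitted seq=1053
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
sdma0 timeout, signaled seq=30017, emitted seq=30020
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
gfx_0.0.0 timeout, signaled seq=75610, emitted seq=75612
"""

print(pending_jobs(sample))
# prints {'sdma1': 4, 'sdma0': 3, 'gfx_0.0.0': 2}
```

Under that assumption, each of the three rings had only a handful of
submissions in flight when it timed out.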

As a workaround to proceed in the game, I downloaded the AMDVLK 2019.Q4.2 .deb,
extracted the contents, modified the JSON file (to point to the local
amdvlk64.so), and ran SotTR with the VK_ICD_FILENAMES variable set to the
AMDVLK JSON file.
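
In case it helps others reproduce the workaround, here is a rough Python
sketch of the JSON edit and the environment setup. It assumes the manifest
follows the usual Vulkan loader ICD layout with an "ICD" object containing a
"library_path" key. The function names and all file paths are hypothetical,
and the demo uses a stand-in manifest in a temporary directory instead of the
real files from the .deb.

```python
import json
import os
import tempfile


def make_local_icd(manifest_path, library_path, out_path):
    """Copy an ICD manifest, pointing library_path at a local driver file."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    # Assumed layout: {"ICD": {"library_path": ..., "api_version": ...}}
    manifest["ICD"]["library_path"] = os.path.abspath(library_path)
    with open(out_path, "w") as f:
        json.dump(manifest, f, indent=4)
    return out_path


def env_with_icd(icd_json):
    """Return a copy of the environment that selects only the given ICD."""
    env = dict(os.environ)
    env["VK_ICD_FILENAMES"] = os.path.abspath(icd_json)
    return env


# Demo with stand-in files; every name below is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "extracted_icd.json")
    with open(original, "w") as f:
        json.dump(
            {"file_format_version": "1.0.0",
             "ICD": {"library_path": "/placeholder/amdvlk64.so",
                     "api_version": "1.1.0"}},
            f,
        )
    local_lib = os.path.join(tmp, "amdvlk64.so")
    local_icd = make_local_icd(original, local_lib,
                               os.path.join(tmp, "local_icd.json"))
    env = env_with_icd(local_icd)
    print(env["VK_ICD_FILENAMES"])
    # The game would then be started with this environment, for example
    # subprocess.run([path_to_game_launcher], env=env)
```

Writing a modified copy leaves the extracted manifest untouched, so switching
back to RADV only requires unsetting VK_ICD_FILENAMES.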

The AMDVLK graphics were terrible (a significant percentage of random pixels
turning random colors, bad rendering of elements, etc.), but I did not
experience any hangs during the cutscene. After reaching a known save point, I
switched back to mesa/RADV-llvm and haven't experienced a hang since (I haven't
progressed much further yet, but that's the only hang so far, with about 13%
of the game completed).

This would seem to point to a bug at least partially due to mesa/RADV-llvm.
