[Bug 105251] [Vega10] GPU lockup on boot: VMC page fault

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Sat Jul 20 17:05:43 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=105251

--- Comment #71 from deltasquared <ds2.bugs.freedesktop at gmail.com> ---
I would like to chime in, as this particular problem has been plaguing me for
some months now. I am currently running kernel 5.2.1-arch1-1-ARCH and still
occasionally get errors like the following when running minetest (on reading,
they seem subtly different from the others in this thread):

[ 5699.136659] amdgpu 0000:0b:00.0: [gfxhub] no-retry page fault (src_id:0 ring:155 vmid:5 pasid:32770, for process minetest pid 7127 thread minetest:cs0 pid 7133)
[ 5699.136662] amdgpu 0000:0b:00.0:   in page starting at address 0x000080014034d000 from 27
[ 5699.136664] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00501136
[ 5704.343299] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out.
[ 5709.259775] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=443165, emitted seq=443167
[ 5709.259860] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process minetest pid 7127 thread minetest:cs0 pid 7133
[ 5709.259862] [drm] GPU recovery disabled.
*repeat last four lines endlessly...*
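
To make faults like the one above easier to compare against the others in this
thread, the fields can be pulled out of the dmesg text with a small script.
What follows is only a minimal sketch in Python, assuming the exact line format
shown in this log; the regular expressions and the name parse_fault are
illustrative and do not come from amdgpu. It does not attempt to decode the
bits of VM_L2_PROTECTION_FAULT_STATUS, it only extracts the raw value.

```python
import re

# Patterns written against the log lines quoted above (an assumption:
# other kernel versions may print these lines differently).
FAULT_RE = re.compile(
    r"\[(?P<hub>\w+)\] no-retry page fault \(src_id:(?P<src_id>\d+) "
    r"ring:(?P<ring>\d+) vmid:(?P<vmid>\d+) pasid:(?P<pasid>\d+), "
    r"for process (?P<process>\S+) pid (?P<pid>\d+)"
)
ADDR_RE = re.compile(r"in page starting at address (0x[0-9a-fA-F]+)")
STATUS_RE = re.compile(r"VM_L2_PROTECTION_FAULT_STATUS:(0x[0-9a-fA-F]+)")


def parse_fault(dmesg_text):
    """Return a dict describing the first page fault in dmesg_text, or None."""
    # Collapse all whitespace so that mail-wrapped lines match as well.
    flat = " ".join(dmesg_text.split())
    fault = FAULT_RE.search(flat)
    if fault is None:
        return None
    info = fault.groupdict()
    for key in ("src_id", "ring", "vmid", "pasid", "pid"):
        info[key] = int(info[key])
    # Look for the address and status only after the fault header itself.
    addr = ADDR_RE.search(flat, fault.end())
    status = STATUS_RE.search(flat, fault.end())
    info["address"] = int(addr.group(1), 16) if addr else None
    info["status"] = int(status.group(1), 16) if status else None
    return info


# The first three records from the log above, wrapped as a mail client would.
SAMPLE = r"""
[ 5699.136659] amdgpu 0000:0b:00.0: [gfxhub] no-retry page fault (src_id:0
ring:155 vmid:5 pasid:32770, for process minetest pid 7127 thread minetest:cs0
pid 7133)
[ 5699.136662] amdgpu 0000:0b:00.0:   in page starting at address
0x000080014034d000 from 27
[ 5699.136664] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00501136
"""

if __name__ == "__main__":
    print(parse_fault(SAMPLE))
```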

The relevant hardware is a Ryzen 2200G (Vega 8 GPU). The issue has survived
swapping almost every component in my system, so I think it is safe to rule out
faulty hardware, in my case at least. Mercifully the rest of the system
survives the lockup, which is how I was able to capture the dmesg output, but
with the GPU hard-locked the only recourse is to reboot (after gathering some
output for a while).

I haven't yet been able to obtain an API trace from minetest when it
misbehaves. Furthermore, it doesn't fail reliably: I can often play for hours
before the crash strikes, and then the issue can sometimes persist across a few
reboots if I push minetest to load a world again soon enough afterwards. I
don't know; could it depend on the precise 3D cloud pattern in the menu
background at the time? It sounds like it would be useful to have apitrace
running whenever I run minetest, on the off chance I can catch it in the act.
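
One way to keep apitrace attached on every run is a small launcher that starts
the game under "apitrace trace" and writes a timestamped trace file each time.
This is only a sketch in Python: the helper names build_trace_command and
run_traced and the "traces" directory are hypothetical, and the -o option
should be checked against "apitrace trace --help" for the installed version.

```python
import datetime
import pathlib
import subprocess


def build_trace_command(program, args=(), out_dir="traces", now=None):
    """Build the argument list for running program under `apitrace trace`."""
    now = now or datetime.datetime.now()
    stamp = now.strftime("%Y%m%d-%H%M%S")
    # One trace file per run, named after the program and the start time.
    name = f"{pathlib.Path(program).name}-{stamp}.trace"
    out_file = pathlib.Path(out_dir) / name
    # Assumes `apitrace trace -o FILE PROGRAM [ARGS...]` is available on PATH.
    return ["apitrace", "trace", "-o", str(out_file), program, *args]


def run_traced(program, args=(), out_dir="traces"):
    """Run program under apitrace and return its exit status."""
    pathlib.Path(out_dir).mkdir(parents=True, exist_ok=True)
    return subprocess.call(build_trace_command(program, args, out_dir))
```

Calling run_traced("minetest") would then leave a fresh trace in the traces
directory for every session. Sessions that last for hours are likely to produce
large trace files, so the output directory needs room for them.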

zzyxpaw's "vega crasher" in comment #52 reliably causes a GPU lock-up here.
Same sort of story: a black window pops up, nothing happens, and either the
lock-up occurs after a moment or (interestingly) attempting to move the window
in X11 causes the lock-up immediately.

If any more data (such as an apitrace) would be useful, I am willing to try to
gather it, as this issue is the only blemish on an otherwise perfectly stable
system.
