[Bug 111481] AMD Navi GPU frequent freezes on both Manjaro/Ubuntu with kernel 5.3 and mesa 19.2 -git/llvm9

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Mon Sep 2 06:05:20 UTC 2019


--- Comment #13 from Mathieu Belanger <b747xx at gmail.com> ---
(In reply to Marko Popovic from comment #10)
> (In reply to Matthias Müller from comment #9)
> > On my side i can report that the issue does not occur if i don't use a tool
> > to modify the FANs - does anyone of you use something of the like or are
> > this seperate issues?
> I don't use any tools, all is stock.
> (In reply to Mathieu Belanger from comment #7)
> > Created attachment 145225 [details] [review]
> > Merge last adg5f code
> > 
> > Ok, I did look at the recent kernel patch and commit and they seam to have
> > fixed a couple bugs. I do not know it it include these but I did not crash
> > one time since I merged that into the kernel 5.3-rc6. (that code is staged
> > for 5.4 merge window).
> > 
> > I did attach the patch so you can merge that if you wish to try. It add all
> > the latest bits for AMDGPU into 5.3-rc6, including Renoir support.
> After applying the patch, same type of error occurs, luckily very easy to
> reproduce with Citra emulator, apparently it does something that AMD's
> driver really doesn't like and makes chances higher for error to occur. Also
> when CPU is under heavy I/O load error seems more likely to occur as well on
> my end.
> Last log after applying the latest patch from the merge posted in the
> attachment:
> sep 01 02:29:10 Marko-PC kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]]
> *ERROR* Waiting for fences timed out!
> sep 01 02:29:10 Marko-PC kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
> ring gfx_0.0.0 timeout, signaled seq=16312, emitted seq=16314
> sep 01 02:29:10 Marko-PC kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
> Process information: process citra-qt pid 2928 thread citra-qt:cs0 pid 2938
> sep 01 02:29:10 Marko-PC kernel: [drm] GPU recovery disabled.
> If we could get any official AMD responses to at least make sure that we're
> at least being listened to would be very nice.

I was able to reproduce that Citra crash.
I followed the instructions and it crashed right after choosing continue (or a
fraction of a second later: the music lagged a little, then the whole system
froze; I was able to sync/umount/reboot with the magic SysRq keys).

Is your crash at exactly the same place? If so, it's very reproducible, and
it might be a good idea to run an OpenGL trace to see which commands were sent
last before the crash.

I am not familiar with Ubuntu. Were these compiled on your system? If not,
do you know the build dates of your Mesa, libdrm and xf86-video-amdgpu
(X11 DDX)?

Also, can you tell me the dates of your microcode files?

Libdrm : 07:49:10 PM 08/27/2019
Mesa : 05:37:07 PM 08/30/2019
Xorg amdgpu DDX : 07:55:17 PM 08/27/2019

The microcode files were not available in my distribution when I installed
them. I downloaded and installed them on August 6, but I think they were from
around July 15. I remember that the latest microcode at that time was crashing
with a black screen on module load, which is why I installed an older version.
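
For comparing microcode dates, here is a rough sketch of a script that lists
the firmware files with their modification times. It assumes the files live in
/lib/firmware/amdgpu and that the Navi files are named navi10*; both are
assumptions about the usual layout and may differ on your distribution. The
function name firmware_dates is just illustrative. Note that the modification
time may only reflect when the files were installed or copied, not when AMD
released them.

```python
from datetime import datetime, timezone
from pathlib import Path


def firmware_dates(directory="/lib/firmware/amdgpu", pattern="navi10*"):
    """Return (file name, modification time in UTC) pairs, oldest first.

    The default directory and pattern are assumptions about where the
    Navi microcode is installed; pass other values if your layout differs.
    """
    base = Path(directory)
    if not base.is_dir():
        return []
    entries = []
    for path in base.glob(pattern):
        if path.is_file():
            mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
            entries.append((path.name, mtime.strftime("%Y-%m-%d %H:%M:%S")))
    # Sort by timestamp, then by name, so the oldest files come first.
    return sorted(entries, key=lambda entry: (entry[1], entry[0]))


if __name__ == "__main__":
    for name, stamp in firmware_dates():
        print(f"{stamp}  {name}")
```

Posting that output next to the Mesa, libdrm and DDX build dates should make it
easier to see whether we are running the same stack.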
