[Bug 206475] amdgpu under load drop signal to monitor until hard reset

bugzilla-daemon at bugzilla.kernel.org
Sat May 23 16:44:13 UTC 2020


https://bugzilla.kernel.org/show_bug.cgi?id=206475

--- Comment #12 from Andrew Ammerlaan (andrewammerlaan at riseup.net) ---
Created attachment 289247
  --> https://bugzilla.kernel.org/attachment.cgi?id=289247&action=edit
messages (reset successful this time)

And again, twice on the same day :(

But this time:
amdgpu 0000:01:00.0: GPU reset begin!
amdgpu 0000:01:00.0: GPU BACO reset
amdgpu 0000:01:00.0: GPU reset succeeded, trying to resume

This time the reset succeeded; however, after restarting X, I got stuck on the
KDE login splash screen. The log (attached) shows some segfaults.

It seems to me that there are two issues here.

1) The GPU is (often) not successfully recovered after a reset, and when it is
recovered successfully, segfaults follow in radeonsi_dri.so.

2) The GPU goes into a reset in the first place, for no apparent reason.

I guess this bug report is mostly about the second issue: why does it go into a
reset? How do I debug this?

It would be great if we could get this fixed, as it is getting kinda annoying.
(This is a brand new GPU and it is not overheating, so what is wrong?)
