[Bug 110674] Crashes / Resets From AMDGPU / Radeon VII

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Sun Aug 11 01:15:48 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #69 from ReddestDream <reddestdream at gmail.com> ---
>The inconsistent nature of this bug and the fact that it sometimes doesn't appear suggests a race condition. I'd assume something else on the system happens before or after amdgpu is expecting.

>Is there any way to delay loading the amdgpu driver and manually loading it after everything else?

Based on all the data you (Tom B) and others have provided, as well as my own
tests, my current suspicion is that there is a bug in the display mode/type
detection and enumeration that causes the driver to lose state consistency and
eventually lose contact with the hardware entirely.

I think the clock dysregulation and excessive voltage/wattage are symptoms of
the underlying disease rather than the cause. If something is wrong between
what the driver thinks the hardware state is and what the hardware state
actually is, it's only a matter of time before this inconsistency leads to
dysregulation, instability, and crashing. For this reason, I'm not convinced
there is any better workaround than "just use one monitor." Pushing up the
clocks seems, at best, to delay the inevitable. :(

I'm also not convinced there is one commit in particular to point to here.
Rather, something that was always somewhat flawed probably became
fundamentally broken when code was restructured between 5.0 and 5.1.

Unfortunately, Radeon VII probably isn't really being tested by kernel
developers anymore, and it's likely that multimonitor with this card on Linux
was never fully tested at all. It also seems like AMD's kernel development has
moved on to Navi, and the upcoming new Vega card, Arcturus, won't have
display outputs at all, so work on that card can't fix this issue.

As this card is fairly uncommon and expensive, the only real hope for a fix
seems to be to get the card into the hands of someone who has the skill to fix
graphics drivers and a willingness/need to test multimonitor.

Perhaps someone like gnif, who was able to solve the infamous Vega Reset
Bug on Vega 10 cards, might be able to fix it. He will likely encounter our
issue while testing Radeon VII with Looking Glass and such. Someone has already
offered to lend him a Radeon VII, as he states in the video, so there's some
hope that his work will lead to a solution.

https://www.youtube.com/watch?v=1ShkjXoG0O0
