<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - Crashes / Resets From AMDGPU / Radeon VII"
href="https://bugs.freedesktop.org/show_bug.cgi?id=110674#c75">Comment # 75</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - Crashes / Resets From AMDGPU / Radeon VII"
href="https://bugs.freedesktop.org/show_bug.cgi?id=110674">bug 110674</a>
from <span class="vcard"><a class="email" href="mailto:reddestdream@gmail.com" title="ReddestDream <reddestdream@gmail.com>"> <span class="fn">ReddestDream</span></a>
</span></b>
<pre><span class="quote">>Here's some additional investigation.</span >
<span class="quote">>[SetUclkToHightestDpmLevel] Set hard min uclk failed! Appears as one of the first errors in dmesg. This is from vega20_hwmgr.c:3354 and triggered by:</span >
I agree that [SetUclkToHightestDpmLevel] is probably the key to all this as it
always seems to be the first thing that fails after dysregulation occurs. The
"Failed to send message 0x28, response 0x0" errors show that the driver is
sending wrong or at least wrongly timed commands to the GPU that eventually
cascade into complete failure.
<span class="quote">>Again, it didn't help. I will note that this code is identical in 5.0.13 </span >
I have also been unable to find changed code since 5.0 that could be directly
connected to display detect/init/enumeration issues on Radeon VII/Vega 20. This
is why I've come to suspect the error is triggered indirectly in a way that
will probably not be obvious and by code that was likely flawed from the
beginning of Radeon VII/Vega 20 support.
This is also why I was hopeful that 5.3-rc2 would fix this issue since it has
commits that do seem to affect display detection on AMD GPUs. Alas, it did not.
:(
<span class="quote">>If the GPU did not crash with dpm disabled as a whole, the proper way to
>proceed would be to start from there and step by step add dpm features and see
>when it starts crashing. It's not a small task since dpm code paths may be
>scattered all over the code.</span >
Unfortunately, it does look like going through and slowly disabling features
and/or bisecting might be the only way to find how this issue got started. At
least if we could narrow it down, we might be in better shape. :/
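For anyone who wants to try the step-by-step route, here is a rough sketch in
Python of how the test masks could be generated. It assumes the dpm features
can be toggled through a bitmask (for example via the amdgpu.ppfeaturemask
module parameter). The feature names and bit values below are placeholders
made up for illustration, not the real ones from the kernel headers, and
`crashes` is a hypothetical callback standing in for "boot with this mask and
check dmesg".

```python
# Hypothetical feature bits: placeholder names and values, NOT copied from
# the kernel headers. Substitute the real bits for your kernel version.
FEATURE_BITS = [
    ("sclk_dpm", 0x1),
    ("mclk_dpm", 0x2),
    ("pcie_dpm", 0x4),
    ("socclk_dpm", 0x8),
]


def cumulative_masks(feature_bits, base_mask=0):
    """Return (name, mask) steps, each enabling one more feature than the last."""
    steps = []
    mask = base_mask
    for name, bit in feature_bits:
        mask |= bit
        steps.append((name, mask))
    return steps


def first_bad_feature(steps, crashes):
    """Return the feature whose addition first makes crashes(mask) true, else None."""
    for name, mask in steps:
        if crashes(mask):
            return name
    return None


if __name__ == "__main__":
    for name, mask in cumulative_masks(FEATURE_BITS):
        # Boot once per line with this mask and note whether the errors show up.
        print(f"+{name}: mask=0x{mask:08x}")
```

Each printed line is one boot to test. The first mask that brings the errors
back points at the feature that was just added, though it only narrows things
down if the failure reproduces reliably on every boot.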
I must admit I don't have much experience with graphics drivers and when I tell
other people about this issue, they immediately want to blame X or Mesa until I
explain that I can get these errors w/o starting any graphics at all. lol.
In any case, I really appreciate your testing, Tom B. And any advice you might
have on debugging, Sylvain BERTRAND, is greatly appreciated. :)</pre>
</div>
</p>
</body>
</html>