<html>
<head>
<base href="https://bugs.freedesktop.org/">
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - Crashes / Resets From AMDGPU / Radeon VII"
href="https://bugs.freedesktop.org/show_bug.cgi?id=110674#c83">Comment # 83</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - Crashes / Resets From AMDGPU / Radeon VII"
href="https://bugs.freedesktop.org/show_bug.cgi?id=110674">bug 110674</a>
from <span class="vcard"><a class="email" href="mailto:reddestdream@gmail.com" title="ReddestDream <reddestdream@gmail.com>"> <span class="fn">ReddestDream</span></a>
</span></b>
<pre><span class="quote">> Here's what I found: The value of hard_min_level is 1001 in both 5.0.13 and 5.2.7 so the issue is not the value from the dpm table. The dpm table is probably correct. </span >
Fantastic! Glad you tested this. I had suspected that hard_min_level was bogus
and that the card was rejecting the bogus value, which is why it was failing.
Glad to know that's not the case.
<span class="quote">> However, what is interesting is that it doesn't always fail.</span >
Yeah. I've had boots where I have my 2 4K DP monitors plugged in and I don't
get a powerplay error on boot. In fact, it can run for a while and seem stable.
But then the powerplay errors suddenly start showing up again (unrelated to any
high load on the card) and the graphics become unstable. Similarly, others have
reported that on hotplugging a second monitor after boot, the powerplay errors
will start showing up.
So, maybe there is a timing problem involved with sending the message. It's
generally a question of when rather than if it's going to fail.
<span class="quote">> 1. vega20_set_fclk_to_highest_dpm_level is called twice between the "ring vce2" line and "Initialized"</span >
Is it always called twice? Even on 5.2.7? Because it looks like it might get
called two times right before "Initialized" on 5.0.13 but then only once on
5.2.7 before "Initialized" kicks in. Maybe "Initialized" is interrupting on
5.2.7 but not on 5.0.13. It's possible that initialization of the card is
messing up values that powerplay needs to read off the card or making the card
unavailable for receiving messages or something . . .
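
One way to check the call count across several boots would be a quick script
over saved dmesg output. This is only a sketch: it assumes the debug prints
contain the function name on their own lines, and the marker strings below are
my guesses at what to match, so adjust them to the actual log.

```python
# Hypothetical marker strings; adjust to match the actual dmesg output.
START_MARKER = "ring vce2"
END_MARKER = "Initialized amdgpu"
FUNC_NAME = "vega20_set_fclk_to_highest_dpm_level"


def count_calls_between(lines, func=FUNC_NAME, start=START_MARKER, end=END_MARKER):
    """Count lines mentioning func between the most recent start marker
    and the first end marker that follows it.

    Returns None if no start/end window is found in the input.
    """
    count = 0
    in_window = False
    for line in lines:
        if start in line:
            # Restart the window at the most recent start marker.
            in_window = True
            count = 0
        elif end in line and in_window:
            # The window is closed; report how many calls were seen.
            return count
        elif in_window and func in line:
            count += 1
    return None
```

Running it as count_calls_between(open("dmesg-5.2.7.txt")) and again on a
5.0.13 log (both filenames are just placeholders) would show whether the
count really differs between the two kernels.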
<span class="quote">> So initialization is happening between (and possibly a result of) sending the message and getting the response</span >
Yeah. Something is definitely happening while
vega20_set_uclk_to_highest_dpm_level is running . . . Not 100% sure that's
really problematic tho . . . But it could be an atomicity issue. Need to
figure out exactly what is generating the line "[drm] Initialized amdgpu
3.27.0 20150101 for 0000:44:00.0 on minor 0." Looks like it's coming from the
drm core rather than amdgpu specifically.
<span class="quote">> I'm going to see if I can disable/revert BACO entirely to at least rule it out.</span >
I thought BACO was reverted for Vega 20 here:
<a href="https://github.com/torvalds/linux/commit/7db329e57b90ddebcb58fc88eedbb3082d22a957#diff-8a4d25be8ad5d9c3ff27bb54b678dab2">https://github.com/torvalds/linux/commit/7db329e57b90ddebcb58fc88eedbb3082d22a957#diff-8a4d25be8ad5d9c3ff27bb54b678dab2</a>
Your commit seems to have been introduced in 5.2-rc1, not 5.1.</pre>
</div>
</p>
</body>
</html>