[Bug 110674] Crashes / Resets From AMDGPU / Radeon VII

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Sat Aug 10 16:39:55 UTC 2019


https://bugs.freedesktop.org/show_bug.cgi?id=110674

--- Comment #67 from Tom B <tom at r.je> ---
I had a look around at similar bugs and came across this:

https://bugs.freedesktop.org/show_bug.cgi?id=110822

It's for a 580, not a VII, but the problems there also started with kernel 5.1
and it produces a similar powerplay-related crash.

The suggested fix there is to revert ad51c46eec739c18be24178a30b47801b10e0357.

I just tried this and after 4 reboots I can report it has two effects:

1. I don't have any crashing at all and my card boosts GPU clocks, voltages and
wattages. I can run unigine-heaven for several minutes without the system
freezing.

2. The memory clock is stuck at 351 MHz, limiting performance.

If I run 

cat /sys/class/drm/card0/device/pp_dpm_mclk 

it shows:

0: 351Mhz *
1: 801Mhz 
2: 1001Mhz 


That looks correct for idle, but the memory never boosts to the next clock
level, even under load. It also can't be set manually:


echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level
echo 2 >  /sys/class/drm/card0/device/pp_dpm_mclk
-bash: echo: write error: Invalid argument
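
For anyone who wants to check the active memory clock from a script rather
than by eye, here is a rough sketch. It relies only on what the listing above
shows: the current level is the line that ends in "*". The function name and
the optional file argument are my own choices for illustration, not something
the driver provides.

```shell
#!/usr/bin/env bash
# Print the index and clock of the active level from a pp_dpm_* style file.
# The optional path argument is a hypothetical convenience so the function
# can also be pointed at a saved copy of the file.
active_dpm_level() {
    local file="${1:-/sys/class/drm/card0/device/pp_dpm_mclk}"
    # Lines look like "0: 351Mhz *"; the trailing "*" marks the current level.
    awk '$NF == "*" { sub(":", "", $1); print $1, $2 }' "$file"
}
```

Called with no argument it reads card0; for the listing above it would print
"0 351Mhz".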


While this isn't a proper fix, it does give us some valuable insight. Anyone
who wants a stable card with 2 screens can have one, at the cost of the memory
being stuck at 351 MHz. It would be nice if someone could verify my findings,
as my card seems to behave differently from others for some reason.

This bug may be related to https://bugs.freedesktop.org/show_bug.cgi?id=110822.
Alternatively, it's possible the crash occurs when the memory clock changes,
which might mean it's related to
https://bugs.freedesktop.org/show_bug.cgi?id=102646, as there are issues with
memory clock changes there. There seem to be several powerplay-related issues
which may have the same root cause.


I'm now going to:

1. Revert to the stock kernel, set the mclk to 1001 MHz manually before
starting SDDM, and see if the crash occurs.

2. See if I can get the card stable with the mclk stuck at 1001 MHz, as this
would be an acceptable compromise, even if not ideal.
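
Step 1 could be scripted roughly as below. This is only a sketch that wraps
the two sysfs writes shown earlier; the function name and the optional device
directory argument are made up for illustration, and on real hardware it has
to run as root. It assumes level 2 is the 1001 MHz entry, as in the listing
above.

```shell
#!/usr/bin/env bash
# Try to force a memory clock level through sysfs.
# Arguments: level index, then (optionally) the device directory.
force_mclk_level() {
    local level="$1"
    local dev="${2:-/sys/class/drm/card0/device}"
    # Select manual mode first, in the same order as the commands above.
    echo manual > "$dev/power_dpm_force_performance_level" || return 1
    if ! echo "$level" > "$dev/pp_dpm_mclk"; then
        echo "could not select mclk level $level" >&2
        return 1
    fi
    # Show the resulting table; the active level is marked with "*".
    cat "$dev/pp_dpm_mclk"
}
```

Running "force_mclk_level 2" before the display manager starts would attempt
the 1001 MHz level and return non-zero if the write is rejected, as it was
with the reverted kernel.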

More information about the dri-devel mailing list