Power limit OD stopped working for navi10 - broken on previously working commit

Matt Coffin mcoffin13 at gmail.com
Mon Feb 10 02:14:11 UTC 2020


Sorry for the followup, but I did finally manage to track this down to a
firmware/driver incompatibility and bisected `linux-firmware` to find
when it broke.

Since the firmware is just binaries, I can't really tell ya what is
wrong, but this is the commit where writing to the sysfs interface (and
in general sending the SetPptPowerLimit message to the SMC) stopped
having any effect on the card's actual power draw.

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/amdgpu?id=af76fd0ed266440ac406d5737218af7ac3cfc750

Let me know what I can do to help get this fixed. For now, I've just
downgraded to the first-released microcode as a stop-gap.
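
For anyone who wants to check this on their own card, here is a minimal
sketch (Python, not part of the original debugging) that compares the
reported cap against the measured draw. It assumes the amdgpu hwmon
directory exposes power1_cap and power1_average in microwatts; the
directory path is passed in because the hwmon index varies per system, and
the function names are made up for illustration.

```python
from pathlib import Path


def read_microwatts(hwmon_dir, name):
    """Read one hwmon power attribute (assumed to be in microwatts)."""
    return int((Path(hwmon_dir) / name).read_text().strip())


def cap_report(hwmon_dir):
    """Compare the configured power cap with the measured average draw."""
    cap = read_microwatts(hwmon_dir, "power1_cap")
    draw = read_microwatts(hwmon_dir, "power1_average")
    return {
        "cap_w": cap / 1e6,
        "draw_w": draw / 1e6,
        "draw_pct_of_cap": 100.0 * draw / cap,
    }


if __name__ == "__main__":
    import sys

    # Usage: python cap_report.py <path to the card's hwmon directory>
    if len(sys.argv) > 1:
        print(cap_report(sys.argv[1]))
```

Run under a sustained GPU load, a draw that stays near the stock limit
after the cap has been raised would match the symptom described here.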

On 2/9/20 2:13 PM, Matt Coffin wrote:
> I was doing some benchmarking and noticed poor performance, which
> suggested that my overdrive settings were not in effect, even though
> they had been applied. hwmon/power1_cap reports the correctly adjusted
> value after it is written, and I confirmed with a quick patch that the
> updated power limit is actually returned from the SMU after it is set,
> yet the card refuses to go over stock settings (within +/- 3% of stock
> power draw, even with a 50% increase in power limit).
> 
> Since I worked on that code a while back, I went to go bisect, using
> c39f062e881dcc6ab4c1c1c5835dc774be1bcfd6 as a starting location, since I
> know that commit had working power limit overdrive before.
> 
> Strangely, I'm seeing the same behavior on that
> previously-known-to-be-working commit!
> 
> This happens for both *increased* and *decreased* power limits. sysfs
> reflects the change, but I see no change in the actual power draw on the
> card, and for the *increased* case, performance reflects a card that is
> throttling due to power limits.
> 
> Were there any firmware changes or anything else that could be causing
> this? I don't know where to start, since a previously-working commit is
> now somehow broken.
> 
> Since the behavior seems to have changed on me, it would also be
> incredibly helpful if anyone could confirm whether or not they can
> reproduce this problem on the latest codebase or on
> c39f062e881dcc6ab4c1c1c5835dc774be1bcfd6.
> 
> Any help, testing information, or simple confirm/deny from your side
> would go a long way.
> 
> Thanks in advance,
> Matt
> 
