Kernel 6.7+ broke under-powering of my RX 6700XT. (Archlinux, mesa/amdgpu)
Alex Deucher
alexdeucher at gmail.com
Tue Feb 20 15:46:00 UTC 2024
On Tue, Feb 20, 2024 at 10:42 AM Linux regression tracking (Thorsten
Leemhuis) <regressions at leemhuis.info> wrote:
>
>
>
> On 20.02.24 16:27, Hans de Goede wrote:
> > Hi,
> >
> > On 2/20/24 16:15, Alex Deucher wrote:
> >> On Tue, Feb 20, 2024 at 10:03 AM Linux regression tracking (Thorsten
> >> Leemhuis) <regressions at leemhuis.info> wrote:
> >>>
> >>> On 20.02.24 15:45, Alex Deucher wrote:
> >>>> On Mon, Feb 19, 2024 at 9:47 AM Linux regression tracking (Thorsten
> >>>> Leemhuis) <regressions at leemhuis.info> wrote:
> >>>>>
> >>>>> On 17.02.24 14:30, Greg KH wrote:
> >>>>>> On Sat, Feb 17, 2024 at 02:01:54PM +0100, Roman Benes wrote:
> >>>>>>> The minimum power limit on the latest (6.7+) kernels is 190W for my GPU (RX 6700XT,
> >>>>>>> mesa, archlinux), and I cannot set the power cap as low as before (115W),
> >>>>>>> neither with Corectrl, LACT nor TuxClocker, and the /sys variable is read-only
> >>>>>>> even for root. This is not an issue with the above apps but with the kernel; I read
> >>>>>>> about similar issues in other bug reports for those apps. I downgraded to the v6.6.10
> >>>>>>> kernel and my 115W (under-power) cap works again as before.
> >>>>>>
> >>>>> For the record and everyone that lands here: the cause is known now
> >>>>> (it's 1958946858a62b ("drm/amd/pm: Support for getting power1_cap_min
> >>>>> value") [v6.7-rc1]) and the issue afaics tracked here:
> >>>>>
> >>>>> https://gitlab.freedesktop.org/drm/amd/-/issues/3183
> >>>>>
> >>>>> Other mentions:
> >>>>> https://gitlab.freedesktop.org/drm/amd/-/issues/3137
> >>>>> https://gitlab.freedesktop.org/drm/amd/-/issues/2992
> >>>>>
> >>>>> Haven't seen any statement from the amdgpu developers (now CCed) yet on
> >>>>> this there (but might have missed something!). From what I can see I
> >>>>> assume this will likely be somewhat tricky to handle, as a revert
> >>>>> overall might be a bad idea here. We'll see I guess.
> >>>>
> >>>> The change aligns the driver with what has been validated on each board
> >>>> design. Windows uses the same limits. Using values lower than the
> >>>> validated range can lead to undefined behavior and could potentially
> >>>> damage your hardware.
> >>>
> >>> Thx for the reply! Yeah, I was expecting something along those lines.
> >>>
> >>> Nevertheless it afaics still is a regression in the eyes of many users.
> >>> I'm not sure how Linus feels about this, but I wonder if we can find
> >>> some solution here so that users that really want to, can continue to do
> >>> what was possible out-of-the box before. Is that possible to realize or
> >>> even supported already?
> >>>
> >>> And sure, those users would be running their hardware outside of its
> >>> specifications. But is that different from overclocking (which the
> >>> driver allows, doesn't it? If not by all means please correct me!)?
> >>
> >> Sure. The driver has always had upper bound limits for overclocking,
> >> this change adds lower bounds checking for underclocking as well.
> >> When the silicon validation teams set the bounding box for a device,
> >> they set a range of values where it's reasonable to operate based on
> >> the characteristics of the design.
> >>
> >> If we did want to allow extended underclocking, we need a big warning
> >> in the logs at the very least.
> >
> > Requiring a module-option to be set to allow this, as well as a big
> > warning in the logs sounds like a good solution to me.
>
> Yeah, especially as it sounds from some of the reports as if some
> vendors did a really bad job when it came to setting proper
> lower-bound limits, which are now adhered to -- and thus higher than what we used
> out-of-the-box before 1958946858a62b was applied.
>
> Side note: I assume that "lower bounds checking" is done round about
> the same way by the Windows driver? Does that one allow users to go
> lower somehow? Say after modifying the registry or something like that?
> Or through external tools?
Windows uses the same limit. I'm not aware of any way to override the
limit on Windows offhand.
Alex
>
> Ciao, Thorsten
>
> >>>>> Roman posted something that apparently was meant to go to the list, so
> >>>>> let me put it here:
> >>>>>
> >>>>> """
> >>>>> UPDATE: User fililip already posted a patch, but it needs to be merged;
> >>>>> the discussion is at the gitlab link below.
> >>>>>
> >>>>> (PS: I hope I am replying correctly to "all" now? - using original addr.)
> >>>>>
> >>>>>
> >>>>>> it seems that the commit was already found (see user fililip's comment):
> >>>>>>
> >>>>>> https://gitlab.freedesktop.org/drm/amd/-/issues/3183
> >>>>>> commit 1958946858a62b6b5392ed075aa219d199bcae39
> >>>>>> Author: Ma Jun <Jun.Ma2 at amd.com>
> >>>>>> Date: Thu Oct 12 09:33:45 2023 +0800
> >>>>>>
> >>>>>> drm/amd/pm: Support for getting power1_cap_min value
> >>>>>>
> >>>>>> Support for getting power1_cap_min value on smu13 and smu11.
> >>>>>> For other Asics, we still use 0 as the default value.
> >>>>>>
> >>>>>> Signed-off-by: Ma Jun <Jun.Ma2 at amd.com>
> >>>>>> Reviewed-by: Kenneth Feng <kenneth.feng at amd.com>
> >>>>>> Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
> >>>>>>
> >>>>>> However, this is not good, as it cuts the under-powering range too far. I
> >>>>>> was getting only about 7% less performance but 90W(!) less consumption
> >>>>>> when set to my 115W before. Also, I wonder if we, as an OS of options and
> >>>>>> freedom, have to stick to such very high reference min values without
> >>>>>> the ability to override them through some sys ctrls. The commit was made by an
> >>>>>> AMD guy, and I wonder if it was maybe because of this post that I made a few
> >>>>>> months ago (business strategy?):
> >>>>>>
> >>>>>>
> >>>>> https://www.reddit.com/r/Amd/comments/183gye7/rx_6700xt_from_230w_to_capped_115w_at_only_10/
> >>>>>>
> >>>>>> This is not a dangerous upward OC, where I can understand the desire to
> >>>>>> protect HW; it is downward. Having the min cap at 190W when the card runs at
> >>>>>> almost the same speed at 115W is IMO crazy. We aren't talking about default
> >>>>>> or reference values here either, just a move to narrow the range of
> >>>>>> options for whatever reason.
> >>>>>>
> >>>>>> I don't know how much power you guys have over them, but please
> >>>>>> consider either reverting this change or giving us an option to set
> >>>>>> min_cap through, say, /sys (right now the param is read-only, even for root).
> >>>>>>
> >>>>>>
> >>>>>> Thank you in advance for looking into this, with regards: Romano
> >>>>> """
> >>>>>
> >>>>> And while at it, let me add this issue to the tracking as well
> >>>>>
> >>>>> [TLDR: I'm adding this report to the list of tracked Linux kernel
> >>>>> regressions; the text you find below is based on a few template
> >>>>> paragraphs you might have encountered already in similar form.
> >>>>> See link in footer if these mails annoy you.]
> >>>>>
> >>>>> Thanks for the report. To be sure the issue doesn't fall through the
> >>>>> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
> >>>>> tracking bot:
> >>>>>
> >>>>> #regzbot introduced 1958946858a62b /
> >>>>> #regzbot title drm: amdgpu: under-powering broke
> >>>>>
> >>>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> >>>>> --
> >>>>> Everything you wanna know about Linux kernel regression tracking:
> >>>>> https://linux-regtracking.leemhuis.info/about/#tldr
> >>>>> That page also explains what to do if mails like this annoy you.
> >>>>
> >>>>
> >>
> >
> >
> >
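
For readers who want to check which bounds their own board reports, here is a minimal sketch (not part of the original thread) that reads the hwmon power-cap attributes and tests whether a requested cap falls inside the advertised range. It assumes the standard hwmon attribute names power1_cap, power1_cap_min and power1_cap_max, all in microwatts. The directory passed in is an assumption: on many systems it is something like /sys/class/drm/card0/device/hwmon/hwmon*/, but the card and hwmon indices vary. The helper names read_microwatts and describe_power_cap are hypothetical.

```python
from pathlib import Path

MICROWATTS_PER_WATT = 1_000_000


def read_microwatts(hwmon_dir, name):
    """Read one hwmon attribute (an integer number of microwatts)."""
    return int((Path(hwmon_dir) / name).read_text().strip())


def describe_power_cap(hwmon_dir, requested_watts):
    """Report the current cap, the advertised range, and whether the
    requested cap (in watts) lies inside that range.

    This only reads the attributes; it does not try to change the cap.
    """
    cap_min = read_microwatts(hwmon_dir, "power1_cap_min")
    cap_max = read_microwatts(hwmon_dir, "power1_cap_max")
    current = read_microwatts(hwmon_dir, "power1_cap")
    requested = int(requested_watts * MICROWATTS_PER_WATT)
    return {
        "current_w": current / MICROWATTS_PER_WATT,
        "min_w": cap_min / MICROWATTS_PER_WATT,
        "max_w": cap_max / MICROWATTS_PER_WATT,
        "requested_w": requested_watts,
        "within_range": cap_min <= requested <= cap_max,
    }
```

Calling describe_power_cap(path, 115) on a board whose power1_cap_min reads 190 W would report within_range as False, which matches the behaviour described in the report above: going by this thread, a cap below the advertised minimum is no longer accepted on 6.7+ kernels.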