[Bug 202043] amdgpu: Vega 56 SCLK drops to 700 Mhz when undervolting

bugzilla-daemon at bugzilla.kernel.org
Mon Feb 8 18:22:09 UTC 2021


https://bugzilla.kernel.org/show_bug.cgi?id=202043

Bruno Jacquet (maxijac at free.fr) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maxijac at free.fr

--- Comment #10 from Bruno Jacquet (maxijac at free.fr) ---
I am seeing the same behavior with a Vega 64 on 5.4.96 (LTS) and on newer
kernels.

From my testing, it seems that amdgpu is simply unable to apply the OD table
to the GPU properly. As soon as a clock or voltage change is sent to the GPU,
power management is disturbed, and only a reboot fixes it.

(In reply to mistarzy from comment #6)
> Same issue here with Vega 64. From watching
> /sys/kernel/debug/dri/0/amdgpu_pm_info my conclusion is that basically
> driver gets max voltage applied even though tables in
> /sys/class/drm/card0/device/pp_od_clk_voltage suggest otherwise.

From my testing, I'd say the voltages simply break once any change to the OD
table is sent to the GPU.
After booting, if you monitor amdgpu_pm_info you will see "uneven" VDD values
(825, 831, 918 mV, etc.), which makes me think some kind of curve is applied
between the voltage values of the initial sclk table.
Once a single change to the OD table is sent, the VDD values only move in
coarse steps, roughly 50 mV at a time up to the maximum value (1000, 1050,
1100, 1200 mV...).
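
The observation above can be checked with a small helper. This is only a
sketch: it assumes amdgpu_pm_info contains a line of the form
"<number> mV (VDDGFX)", and the function names read_vddgfx and is_coarse_step
are made up for illustration. Treating "multiple of 50 mV" as the sign of the
broken state is a heuristic taken from the values quoted above, not something
the driver reports.

```shell
# Print the current VDDGFX reading (in mV) from an amdgpu_pm_info-style file.
# Assumption: the file has a line like "    831 mV (VDDGFX)".
# The default path needs debugfs mounted and root access.
read_vddgfx() {
    pm_info="${1:-/sys/kernel/debug/dri/0/amdgpu_pm_info}"
    awk '/\(VDDGFX\)/ { print $1; exit }' "$pm_info"
}

# Succeed if the given voltage (mV) is a multiple of 50, i.e. it looks like
# one of the coarse steps seen after the OD table has been touched.
is_coarse_step() {
    [ $(( $1 % 50 )) -eq 0 ]
}

# Possible use (as root), sampling once per second:
#   while sleep 1; do read_vddgfx; done
```

Sampling this before and after the first write to pp_od_clk_voltage should
make the difference between the "uneven" values and the coarse steps visible.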

It seems that using the voltage curve feature of pp_od_clk_voltage ("vc")
_could_ fix the issue, but it is not supported on cards older than Vega 20.
I am not sure whether this is a software limitation in the driver or a
hardware limitation of the GPU.
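
To see whether a given card exposes the voltage curve at all, one can look
for the curve section in pp_od_clk_voltage. A minimal sketch, assuming the
driver lists an "OD_VDDC_CURVE" section only when "vc" is supported; the
function name supports_voltage_curve is hypothetical.

```shell
# Succeed if the overdrive table advertises a voltage curve section.
# Assumption: cards that accept "vc" print an "OD_VDDC_CURVE:" header.
supports_voltage_curve() {
    od_table="${1:-/sys/class/drm/card0/device/pp_od_clk_voltage}"
    grep -q '^OD_VDDC_CURVE' "$od_table"
}

# Possible use:
#   supports_voltage_curve && echo "vc available" || echo "no vc on this card"
```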

Unexpectedly, the same effect can be seen when sending a full PP table.


(In reply to haro41 from comment #9)
> https://bugzilla.kernel.org/show_bug.cgi?id=205277
> 
> Should be fixed 5.4 release.

No, it's not fixed.

(In reply to mistarzy from comment #7)
> Best results performance wise had with following setup:
> 
> echo 275000000 >> /sys/class/drm/card0/device/hwmon/hwmon0/power1_cap
> echo "m 3 1100 1000" > /sys/class/drm/card0/device/pp_od_clk_voltage

Caution: if you only change the MCLK without committing (sending "c"), it
seems the change is not actually sent to the GPU, even though all the other
tables and info files report the updated value.
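
A sketch of staging entries and then committing them in one go, so the "c"
is not forgotten. The helper name od_apply is made up for illustration; the
entry string is the one quoted from comment #7, and the path is passed in as
an argument rather than assumed.

```shell
# Write each overdrive entry to the table, then send "c" to commit.
# Usage: od_apply <path-to-pp_od_clk_voltage> "<entry>" ["<entry>" ...]
od_apply() {
    od_table="$1"
    shift
    for entry in "$@"; do
        # Each entry is staged with its own write, like a separate echo.
        printf '%s\n' "$entry" > "$od_table" || return 1
    done
    # Without this commit the staged values are only reported, not applied.
    printf 'c\n' > "$od_table"
}

# Possible use (as root):
#   od_apply /sys/class/drm/card0/device/pp_od_clk_voltage "m 3 1100 1000"
```

Given the problem described in this report, committing may still leave the
voltages in the coarse-step state; the helper only makes sure the MCLK change
actually reaches the GPU.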

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

