amdgpu multi monitor - clock, heat and power problem
Alex Deucher
alexdeucher at gmail.com
Mon Apr 8 22:58:02 UTC 2019
On Mon, Apr 8, 2019 at 6:50 PM Rigo Reddig <rigo.reddig at gmail.com> wrote:
>
> I have 2 Gigabyte RX580s in my desktop workstation.
> I'm running Arch Linux with KDE Plasma on the 5.0.6 kernel.
>
> The cards themselves work fine, except that I have two 1080p HDMI
> monitors plugged into one of these cards: one in a native HDMI port,
> one through a passive DVI->HDMI adapter.
>
> This causes the following problems at idle:
>
> 1. Memory clock is effectively locked at 2000 MHz
> 2. Core clock is constantly in a high-frequency P-state
> 3. Temperatures are increased
> 4. Power consumption is increased (significantly)
> 5. PCI bus is always at full speed
> 6. Forcing the core clock to 300 MHz uses a higher-than-usual voltage
>
>
> Below is an excerpt from the rocm-smi utility with the automatic defaults
> (I have omitted overclock and power cap values for formatting purposes):
>
> 2 Monitors connected to GPU 0, No monitors connected to GPU 1
>
> ROCm System Management Interface
> ===============================================================================
> GPU  Temp   AvgPwr   SCLK     MCLK     PCLK          Fan    Perf  GPU%
> 0    44.0c  36.193W  1145Mhz  2000Mhz  8.0GT/s, x16  40.0%  auto  0%
> 1    37.0c  28.104W  300Mhz   300Mhz   2.5GT/s, x8   0.0%   auto  0%
> ===============================================================================
> End of ROCm SMI Log
>
> GPU 0 is idle, yet it runs SCLK and MCLK at unnecessarily high levels.
> GPU 1 is truly idle.
>
> Regarding the GPU 0 temperature, I have actually set up a daemon to run the
> fan at a constant rate to keep it from constantly peaking.
>
> -------------------------------------------------------------------------------
>
> 1 Monitor connected to GPU 0, No monitors connected to GPU 1
>
> ROCm System Management Interface
> ===============================================================================
> GPU  Temp   AvgPwr   SCLK     MCLK     PCLK          Fan    Perf  GPU%
> 0    36.0c  28.103W  300Mhz   300Mhz   2.5GT/s, x8   0.0%   auto  0%
> 1    37.0c  28.104W  300Mhz   300Mhz   2.5GT/s, x8   0.0%   auto  0%
> ===============================================================================
>
> 2 Monitors connected to GPU 0, No monitors connected to GPU 1
> (performance level forced to low)
>
> ROCm System Management Interface
> ===============================================================================
> GPU  Temp   AvgPwr   SCLK     MCLK     PCLK          Fan    Perf  GPU%
> 0    44.0c  31.086W  300Mhz   2000Mhz  2.5GT/s, x8   40.0%  low   0%
> 1    37.0c  28.104W  300Mhz   300Mhz   2.5GT/s, x8   0.0%   low   0%
> ===============================================================================
>
> Peculiarly, even with the low power state forced, the GPU runs at a voltage
> (950 mV) in excess of what is required for 300 MHz (750 mV).
>
> ===============================================================================
> cat /sys/kernel/debug/dri/0/amdgpu_pm_info jupiter: Mon Apr 8 21:57:29 2019
>
> Clock Gating Flags Mask: 0x3fbcf
>     Graphics Medium Grain Clock Gating: On
>     Graphics Medium Grain memory Light Sleep: On
>     Graphics Coarse Grain Clock Gating: On
>     Graphics Coarse Grain memory Light Sleep: On
>     Graphics Coarse Grain Tree Shader Clock Gating: Off
>     Graphics Coarse Grain Tree Shader Light Sleep: Off
>     Graphics Command Processor Light Sleep: On
>     Graphics Run List Controller Light Sleep: On
>     Graphics 3D Coarse Grain Clock Gating: Off
>     Graphics 3D Coarse Grain memory Light Sleep: Off
>     Memory Controller Light Sleep: On
>     Memory Controller Medium Grain Clock Gating: On
>     System Direct Memory Access Light Sleep: Off
>     System Direct Memory Access Medium Grain Clock Gating: On
>     Bus Interface Medium Grain Clock Gating: Off
>     Bus Interface Light Sleep: On
>     Unified Video Decoder Medium Grain Clock Gating: On
>     Video Compression Engine Medium Grain Clock Gating: On
>     Host Data Path Light Sleep: On
>     Host Data Path Medium Grain Clock Gating: On
>     Digital Right Management Medium Grain Clock Gating: Off
>     Digital Right Management Light Sleep: Off
>     Rom Medium Grain Clock Gating: On
>     Data Fabric Medium Grain Clock Gating: Off
>
> GFX Clocks and Power:
>     2000 MHz (MCLK)
>     300 MHz (SCLK)
>     600 MHz (PSTATE_SCLK)
>     1000 MHz (PSTATE_MCLK)
>     950 mV (VDDGFX)
>     31.14 W (average GPU)
>
> GPU Temperature: 43 C
> GPU Load: 0 %
>
> UVD: Disabled
>
> VCE: Disabled
> ===============================================================================
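Comparing dumps like this by eye is error-prone. A minimal Python sketch, assuming the line layout shown above (`<value> <unit> (<label>)` and `GPU Load: <n> %`), can pull out the clock, voltage, and load values and flag the "idle but MCLK high" condition described in this message. The function names and the 300 MHz threshold are illustrative choices, not part of any amdgpu tool.

```python
import re

# "<value> <unit> (<label>)", e.g. "2000 MHz (MCLK)" or "31.14 W (average GPU)".
VALUE_RE = re.compile(r"^\s*([\d.]+)\s*(MHz|mV|W)\s*\(([^)]+)\)\s*$")
LOAD_RE = re.compile(r"^\s*GPU Load:\s*(\d+)\s*%")


def parse_pm_info(text):
    """Return ({label: (value, unit)}, load_percent) from amdgpu_pm_info text."""
    values, load = {}, None
    for line in text.splitlines():
        m = VALUE_RE.match(line)
        if m:
            values[m.group(3)] = (float(m.group(1)), m.group(2))
            continue
        m = LOAD_RE.match(line)
        if m:
            load = int(m.group(1))
    return values, load


def idle_but_mclk_high(text, mclk_floor_mhz=300):
    """True if the GPU reports 0 % load while MCLK is above mclk_floor_mhz.

    The 300 MHz default is the lowest MCLK seen in the rocm-smi output above;
    it is an illustrative threshold, not a driver constant.
    """
    values, load = parse_pm_info(text)
    mclk = values.get("MCLK")
    return load == 0 and mclk is not None and mclk[0] > mclk_floor_mhz
```

To use it, pass the contents of `/sys/kernel/debug/dri/0/amdgpu_pm_info` (reading debugfs normally requires root, and the card index may differ) to `idle_but_mclk_high`.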
>
> Using amdgpu.ppfeaturemask=0xffffffff, I am able to work around all of the
> above issues, but it requires me to manually set idle and performance clock
> speeds as needed. 300 MHz memory and core clocks drive two 1080p HDMI
> displays just fine.
>
> However, this leads to screen tearing and visible green artefacts when
> *changing* core clock speeds, as if there were a synchronization issue.
> When running at a fixed speed, all is well.
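One way to pin clocks to a fixed level is amdgpu's sysfs interface: select `manual` in `power_dpm_force_performance_level`, then write the index of the allowed level to `pp_dpm_sclk` and `pp_dpm_mclk`. The Python sketch below assumes the device lives at `/sys/class/drm/card0/device` (the card index varies per system) and that each `pp_dpm_*` line looks like `0: 300Mhz *`, with `*` marking the current level. The helper names are hypothetical, writing requires root, and the sketch only pins clocks to the lowest level; it does nothing about artefacts when levels change.

```python
import re
from pathlib import Path

# Hypothetical default; the card index differs between systems.
DEVICE_DIR = Path("/sys/class/drm/card0/device")

# Matches lines such as "0: 300Mhz *" or "2: 2000Mhz".
LEVEL_RE = re.compile(r"^(\d+):\s*(\d+)\s*mhz(\s*\*)?$", re.IGNORECASE)


def parse_dpm_levels(text):
    """Parse pp_dpm_sclk / pp_dpm_mclk contents.

    Returns (levels, current): levels maps level index -> MHz, and current is
    the index marked with '*' (None if no line is marked).
    """
    levels, current = {}, None
    for line in text.splitlines():
        m = LEVEL_RE.match(line.strip())
        if not m:
            continue
        idx, mhz = int(m.group(1)), int(m.group(2))
        levels[idx] = mhz
        if m.group(3):
            current = idx
    return levels, current


def force_lowest_levels(device_dir=DEVICE_DIR):
    """Pin sclk and mclk to their lowest-frequency DPM level (needs root)."""
    dev = Path(device_dir)
    # Manual mode must be selected before pp_dpm_* writes take effect.
    (dev / "power_dpm_force_performance_level").write_text("manual\n")
    for name in ("pp_dpm_sclk", "pp_dpm_mclk"):
        levels, _ = parse_dpm_levels((dev / name).read_text())
        lowest = min(levels, key=levels.get)
        # Writing a single index restricts DPM to that one level.
        (dev / name).write_text(f"{lowest}\n")
```

Writing `auto` back to `power_dpm_force_performance_level` hands clock selection back to the driver.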
>
> The temperatures alone show that power is being wasted.
>
> I have a UPS that can measure system power consumption reasonably
> accurately (in 16 W steps). At idle, with default settings that let the
> kernel and GPUs manage things themselves, I sometimes read ~196 W!
>
> 2 monitors (auto)       -> 196 W idle
> 2 monitors (low)        -> 160 W idle
> 2 monitors (force 300)  -> 112-128 W idle
> 1 monitor               -> 96-128 W idle
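To put these UPS numbers in perspective, the short Python sketch below computes each configuration's idle saving relative to the default two-monitor case, treating each range as (low, high) bounds so the result is the smallest and largest possible saving. The readings are copied from the list above; because the UPS reports in 16 W steps, a difference smaller than one step would not be meaningful.

```python
# Idle readings from the UPS, in watts, as (low, high) bounds.
# Single readings are stored as (x, x). The UPS reports in 16 W steps.
READINGS_W = {
    "2 monitors (auto)": (196, 196),
    "2 monitors (low)": (160, 160),
    "2 monitors (force 300)": (112, 128),
    "1 monitor": (96, 128),
}
UPS_STEP_W = 16


def savings_vs(baseline, readings=READINGS_W):
    """Return {config: (min_saving_W, max_saving_W)} relative to baseline."""
    base_lo, base_hi = readings[baseline]
    result = {}
    for name, (lo, hi) in readings.items():
        if name == baseline:
            continue
        # Worst case: baseline at its lowest, config at its highest reading.
        result[name] = (base_lo - hi, base_hi - lo)
    return result


if __name__ == "__main__":
    for name, (smin, smax) in savings_vs("2 monitors (auto)").items():
        resolved = smin >= UPS_STEP_W  # is the saving larger than one UPS step?
        print(f"{name}: {smin}-{smax} W lower, resolved by UPS: {resolved}")
```

For the readings above this gives 36 W for the forced low level, 68-84 W for the forced 300 MHz case, and 68-100 W for a single monitor, all well above one 16 W step.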
>
> Even if my UPS isn't giving exact values, that delta is concerning.
>
> This is a longstanding issue that has been bugging me for a while now.
> I'm not sure whether it has come up before or why it has gone on for so
> long, but it should really be fixed, as it carries a large thermal and
> power burden.
>
> I have tried poking through the source code to figure this out, but with
> no luck. Have I missed something? Is there a problem synchronizing display
> VSYNC on clock changes? Why is this happening? It's clearly not the right
> behaviour.
>
> What can be done to fix this? Can I help?
When multiple monitors are active, mclk dpm is disabled and the mclk
is set to the highest state. This is because mclk switching has to happen
during vblank to avoid artifacts and flickering on the display. With
multiple monitors, the vblank periods don't necessarily overlap, so mclk
cannot be switched without flickering or artifacts.
Sclk dpm should still work, however, and should go to the lowest sclk
state when the GPU is idle, even with multiple monitors.
Alex
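The vblank constraint described above can be illustrated with a simplified model: each display's vblank is a periodic window, and an mclk switch is only safe if every active display is in vblank at the same time for at least as long as the switch takes. The Python sketch below intersects those windows over a time horizon. It is not the driver's actual logic; the `Display` fields, the roughly 60 Hz timing (a 667 microsecond vblank is about 45 of the 1125 lines of a standard 1080p60 mode), and the 300 microsecond switch time are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Display:
    """Simplified display timing, all values in microseconds (illustrative)."""
    period_us: int  # frame time, e.g. about 16_667 for 60 Hz
    vblank_us: int  # length of each vertical blanking interval
    offset_us: int  # start of a vblank, assumed 0 <= offset_us < period_us


def vblank_intervals(d, horizon_us):
    """Return sorted half-open [start, end) vblank windows up to horizon_us."""
    out = []
    k = -1  # start one frame early so a window spanning t = 0 is included
    while True:
        start = d.offset_us + k * d.period_us
        if start >= horizon_us:
            return out
        out.append((start, start + d.vblank_us))
        k += 1


def intersect(a, b):
    """Intersect two sorted lists of non-overlapping half-open intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo < hi:
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def longest_common_vblank(displays, horizon_us):
    """Longest time (us) during which every display is in vblank at once."""
    common = vblank_intervals(displays[0], horizon_us)
    for d in displays[1:]:
        common = intersect(common, vblank_intervals(d, horizon_us))
    return max((hi - lo for lo, hi in common), default=0)


def can_switch_mclk(displays, switch_us, horizon_us=1_000_000):
    """True if a shared vblank window at least switch_us long exists."""
    return longest_common_vblank(displays, horizon_us) >= switch_us


if __name__ == "__main__":
    a = Display(period_us=16_667, vblank_us=667, offset_us=0)
    print(can_switch_mclk([a], switch_us=300))  # one display: True
    for b_offset in (200, 8_000):
        b = Display(period_us=16_667, vblank_us=667, offset_us=b_offset)
        print(b_offset, can_switch_mclk([a, b], switch_us=300))
```

With identical refresh rates the phase offset between two displays stays constant, so in this model a pair whose vblanks do not line up (the 8000 microsecond offset) never gets a shared window, while a nearly aligned pair (200 microseconds) does. With slightly different rates the offset drifts and a usable overlap appears only occasionally. A single display always has its full vblank available.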
More information about the amd-gfx mailing list