https://bugzilla.kernel.org/show_bug.cgi?id=212077
            Bug ID: 212077
           Summary: AMD GPU at highest frequency even not in use
           Product: Drivers
           Version: 2.5
    Kernel Version: 5.11.3
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri@kernel-bugs.osdl.org
          Reporter: bat_malin@abv.bg
        Regression: No
Created attachment 295677
  --> https://bugzilla.kernel.org/attachment.cgi?id=295677&action=edit
Dmesg

[    1.240847] amdgpu: Clock is not in range of specified clock range for watermark from DAL! Using highest water mark set.
[    1.240850] amdgpu: Clock is not in range of specified clock range for watermark from DAL! Using highest water mark set.
[    1.240851] amdgpu: Clock is not in range of specified clock range for watermark from DAL! Using highest water mark set.
[    1.240852] amdgpu: Clock is not in range of specified clock range for watermark from DAL! Using highest water mark set.
[    1.240853] amdgpu: Clock is not in range of specified clock range for watermark from DAL! Using highest water mark set.
[    1.240854] amdgpu: Clock is not in range of specified clock range for watermark from DAL! Using highest water mark set.
[    1.240855] amdgpu: Clock is not in range of specified clock range for watermark from DAL! Using highest water mark set.
[    1.240856] amdgpu: Clock is not in range of specified clock range for watermark from DAL! Using highest water mark set.
[    1.240857] amdgpu: Clock is not in range of specified clock range for watermark from DAL! Using highest water mark set.
[    1.240858] amdgpu: Clock is not in range of specified clock range for watermark from DAL! Using highest water mark set.

Dmesg attached
Bat Malin (bat_malin@abv.bg) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|AMD GPU at highest          |AMD GPU discrete card
                   |frequency even not in use   |memory at highest frequency
                   |                            |even not in use
Bat Malin (bat_malin@abv.bg) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|AMD GPU discrete card       |AMD GPU discrete card
                   |memory at highest frequency |memory at highest frequency
                   |even not in use             |even while not in use
--- Comment #1 from Bat Malin (bat_malin@abv.bg) ---
Created attachment 295679
  --> https://bugzilla.kernel.org/attachment.cgi?id=295679&action=edit
Picture of memory status
Alex Deucher (alexdeucher@gmail.com) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |alexdeucher@gmail.com

--- Comment #2 from Alex Deucher (alexdeucher@gmail.com) ---
Should be fixed with this patch:
https://patchwork.freedesktop.org/patch/422999/
Bat Malin (bat_malin@abv.bg) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |PATCH_ALREADY_AVAILABLE

--- Comment #3 from Bat Malin (bat_malin@abv.bg) ---
Thank you Alex!
Bat Malin (bat_malin@abv.bg) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|PATCH_ALREADY_AVAILABLE     |---

--- Comment #4 from Bat Malin (bat_malin@abv.bg) ---
The issue is not fixed in kernel 5.11.4.
--- Comment #5 from Bat Malin (bat_malin@abv.bg) ---
The issue is still present in 5.11.5:

[    1.335057] amdgpu: Clock is not in range of specified clock range for watermark from DAL! Using highest water mark set.
--- Comment #6 from Bat Malin (bat_malin@abv.bg) ---
No change in the code of 5.12-rc2...

    for (i = 0; i < dep_mclk_table->count; i++) {
        for (j = 0; j < dep_sclk_table->count; j++) {
            valid_entry = false;
            for (k = 0; k < watermarks->num_wm_sets; k++) {
                if (dep_sclk_table->entries[i].clk / 10 >= watermarks->wm_clk_ranges[k].wm_min_eng_clk_in_khz &&
                    dep_sclk_table->entries[i].clk / 10 < watermarks->wm_clk_ranges[k].wm_max_eng_clk_in_khz &&
                    dep_mclk_table->entries[i].clk / 10 >= watermarks->wm_clk_ranges[k].wm_min_mem_clk_in_khz &&
                    dep_mclk_table->entries[i].clk / 10 < watermarks->wm_clk_ranges[k].wm_max_mem_clk_in_khz) {
                    valid_entry = true;
                    table->DisplayWatermark[i][j] = watermarks->wm_clk_ranges[k].wm_set_id;
                    break;
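The lookup quoted above, together with the "Using highest water mark set" message from dmesg, can be modelled in isolation. The following is only a simplified sketch, not the driver's code: `struct wm_range` and `pick_wm_set` are hypothetical names, and the fallback to the last range's set id is an assumption inferred from the log message. It shows that a clock pair matching no range ends up in the fallback path, which is also what happens to every lookup if the two sides of the comparison are not in the same unit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one watermark clock range (kHz). */
struct wm_range {
    uint32_t min_eng_khz, max_eng_khz;
    uint32_t min_mem_khz, max_mem_khz;
    uint8_t  set_id;
};

/*
 * Return the set_id of the first range that contains both clocks
 * (minimum inclusive, maximum exclusive). If no range matches, store
 * false in *in_range and return the last range's set_id; this fallback
 * is an assumption modelled on the "highest water mark set" message.
 * n must be at least 1.
 */
static uint8_t pick_wm_set(const struct wm_range *r, size_t n,
                           uint32_t eng_khz, uint32_t mem_khz,
                           bool *in_range)
{
    for (size_t k = 0; k < n; k++) {
        if (eng_khz >= r[k].min_eng_khz && eng_khz < r[k].max_eng_khz &&
            mem_khz >= r[k].min_mem_khz && mem_khz < r[k].max_mem_khz) {
            *in_range = true;
            return r[k].set_id;
        }
    }
    *in_range = false;
    return r[n - 1].set_id;
}
```

With two made-up ranges, for example engine 0..500000 kHz with memory 0..700000 kHz as set 0, and engine 500000..1300000 kHz with memory 700000..1800000 kHz as set 1, a pair such as 214000/300000 kHz selects set 0, while 214000/1750000 kHz matches neither range and falls back to set 1.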
--- Comment #7 from Bat Malin (bat_malin@abv.bg) ---
The code is not fixed in 5.11.6.
Bat Malin (bat_malin@abv.bg) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |CODE_FIX

--- Comment #8 from Bat Malin (bat_malin@abv.bg) ---
The code is fixed in 5.11.7. Thank you!
Bat Malin (bat_malin@abv.bg) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|CODE_FIX                    |---

--- Comment #9 from Bat Malin (bat_malin@abv.bg) ---
The code is fixed, but the GPU is still running at the highest possible clock.
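One way to check which memory clock level is active is the amdgpu sysfs file `pp_dpm_mclk`, which lists the levels and marks the current one with `*`. The sketch below parses text in that format. It is an illustration only: `active_dpm_level` is a hypothetical helper, and the path (for example `/sys/class/drm/card0/device/pp_dpm_mclk`, where the card index is an assumption) has to be adapted to the machine in question.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse text in the "N: <freq>Mhz [*]" format and return the index of
 * the level marked with '*', or -1 if no level is marked. When mhz is
 * non-NULL, the frequency of the active level is stored in *mhz.
 */
static int active_dpm_level(const char *text, int *mhz)
{
    const char *line = text;

    while (line && *line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        int idx, freq;

        /* A level line starts with "index: frequency"; '*' marks it active. */
        if (sscanf(line, "%d: %d", &idx, &freq) == 2 &&
            memchr(line, '*', len) != NULL) {
            if (mhz)
                *mhz = freq;
            return idx;
        }
        line = end ? end + 1 : NULL;
    }
    return -1;
}
```

For a hypothetical reading of "0: 300Mhz", "1: 625Mhz", "2: 1750Mhz *" on three lines, the helper reports level 2 at 1750 MHz, which would correspond to the behaviour described in this report: memory pinned at the top level even while idle.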
--- Comment #10 from Bat Malin (bat_malin@abv.bg) ---
Created attachment 295905
  --> https://bugzilla.kernel.org/attachment.cgi?id=295905&action=edit
Picture of memory status (new)
--- Comment #11 from Bat Malin (bat_malin@abv.bg) ---
Created attachment 295907
  --> https://bugzilla.kernel.org/attachment.cgi?id=295907&action=edit
Dmesg (new)
--- Comment #12 from Bat Malin (bat_malin@abv.bg) ---
An older kernel, e.g. 5.10.23, initializes the following for the discrete card:

[    1.038643] [drm] DM_PPLIB: values for Engine clock
[    1.038645] [drm] DM_PPLIB: 214000
[    1.038646] [drm] DM_PPLIB: 603000
[    1.038646] [drm] DM_PPLIB: 958000
[    1.038647] [drm] DM_PPLIB: 1060000
[    1.038647] [drm] DM_PPLIB: 1128000
[    1.038647] [drm] DM_PPLIB: 1182000
[    1.038648] [drm] DM_PPLIB: 1230000
[    1.038648] [drm] DM_PPLIB: 1275000
[    1.038649] [drm] DM_PPLIB: Validation clocks:
[    1.038649] [drm] DM_PPLIB: engine_max_clock: 127500
[    1.038649] [drm] DM_PPLIB: memory_max_clock: 175000
[    1.038650] [drm] DM_PPLIB: level : 8
[    1.038651] [drm] DM_PPLIB: values for Memory clock
[    1.038651] [drm] DM_PPLIB: 300000
[    1.038651] [drm] DM_PPLIB: 625000
[    1.038652] [drm] DM_PPLIB: 1750000
[    1.038652] [drm] DM_PPLIB: Validation clocks:
[    1.038652] [drm] DM_PPLIB: engine_max_clock: 127500
[    1.038653] [drm] DM_PPLIB: memory_max_clock: 175000
[    1.038653] [drm] DM_PPLIB: level : 8

and for the integrated card:

[    1.469248] [drm] DM_PPLIB: values for F clock
[    1.469250] [drm] DM_PPLIB: 400000 in kHz, 2874 in mV
[    1.469251] [drm] DM_PPLIB: 933000 in kHz, 3224 in mV
[    1.469252] [drm] DM_PPLIB: 1067000 in kHz, 3924 in mV
[    1.469253] [drm] DM_PPLIB: 1200000 in kHz, 4074 in mV
[    1.469256] [drm] DM_PPLIB: values for DCF clock
[    1.469257] [drm] DM_PPLIB: 300000 in kHz, 2874 in mV
[    1.469258] [drm] DM_PPLIB: 600000 in kHz, 3224 in mV
[    1.469259] [drm] DM_PPLIB: 626000 in kHz, 3924 in mV
[    1.469260] [drm] DM_PPLIB: 654000 in kHz, 4074 in mV
[    1.469553] [drm] Display Core initialized with v3.2.104!

The new kernel, 5.11.7, prints these values only for the integrated card:

[    1.992374] kernel: [drm] DM_PPLIB: values for F clock
[    1.992377] kernel: [drm] DM_PPLIB: 400000 in kHz, 2874 in mV
[    1.992379] kernel: [drm] DM_PPLIB: 933000 in kHz, 3224 in mV
[    1.992381] kernel: [drm] DM_PPLIB: 1067000 in kHz, 3924 in mV
[    1.992382] kernel: [drm] DM_PPLIB: 1200000 in kHz, 4074 in mV
[    1.992385] kernel: [drm] DM_PPLIB: values for DCF clock
[    1.992387] kernel: [drm] DM_PPLIB: 300000 in kHz, 2874 in mV
[    1.992388] kernel: [drm] DM_PPLIB: 600000 in kHz, 3224 in mV
[    1.992390] kernel: [drm] DM_PPLIB: 626000 in kHz, 3924 in mV
[    1.992391] kernel: [drm] DM_PPLIB: 654000 in kHz, 4074 in mV

So I think this is related: the new kernel driver can't initialize the values for the discrete card. Please fix.
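A side note on units in the 5.10.23 log: the per-level values look like kHz, while the validation clocks (127500 and 175000) are ten times smaller than the top engine and memory levels (1275000 and 1750000). That is consistent with the validation clocks being reported in 10 kHz units, although this is an inference from the numbers and not something the log states. A minimal sketch of that consistency check, with hypothetical helper names:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the validation clocks are expressed in units of 10 kHz. */
static uint32_t ten_khz_to_khz(uint32_t clk_10khz)
{
    return clk_10khz * 10u;
}

/* True if the converted validation maximum equals the top level (kHz). */
static bool max_matches_top_level(const uint32_t *levels_khz, size_t n,
                                  uint32_t max_10khz)
{
    return n > 0 && levels_khz[n - 1] == ten_khz_to_khz(max_10khz);
}
```

Applied to the logged values, both the engine table (top level 1275000) and the memory table (top level 1750000) agree with their validation maxima once the factor of 10 is applied.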
--- Comment #13 from Alex Deucher (alexdeucher@gmail.com) ---
Created attachment 296035
  --> https://bugzilla.kernel.org/attachment.cgi?id=296035&action=edit
possible fix
This patch should fix it.
--- Comment #14 from Bat Malin (bat_malin@abv.bg) ---
Thank you, Alex, for your engagement! Could you please include the patch in the upcoming 5.11.11 release so that I can test it? Sorry, but I am not allowed to compile a kernel on this machine.
Bat Malin (bat_malin@abv.bg) changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |CODE_FIX

--- Comment #15 from Bat Malin (bat_malin@abv.bg) ---
The issue is fixed in 5.11.12. The card even consumes less power now (about 1.07 W less).

Before:

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:      756.00 mV
edge:         +35.0°C  (crit = +94.0°C, hyst = -273.1°C)
power1:        8.14 W  (cap =  60.00 W)

After:

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:      756.00 mV
edge:         +38.0°C  (crit = +94.0°C, hyst = -273.1°C)
power1:        7.07 W  (cap =  60.00 W)
Thank you!
--- Comment #16 from Bat Malin (bat_malin@abv.bg) ---
After reboot even better:

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:      756.00 mV
edge:         +35.0°C  (crit = +94.0°C, hyst = -273.1°C)
power1:        6.22 W  (cap =  60.00 W)
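The savings implied by the three `power1` readings in comments #15 and #16 (8.14 W before, 7.07 W after, 6.22 W after reboot) can be worked out as follows. This is only an arithmetic sketch with hypothetical helper names. The readings are single idle snapshots, so the percentages are indicative, not a measurement. Values are kept in centiwatts so the subtraction is exact.

```c
/* Power readings are passed in centiwatts (8.14 W -> 814). */

/* Absolute saving in centiwatts. */
static int saving_cw(int before_cw, int after_cw)
{
    return before_cw - after_cw;
}

/* Relative saving as a percentage of the "before" reading. */
static double saving_percent(int before_cw, int after_cw)
{
    return 100.0 * (double)(before_cw - after_cw) / (double)before_cw;
}
```

From 8.14 W to 7.07 W the saving is 1.07 W, roughly 13%. From 8.14 W to the post-reboot 6.22 W it is 1.92 W, roughly 24%.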
dri-devel@lists.freedesktop.org