[Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
bugzilla-daemon at freedesktop.org
Mon Aug 12 14:34:52 UTC 2019
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #81 from Tom B <tom at r.je> ---
Created attachment 145038
--> https://bugs.freedesktop.org/attachment.cgi?id=145038&action=edit
5.2.7 dmesg with hard_min_level logged
As mentioned in the previous post, I started logging the value of
hard_min_level. I hadn't realised that vega20_set_uclk_to_highest_dpm_level
would be called so many times.
Here's what I found: the value of hard_min_level is 1001 in both 5.0.13 and
5.2.7, so the value read from the dpm table is not the problem and the table
itself is probably correct. Something else prevents
smum_send_msg_to_smc_with_parameter from succeeding with that value.
However, what is interesting is that it doesn't always fail.
[ 4.082105] amdgpu: [powerplay] hard_min_level: 1001
[ 4.372684] [drm] Initialized amdgpu 3.32.0 20150101 for 0000:44:00.0 on minor 0
[ 4.517204] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[ 4.517205] amdgpu: [powerplay] [SetUclkToHightestDpmLevel] Set hard min uclk failed!
Each hard_min_level line in the log is from
vega20_set_uclk_to_highest_dpm_level and there are multiple calls to it, which
don't fail, before the card is initialised.
This is from 5.2.7:
[ 3.698907] amdgpu 0000:44:00.0: ring vce2 uses VM inv eng 14 on hub 1
[ 4.082105] amdgpu: [powerplay] hard_min_level: 1001
[ 4.372684] [drm] Initialized amdgpu 3.32.0 20150101 for 0000:44:00.0 on minor 0
[ 4.517204] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[ 4.517205] amdgpu: [powerplay] [SetUclkToHightestDpmLevel] Set hard min uclk failed!
[ 5.361482] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
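The delay between the hard_min_level debug line and the first failure can be read off the dmesg timestamps. As a minimal sketch (both helper names below are hypothetical and not part of the driver), the gap between two log lines could be computed like this:

```c
#include <stdio.h>

/* Hypothetical helper: read the "[ seconds.micros]" prefix of a dmesg line.
 * Returns 1 and stores the timestamp in *ts on success, 0 otherwise. */
static int parse_dmesg_ts(const char *line, double *ts)
{
    return sscanf(line, " [ %lf]", ts) == 1;
}

/* Hypothetical helper: seconds elapsed between two dmesg lines.
 * Returns -1.0 if either line has no parsable timestamp; the result is
 * also negative if the lines are passed in the wrong order. */
static double dmesg_gap_seconds(const char *earlier, const char *later)
{
    double t0, t1;

    if (!parse_dmesg_ts(earlier, &t0) || !parse_dmesg_ts(later, &t1))
        return -1.0;
    return t1 - t0;
}
```

Applied to the 4.082105 and 4.517204 lines from the 5.2.7 log, this gives a gap of roughly 0.435 seconds.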
And the same from 5.0.13:
[ 3.352380] amdgpu 0000:44:00.0: ring vce2 uses VM inv eng 14 on hub 1
[ 3.722422] amdgpu: [powerplay] hard_min_level: 1001
[ 3.766269] amdgpu: [powerplay] hard_min_level: 1001
[ 4.029679] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:44:00.0 on minor 0
There are a couple of things here:
1. In 5.0.13, vega20_set_uclk_to_highest_dpm_level is called twice between the
"ring vce2" line and "Initialized"; the 5.2.7 excerpt shows only one call there.
2. My patched code looks like this:
    pr_err("hard_min_level: %d\n",
           dpm_table->dpm_state.hard_min_level);
    PP_ASSERT_WITH_CODE(!(ret = smum_send_msg_to_smc_with_parameter(hwmgr,
                    PPSMC_MSG_SetHardMinByFreq,
                    (PPCLK_UCLK << 16 ) | dpm_table->dpm_state.hard_min_level)),
                    "[SetUclkToHightestDpmLevel] Set hard min uclk failed!",
                    return ret);
Yet the log shows:
- My debug line
- Initialized amdgpu 3.32.0 20150101 for 0000:44:00.0 on minor 0
- [SetUclkToHightestDpmLevel] Set hard min uclk failed!
So initialization is happening between (and possibly a result of) sending the
message and getting the response.
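
For reference, the parameter sent with PPSMC_MSG_SetHardMinByFreq in the patched code packs the clock ID into the upper 16 bits and the frequency into the lower 16 bits. The following is a standalone sketch of that packing only. EXAMPLE_PPCLK_UCLK is a placeholder value assumed for illustration (the real PPCLK_UCLK comes from the SMU driver interface header), and the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder for the driver's PPCLK_UCLK enum value. This number is an
 * assumption for illustration, not taken from the kernel headers. */
#define EXAMPLE_PPCLK_UCLK 5u

/* Pack a clock ID and a frequency into one 32-bit message argument:
 * clock ID in the upper 16 bits, frequency in the lower 16 bits.
 * Both inputs must fit in 16 bits, otherwise the fields would overlap. */
static uint32_t pack_hard_min_param(uint32_t clk_id, uint32_t freq)
{
    assert(clk_id <= 0xFFFFu);
    assert(freq <= 0xFFFFu);
    return (clk_id << 16) | freq;
}

/* Recover the two fields from a packed argument. */
static uint32_t unpack_clk_id(uint32_t param)
{
    return param >> 16;
}

static uint32_t unpack_freq(uint32_t param)
{
    return param & 0xFFFFu;
}
```

With the logged hard_min_level of 1001 (0x3E9), the low 16 bits of the argument are 0x03E9 regardless of which clock ID occupies the upper half.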