<html>
    <head>
      <base href="https://bugs.freedesktop.org/">
    </head>
    <body>
      <p>
        <div>
            <b><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Crashes / Resets From AMDGPU / Radeon VII"
   href="https://bugs.freedesktop.org/show_bug.cgi?id=110674#c81">Comment # 81</a>
              on <a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Crashes / Resets From AMDGPU / Radeon VII"
   href="https://bugs.freedesktop.org/show_bug.cgi?id=110674">bug 110674</a>
              from <span class="vcard"><a class="email" href="mailto:tom@r.je" title="Tom B <tom@r.je>"> <span class="fn">Tom B</span></a>
</span></b>
        <pre>Created <span class=""><a href="attachment.cgi?id=145038" name="attach_145038" title="5.2.7 dmesg with hard_min_level logged">attachment 145038</a> <a href="attachment.cgi?id=145038&action=edit" title="5.2.7 dmesg with hard_min_level logged">[details]</a></span>
5.2.7 dmesg with hard_min_level logged

As mentioned in the previous post, I started logging the value of
hard_min_level. I hadn't realised that vega20_set_uclk_to_highest_dpm_level
would be called so many times.

Here's what I found: The value of hard_min_level is 1001 in both 5.0.13 and
5.2.7 so the issue is not the value from the dpm table. The dpm table is
probably correct. Something prevents smum_send_msg_to_smc_with_parameter
accepting the value.

However, what is interesting is that it doesn't always fail.


[    4.082105] amdgpu: [powerplay] hard_min_level: 1001
[    4.372684] [drm] Initialized amdgpu 3.32.0 20150101 for 0000:44:00.0 on minor 0
[    4.517204] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[    4.517205] amdgpu: [powerplay] [SetUclkToHightestDpmLevel] Set hard min uclk failed!

Each hard_min_level line in the log is from
vega20_set_uclk_to_highest_dpm_level and there are multiple calls to it, which
don't fail, before the card is initialised.


This is from 5.2.7:

[    3.698907] amdgpu 0000:44:00.0: ring vce2 uses VM inv eng 14 on hub 1
[    4.082105] amdgpu: [powerplay] hard_min_level: 1001
[    4.372684] [drm] Initialized amdgpu 3.32.0 20150101 for 0000:44:00.0 on minor 0
[    4.517204] amdgpu: [powerplay] Failed to send message 0x28, response 0x0
[    4.517205] amdgpu: [powerplay] [SetUclkToHightestDpmLevel] Set hard min uclk failed!
[    5.361482] amdgpu: [powerplay] Failed to send message 0x28, response 0x0


And the same from 5.0.13:

[    3.352380] amdgpu 0000:44:00.0: ring vce2 uses VM inv eng 14 on hub 1
[    3.722422] amdgpu: [powerplay] hard_min_level: 1001
[    3.766269] amdgpu: [powerplay] hard_min_level: 1001
[    4.029679] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:44:00.0 on minor 0


There are a couple of things here:

1. In 5.0.13, vega20_set_uclk_to_highest_dpm_level is called twice between the
"ring vce2" line and "Initialized". In 5.2.7 it is only called once in that
window.

2. My patched code looks like this:

                pr_err("hard_min_level: %d\n",
                                dpm_table->dpm_state.hard_min_level);

                PP_ASSERT_WITH_CODE(!(ret = smum_send_msg_to_smc_with_parameter(hwmgr,
                                PPSMC_MSG_SetHardMinByFreq,
                                (PPCLK_UCLK << 16) | dpm_table->dpm_state.hard_min_level)),
                                "[SetUclkToHightestDpmLevel] Set hard min uclk failed!",
                                return ret);
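
For anyone wanting to sanity-check the parameter that goes out with
PPSMC_MSG_SetHardMinByFreq, here is a minimal standalone sketch of the packing
used in the call above: clock ID in the upper 16 bits, hard_min_level in the
lower 16 bits. The helper names are hypothetical, and the value 5 for
PPCLK_UCLK is an assumption for illustration only; the real value comes from
the PPCLK enum in the SMU driver interface header. The masking is also my
addition, the driver code above does not mask.

```c
/* Assumption: 5 stands in for PPCLK_UCLK purely for illustration. */
enum { EXAMPLE_PPCLK_UCLK = 5 };

/* Pack a clock ID and a hard min level the same way the patched call does:
 * clock ID in the upper 16 bits, level in the lower 16 bits. */
static unsigned int pack_hard_min_param(unsigned int clk_id,
                                        unsigned int hard_min_level)
{
        return ((clk_id & 0xFFFFu) << 16) | (hard_min_level & 0xFFFFu);
}

/* Recover the clock ID from a packed parameter. */
static unsigned int param_clk_id(unsigned int param)
{
        return (param >> 16) & 0xFFFFu;
}

/* Recover the hard min level from a packed parameter. */
static unsigned int param_hard_min_level(unsigned int param)
{
        return param & 0xFFFFu;
}
```

Under that assumption, pack_hard_min_param(EXAMPLE_PPCLK_UCLK, 1001) gives
0x503E9, and 1001 fits comfortably in the lower 16 bits, so the parameter
itself looks sane in both kernels. Either way, in the patched code the pr_err
runs immediately before the message is sent.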

Yet the log shows:

- My debug line 
- Initialized amdgpu 3.32.0 20150101 for 0000:44:00.0 on minor 0
- [SetUclkToHightestDpmLevel] Set hard min uclk failed!

So initialization is happening between (and possibly a result of) sending the
message and getting the response.</pre>
        </div>
      </p>


    </body>
</html>