[Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
bugzilla-daemon at freedesktop.org
Sun Aug 11 18:43:48 UTC 2019
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #72 from Tom B <tom at r.je> ---
> The nasty displayport mst thingy? I would always set this to false.
I don't believe MST is being used here; it's two monitors, each connected with
its own cable.
Here's some additional investigation.
"[SetUclkToHightestDpmLevel] Set hard min uclk failed!" appears as one of the
first errors in dmesg. It comes from vega20_hwmgr.c:3354 and is triggered by:
PP_ASSERT_WITH_CODE(!(ret = smum_send_msg_to_smc_with_parameter(hwmgr,
		PPSMC_MSG_SetHardMinByFreq,
		(PPCLK_UCLK << 16) | dpm_table->dpm_state.hard_min_level)),
		"[SetUclkToHightestDpmLevel] Set hard min uclk failed!",
		return ret);
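
For context, PP_ASSERT_WITH_CODE (from pp_debug.h) just logs the message and
runs the fallback statement when the condition is false, so this dmesg line
only tells us that the SetHardMinByFreq message to the SMC came back with a
non-zero status. The sketch below is paraphrased from memory, so treat the
exact logging call as an assumption:

/* Paraphrased shape of the macro in pp_debug.h (the exact logging call is
 * an assumption): when the condition fails, print the message and run the
 * supplied statement (here, "return ret").
 */
#define PP_ASSERT_WITH_CODE(cond, msg, code)		\
	do {						\
		if (!(cond)) {				\
			pr_warn("%s\n", msg);		\
			code;				\
		}					\
	} while (0)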
hard_min_level is adjusted if disable_mclk_switching is set on line 3497.
disable_mclk_switching = ((1 < hwmgr->display_config->num_display) &&
			  !hwmgr->display_config->multi_monitor_in_sync) ||
			  vblank_too_short;

/* Hardmin is dependent on displayconfig */
if (disable_mclk_switching) {
	dpm_table->dpm_state.hard_min_level =
		dpm_table->dpm_levels[dpm_table->count - 1].value;
	for (i = 0; i < data->mclk_latency_table.count - 1; i++) {
		if (data->mclk_latency_table.entries[i].latency <= latency) {
			if (dpm_table->dpm_levels[i].value >=
			    (hwmgr->display_config->min_mem_set_clock / 100)) {
				dpm_table->dpm_state.hard_min_level =
					dpm_table->dpm_levels[i].value;
				break;
			}
		}
	}
}
Interestingly, this also checks for the presence of multiple displays, so we at
least have a connection between the code, the error message and the cause of
the bug (multiple displays). As a very crude test, I tried forcing it on and
compiling with
disable_mclk_switching = true;
No difference, so I also tried:
disable_mclk_switching = false;
Again, it didn't help. I'll note that this code is identical in 5.0.13, so my
test was really only checking for an incorrect value being set elsewhere in
hwmgr->display_config->multi_monitor_in_sync or
hwmgr->display_config->num_display. In 5.0.13 I do get mclk boosting: it idles
at 351 MHz and boosts to 1001 MHz, so I don't think forcing the memory to its
maximum clock all the time is the correct solution.
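
For anyone who wants to reproduce that observation, here is a minimal
userspace poller (a sketch, assuming the Radeon VII is card0 and that
pp_dpm_mclk marks the active memory DPM level with a trailing '*'; adjust the
sysfs path for your setup):

/* Poll the active memory clock level once per second so the idle/boost
 * behaviour can be compared between 5.0.13 and 5.2.7.
 * Assumption: the card is exposed as card0.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/sys/class/drm/card0/device/pp_dpm_mclk";
	char line[128];

	for (;;) {
		FILE *f = fopen(path, "r");

		if (!f) {
			perror("fopen");
			return 1;
		}
		while (fgets(line, sizeof(line), f))
			if (strchr(line, '*'))	/* active DPM level */
				printf("%s", line);
		fclose(f);
		sleep(1);
	}
}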
I also diffed vega20_hwmgr.c between 5.0.13 and 5.2.7 (I'll attach it). Here
are a few things I noticed.

In vega20_init_smc_table, this line was added by
https://github.com/torvalds/linux/commit/f5e79735cab448981e245a41ee6cbebf0e334f61:
+ data->vbios_boot_state.fclock = boot_up_values.ulFClk;
I don't know what fclock is, but this was never set in 5.0.13.
In vega20_setup_default_dpm_tables:

@@ -710,8 +729,10 @@ static int vega20_setup_default_dpm_tables(struct pp_hwmgr *hwmgr)
 		PP_ASSERT_WITH_CODE(!ret,
 				"[SetupDefaultDpmTable] failed to get fclk dpm levels!",
 				return ret);
-	} else
-		dpm_table->count = 0;
+	} else {
+		dpm_table->count = 1;
+		dpm_table->dpm_levels[0].value = data->vbios_boot_state.fclock / 100;
+	}
In 5.0.13, dpm_table->count is set to 0; in 5.2.7 it's set to 1 and a dpm_level
is added based on fclock. fclock appears throughout the diff as a new addition.
I don't think this is the cause, but the addition of fclock may be worth
exploring.
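
If anyone wants to dig into that, a quick (hypothetical, untested) debug print
dropped into vega20_init_smc_table right after the new line would at least
confirm whether the VBIOS reports a sane fclk on this board:

/* Hypothetical debug aid, not part of any existing patch: dump the boot
 * fclk as soon as it is read from the VBIOS so dmesg shows whether
 * ulFClk is zero or garbage on the Radeon VII.
 */
pr_info("vega20: vbios boot fclock = %u (ulFClk = %u)\n",
	data->vbios_boot_state.fclock, boot_up_values.ulFClk);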