[Bug 110674] Crashes / Resets From AMDGPU / Radeon VII
bugzilla-daemon at freedesktop.org
Wed Aug 14 17:30:55 UTC 2019
https://bugs.freedesktop.org/show_bug.cgi?id=110674
--- Comment #100 from Tom B <tom at r.je> ---
I've been trying to work backwards to find the place where screens get
initialised and eventually call vega20_pre_display_configuration_changed_task.

vega20_pre_display_configuration_changed_task is exported as
pp_hwmgr_func::display_config_changed, which is called from
hardwaremanager.c:phm_pre_display_configuration_changed.
phm_pre_display_configuration_changed is in turn called from
hwmgr.c:hwmgr_handle_task:
    switch (task_id) {
    case AMD_PP_TASK_DISPLAY_CONFIG_CHANGE:
        ret = phm_pre_display_configuration_changed(hwmgr);
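To make the indirection easier to follow, here's a tiny self-contained
illustration of the pattern (plain userspace C with invented names, not the
actual kernel structures): the driver-specific hook is stored in a
function-pointer table and generic code dispatches to it by task id, which is
how hwmgr_handle_task ends up in the vega20 function.

    #include <stdio.h>

    enum pp_task { TASK_DISPLAY_CONFIG_CHANGE, TASK_OTHER };

    struct hwmgr;                       /* forward declaration */

    struct hwmgr_funcs {
        int (*display_config_changed)(struct hwmgr *hw);
    };

    struct hwmgr {
        const struct hwmgr_funcs *funcs;
    };

    /* plays the role of vega20_pre_display_configuration_changed_task */
    static int vega20_display_config_changed(struct hwmgr *hw)
    {
        (void)hw;
        printf("vega20 hook: pre display config changed\n");
        return 0;
    }

    /* plays the role of the vega20 pp_hwmgr_func table */
    static const struct hwmgr_funcs vega20_funcs = {
        .display_config_changed = vega20_display_config_changed,
    };

    /* plays the role of hwmgr_handle_task(): generic dispatch by task id */
    static int handle_task(struct hwmgr *hw, enum pp_task task_id)
    {
        switch (task_id) {
        case TASK_DISPLAY_CONFIG_CHANGE:
            return hw->funcs->display_config_changed(hw);
        default:
            return -1;
        }
    }

    int main(void)
    {
        struct hwmgr hw = { .funcs = &vega20_funcs };
        return handle_task(&hw, TASK_DISPLAY_CONFIG_CHANGE);
    }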
pp_dpm_dispatch_tasks is exported as amd_pm_funcs::dispatch_tasks, which is
called from amdgpu_dpm_dispatch_task, which in turn is called in amdgpu_pm.c:
    void amdgpu_pm_compute_clocks(struct amdgpu_device *adev)
    {
        int i = 0;

        if (!adev->pm.dpm_enabled)
            return;

        if (adev->mode_info.num_crtc)
            amdgpu_display_bandwidth_update(adev);

        for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
            struct amdgpu_ring *ring = adev->rings[i];

            if (ring && ring->sched.ready)
                amdgpu_fence_wait_empty(ring);
        }

        if (is_support_sw_smu(adev)) {
            struct smu_context *smu = &adev->smu;
            struct smu_dpm_context *smu_dpm = &adev->smu.smu_dpm;

            mutex_lock(&(smu->mutex));
            smu_handle_task(&adev->smu,
                    smu_dpm->dpm_level,
                    AMD_PP_TASK_DISPLAY_CONFIG_CHANGE);
            mutex_unlock(&(smu->mutex));
        } else {
            if (adev->powerplay.pp_funcs->dispatch_tasks) {
                if (!amdgpu_device_has_dc_support(adev)) {
                    mutex_lock(&adev->pm.mutex);
                    amdgpu_dpm_get_active_displays(adev);
                    adev->pm.pm_display_cfg.num_display =
                        adev->pm.dpm.new_active_crtc_count;
                    adev->pm.pm_display_cfg.vrefresh =
                        amdgpu_dpm_get_vrefresh(adev);
                    adev->pm.pm_display_cfg.min_vblank_time =
                        amdgpu_dpm_get_vblank_time(adev);
                    /* we have issues with mclk switching with
                     * refresh rates over 120 hz on the non-DC code. */
                    if (adev->pm.pm_display_cfg.vrefresh > 120)
                        adev->pm.pm_display_cfg.min_vblank_time = 0;
                    if (adev->powerplay.pp_funcs->display_configuration_change)
                        adev->powerplay.pp_funcs->display_configuration_change(
                            adev->powerplay.pp_handle,
                            &adev->pm.pm_display_cfg);
                    mutex_unlock(&adev->pm.mutex);
                }
                amdgpu_dpm_dispatch_task(adev,
                    AMD_PP_TASK_DISPLAY_CONFIG_CHANGE, NULL);
            } else {
                mutex_lock(&adev->pm.mutex);
                amdgpu_dpm_get_active_displays(adev);
                amdgpu_dpm_change_power_state_locked(adev);
                mutex_unlock(&adev->pm.mutex);
            }
        }
    }
This is the only place I can see AMD_PP_TASK_DISPLAY_CONFIG_CHANGE being
dispatched from, and it is ultimately how
vega20_pre_display_configuration_changed_task gets called.
Presumably the code:

    for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
        struct amdgpu_ring *ring = adev->rings[i];

        if (ring && ring->sched.ready)
            amdgpu_fence_wait_empty(ring);
    }
is what generates
[ 3.683718] amdgpu 0000:44:00.0: ring gfx uses VM inv eng 0 on hub 0
[ 3.683719] amdgpu 0000:44:00.0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 3.683720] amdgpu 0000:44:00.0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 3.683720] amdgpu 0000:44:00.0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 3.683721] amdgpu 0000:44:00.0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 3.683722] amdgpu 0000:44:00.0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 3.683722] amdgpu 0000:44:00.0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 3.683723] amdgpu 0000:44:00.0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 3.683724] amdgpu 0000:44:00.0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 3.683724] amdgpu 0000:44:00.0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[ 3.683725] amdgpu 0000:44:00.0: ring sdma0 uses VM inv eng 0 on hub 1
[ 3.683726] amdgpu 0000:44:00.0: ring page0 uses VM inv eng 1 on hub 1
[ 3.683726] amdgpu 0000:44:00.0: ring sdma1 uses VM inv eng 4 on hub 1
[ 3.683727] amdgpu 0000:44:00.0: ring page1 uses VM inv eng 5 on hub 1
[ 3.683728] amdgpu 0000:44:00.0: ring uvd_0 uses VM inv eng 6 on hub 1
[ 3.683728] amdgpu 0000:44:00.0: ring uvd_enc_0.0 uses VM inv eng 7 on hub 1
[ 3.683729] amdgpu 0000:44:00.0: ring uvd_enc_0.1 uses VM inv eng 8 on hub 1
[ 3.683730] amdgpu 0000:44:00.0: ring uvd_1 uses VM inv eng 9 on hub 1
[ 3.683730] amdgpu 0000:44:00.0: ring uvd_enc_1.0 uses VM inv eng 10 on hub 1
[ 3.683731] amdgpu 0000:44:00.0: ring uvd_enc_1.1 uses VM inv eng 11 on hub 1
[ 3.683731] amdgpu 0000:44:00.0: ring vce0 uses VM inv eng 12 on hub 1
[ 3.683732] amdgpu 0000:44:00.0: ring vce1 uses VM inv eng 13 on hub 1
[ 3.683733] amdgpu 0000:44:00.0: ring vce2 uses VM inv eng 14 on hub 1
in dmesg. I'll add a pr_err() to verify this. If so, it means our issue is
introduced somewhere between that for loop and the amdgpu_dpm_dispatch_task
call in this function.
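Something like this is the debug print I have in mind (sketch only, not a
tested patch; ring->name is the ring's name string from struct amdgpu_ring):

    for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
        struct amdgpu_ring *ring = adev->rings[i];

        if (ring && ring->sched.ready) {
            /* debug only: confirm this loop is what logs the rings */
            pr_err("amdgpu: fence wait on ring %s\n", ring->name);
            amdgpu_fence_wait_empty(ring);
        }
    }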
amdgpu_pm_compute_clocks is called from
amdgpu_dm_pp_smu.c:dm_pp_apply_display_requirements, which is called in
dce_clk_mgr.c in two places: dce_pplib_apply_display_requirements and
dce11_pplib_apply_display_requirements. I don't know which is used for the
VII; I'll add some logging to verify.
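The logging would just be a one-liner at the top of each candidate, along
these lines (hypothetical, not in the tree; assuming pp_display_cfg is in
scope as in the snippet quoted below):

    /* debug only: which pplib_apply_display_requirements variant runs? */
    pr_err("amdgpu: dce11_pplib_apply_display_requirements, display_count=%u\n",
           pp_display_cfg->display_count);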
But here's something that may be relevant to this bug. In
dce11_pplib_apply_display_requirements there's a check for the number of
displays:
    /* TODO: is this still applicable?*/
    if (pp_display_cfg->display_count == 1) {
        const struct dc_crtc_timing *timing =
                &context->streams[0]->timing;

        pp_display_cfg->crtc_index =
            pp_display_cfg->disp_configs[0].pipe_idx;

        pp_display_cfg->line_time_in_us = timing->h_total * 10000 /
            timing->pix_clk_100hz;
    }
So there's something that is different when more than one display is connected.

That's as far as I got walking backwards through the code. I'll note that this
check was also present in 5.0.1, but it could be that something is now relying
on crtc_index or line_time_in_us that wasn't previously, as these values only
appear to be set if there is a single display.
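If that hypothesis holds, a defensive change might look something like the
sketch below (purely illustrative, not a tested patch): zero the two fields
unconditionally so that multi-display configurations never hand stale values
to whatever reads them downstream.

    /* illustrative only: give the fields a defined value even when more
     * than one display is attached, instead of leaving them stale */
    pp_display_cfg->crtc_index = 0;
    pp_display_cfg->line_time_in_us = 0;

    if (pp_display_cfg->display_count == 1) {
        const struct dc_crtc_timing *timing =
                &context->streams[0]->timing;

        pp_display_cfg->crtc_index =
            pp_display_cfg->disp_configs[0].pipe_idx;
        pp_display_cfg->line_time_in_us = timing->h_total * 10000 /
            timing->pix_clk_100hz;
    }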