<html> <head> <base href="https://bugs.freedesktop.org/"> </head> <body> <div> <a class="bz_bug_link bz_status_NEW " title="NEW - Crashes / Resets From AMDGPU / Radeon VII" href="https://bugs.freedesktop.org/show_bug.cgi?id=110674#c94">Comment # 94</a> on <a class="bz_bug_link bz_status_NEW " title="NEW - Crashes / Resets From AMDGPU / Radeon VII" href="https://bugs.freedesktop.org/show_bug.cgi?id=110674">bug 110674</a> from <a class="email" href="mailto:tom@r.je" title="Tom B <tom@r.je>"> Tom B</a> <pre>Reverting d1a3e239a6016f2bb42a91696056e223982e8538 didn't fix it for me. But that commit may give some insight, because it is related to uclk, which is the first error we get.

I also tried globally increasing usec_timeout, as it's used in a few places (patch below). This makes the PC take about a minute to boot, so clearly the GPU is already in an invalid state before these timeouts are hit, and each subsequent call to smum_send_msg_to_smc_with_parameter then adds a delay because each call times out.

Whatever happens puts the card into a state that it can't recover from.

The next step is to find where vega20_set_uclk_to_highest_dpm_level is called from and see what happens just before the call to this function.

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f4ac632a87b2..9b878c74b17e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2418,7 +2418,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	adev->pdev = pdev;
 	adev->flags = flags;
 	adev->asic_type = flags & AMD_ASIC_MASK;
-	adev->usec_timeout = AMDGPU_MAX_USEC_TIMEOUT;
+	adev->usec_timeout = AMDGPU_MAX_USEC_TIMEOUT*10;
 	if (amdgpu_emu_mode == 1)
 		adev->usec_timeout *= 2;
 	adev->gmc.gart_size = 512 * 1024 * 1024;
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
index a7e8340baf90..a6b2bc4277ef 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
@@ -84,7 +84,7 @@ int hwmgr_early_init(struct pp_hwmgr *hwmgr)
 	if (!hwmgr)
 		return -EINVAL;
 
-	hwmgr->usec_timeout = AMD_MAX_USEC_TIMEOUT;
+	hwmgr->usec_timeout = AMD_MAX_USEC_TIMEOUT*10;
 	hwmgr->pp_table_version = PP_TABLE_V1;
 	hwmgr->dpm_level = AMD_DPM_FORCED_LEVEL_AUTO;
 	hwmgr->request_dpm_level = AMD_DPM_FORCED_LEVEL_AUTO;</pre> </div> <hr> You are receiving this mail because: <ul> <li>You are the assignee for the bug.</li> </ul> </body> </html>