[PATCH v2] drm/amdgpu: resolve reboot exception for si oland
Quan, Evan
Evan.Quan at amd.com
Wed Mar 15 02:32:35 UTC 2023
I'm OK with the drop of si_set_temperature_range() in late_init.
Meanwhile, it's still not clear to me how this could lead to the reboot exception.
Can you dig into this a little further?
For example, can you check whether the operation (si_thermal_start_thermal_controller()) had actually already failed in hw_init (si_dpm_enable(), more specifically)?
@@ -6918,7 +6918,11 @@ static int si_dpm_enable(struct amdgpu_device *adev)
si_start_dpm(adev);
si_enable_auto_throttle_source(adev, SI_DPM_AUTO_THROTTLE_SRC_THERMAL, true);
- si_thermal_start_thermal_controller(adev);
+ ret = si_thermal_start_thermal_controller(adev);
+ if (ret) {
+ DRM_ERROR("si_thermal_start_thermal_controller failed\n");
+ return ret;
+ }
ni_update_current_ps(adev, boot_ps);
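To make the hypothesis above concrete, here is a minimal user-space C model of it. It is a sketch, not driver code: `fake_dev`, `fake_start_thermal_controller()`, `fake_set_temperature_range()` and the other helpers are hypothetical stand-ins. The model also assumes, purely for illustration, that setting the temperature range fails with -EINVAL when the thermal controller never started. Whether that holds for si_set_temperature_range() on the failing board is exactly what the check in the diff above would reveal.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical device state; NOT the real struct amdgpu_device. */
struct fake_dev {
	bool thermal_ready;      /* set only if the thermal controller started */
	bool fail_thermal_start; /* fault injection for this model */
};

/* Hypothetical stand-in for si_thermal_start_thermal_controller(). */
static int fake_start_thermal_controller(struct fake_dev *dev)
{
	if (dev->fail_thermal_start)
		return -EINVAL;
	dev->thermal_ready = true;
	return 0;
}

/* Hypothetical stand-in for si_set_temperature_range(); ASSUMED to
 * require a started thermal controller. */
static int fake_set_temperature_range(struct fake_dev *dev)
{
	return dev->thermal_ready ? 0 : -EINVAL;
}

/* hw_init as in the current code: the return value is ignored. */
static int hw_init_unchecked(struct fake_dev *dev)
{
	fake_start_thermal_controller(dev);
	return 0;
}

/* hw_init with the suggested check: fail early with a clear message. */
static int hw_init_checked(struct fake_dev *dev)
{
	int ret = fake_start_thermal_controller(dev);

	if (ret) {
		fprintf(stderr, "start_thermal_controller failed %d\n", ret);
		return ret;
	}
	return 0;
}

static int late_init(struct fake_dev *dev)
{
	return fake_set_temperature_range(dev);
}

/* Runs hw_init then late_init; returns the first error and records
 * which stage reported it ("hw_init", "late_init" or "none"). */
int run_init(struct fake_dev *dev, bool checked, const char **stage)
{
	int ret = checked ? hw_init_checked(dev) : hw_init_unchecked(dev);

	if (ret) {
		*stage = "hw_init";
		return ret;
	}
	ret = late_init(dev);
	*stage = ret ? "late_init" : "none";
	return ret;
}
```

With `fail_thermal_start` set, the unchecked path reports the error only in late_init, while the checked path reports it in hw_init. On Linux, EINVAL is 22, which matches the "-22" in the log quoted below. If the real driver behaves like this model, dropping si_set_temperature_range() from late_init would hide the symptom rather than fix the cause.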
BR
Evan
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
> Zhenneng Li
> Sent: Monday, March 13, 2023 10:57 AM
> To: Chen, Guchun <Guchun.Chen at amd.com>
> Cc: David Airlie <airlied at linux.ie>; Pan, Xinhui <Xinhui.Pan at amd.com>;
> Zhenneng Li <lizhenneng at kylinos.cn>; amd-gfx at lists.freedesktop.org;
> Daniel Vetter <daniel at ffwll.ch>; Deucher, Alexander
> <Alexander.Deucher at amd.com>; Koenig, Christian
> <Christian.Koenig at amd.com>
> Subject: [PATCH v2] drm/amdgpu: resolve reboot exception for si oland
>
> During reboot testing on an arm64 platform, amdgpu may fail to initialize
> on boot.
>
> The error messages are as follows:
> [ 6.996395][ 7] [ T295] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block <si_dpm> failed -22
> [ 7.006919][ 7] [ T295] amdgpu 0000:04:00.0: amdgpu_device_ip_late_init failed
> [ 7.014224][ 7] [ T295] amdgpu 0000:04:00.0: Fatal error during GPU init
> ---
> drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 12 ------------
> 1 file changed, 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> index d6d9e3b1b2c0..ca9bce895dbe 100644
> --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> @@ -7626,18 +7626,6 @@ static int si_dpm_process_interrupt(struct amdgpu_device *adev,
>
> static int si_dpm_late_init(void *handle)
> {
> - int ret;
> - struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> -
> - if (!adev->pm.dpm_enabled)
> - return 0;
> -
> - ret = si_set_temperature_range(adev);
> - if (ret)
> - return ret;
> -#if 0 //TODO ?
> - si_dpm_powergate_uvd(adev, true);
> -#endif
> return 0;
> }
>
> --
> 2.25.1