[PATCH v2] drm/amdgpu: resolve reboot exception for si oland

Quan, Evan Evan.Quan at amd.com
Wed Mar 15 02:32:35 UTC 2023


[AMD Official Use Only - General]

I'm OK with the drop of si_set_temperature_range() in late_init.
However, it's still not clear to me how this could lead to the reboot exception.
Can you dig into this a little further?
For example, can you check whether si_thermal_start_thermal_controller() had actually already failed in hw_init (more specifically, in si_dpm_enable())?

@@ -6918,7 +6918,11 @@ static int si_dpm_enable(struct amdgpu_device *adev)
        si_start_dpm(adev);

        si_enable_auto_throttle_source(adev, SI_DPM_AUTO_THROTTLE_SRC_THERMAL, true);
-       si_thermal_start_thermal_controller(adev);
+       ret = si_thermal_start_thermal_controller(adev);
+       if (ret) {
+               DRM_ERROR("si_thermal_start_thermal_controller failed\n");
+               return ret;
+       }

        ni_update_current_ps(adev, boot_ps);
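
To make the point of the check concrete, here is a minimal stand-alone sketch (plain user-space C, not driver code). All mock_* names and struct mock_dev are hypothetical stand-ins. It assumes, purely for illustration, that the enable path and late_init hit the same failing thermal step. -22 is -EINVAL on Linux.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for device state; not the real struct amdgpu_device. */
struct mock_dev {
	int thermal_ok;	/* 0 makes the mock thermal setup fail */
};

/* Mock thermal setup step that can fail with -EINVAL (-22). */
static int mock_start_thermal_controller(struct mock_dev *dev)
{
	return dev->thermal_ok ? 0 : -EINVAL;
}

/* Enable path that ignores the return value: the failure stays hidden. */
static int mock_enable_unchecked(struct mock_dev *dev)
{
	mock_start_thermal_controller(dev);
	return 0;
}

/* Enable path that checks and propagates the error, as in the diff above. */
static int mock_enable_checked(struct mock_dev *dev)
{
	int ret = mock_start_thermal_controller(dev);

	if (ret) {
		fprintf(stderr, "mock_start_thermal_controller failed: %d\n", ret);
		return ret;
	}
	return 0;
}

/* Later stage that (by assumption) repeats the same thermal step. */
static int mock_late_init(struct mock_dev *dev)
{
	return mock_start_thermal_controller(dev);
}
```

With thermal_ok set to 0, mock_enable_unchecked() returns 0 and the error only shows up in mock_late_init(), while mock_enable_checked() reports -EINVAL right away. That is the difference the extra check is meant to expose.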

BR
Evan
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of
> Zhenneng Li
> Sent: Monday, March 13, 2023 10:57 AM
> To: Chen, Guchun <Guchun.Chen at amd.com>
> Cc: David Airlie <airlied at linux.ie>; Pan, Xinhui <Xinhui.Pan at amd.com>;
> Zhenneng Li <lizhenneng at kylinos.cn>; amd-gfx at lists.freedesktop.org;
> Daniel Vetter <daniel at ffwll.ch>; Deucher, Alexander
> <Alexander.Deucher at amd.com>; Koenig, Christian
> <Christian.Koenig at amd.com>
> Subject: [PATCH v2] drm/amdgpu: resolve reboot exception for si oland
> 
> During reboot testing on an arm64 platform, the boot may fail.
> 
> The error messages are as follows:
> [    6.996395][ 7] [  T295] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block <si_dpm> failed -22
> [    7.006919][ 7] [  T295] amdgpu 0000:04:00.0: amdgpu_device_ip_late_init failed
> [    7.014224][ 7] [  T295] amdgpu 0000:04:00.0: Fatal error during GPU init
> ---
>  drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c | 12 ------------
>  1 file changed, 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> index d6d9e3b1b2c0..ca9bce895dbe 100644
> --- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> +++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
> @@ -7626,18 +7626,6 @@ static int si_dpm_process_interrupt(struct amdgpu_device *adev,
> 
>  static int si_dpm_late_init(void *handle)
>  {
> -	int ret;
> -	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> -
> -	if (!adev->pm.dpm_enabled)
> -		return 0;
> -
> -	ret = si_set_temperature_range(adev);
> -	if (ret)
> -		return ret;
> -#if 0 //TODO ?
> -	si_dpm_powergate_uvd(adev, true);
> -#endif
>  	return 0;
>  }
> 
> --
> 2.25.1

