[PATCH 4/4] drm/amdgpu: Move amdgpu_ras_recovery_init to after SMU ready.

Mon Oct 21 13:24:15 UTC 2019

> -----Original Message-----
> From: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
> Sent: Friday, October 18, 2019 4:49 PM
> To: amd-gfx at lists.freedesktop.org
> Cc: Chen, Guchun <Guchun.Chen at amd.com>; Zhou1, Tao
> <Tao.Zhou1 at amd.com>; Deucher, Alexander
> <Alexander.Deucher at amd.com>; noreply-confluence at amd.com; Quan,
> Evan <Evan.Quan at amd.com>; Grodzovsky, Andrey
> <Andrey.Grodzovsky at amd.com>
> Subject: [PATCH 4/4] drm/amdgpu: Move amdgpu_ras_recovery_init to
> after SMU ready.
> 
> For Arcturus the I2C traffic is done through SMU tables and so we must
> postpone RAS recovery init to after they are ready which is in
> amdgpu_device_ip_hw_init_phase2.
> 
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>

Reviewed-by: Alex Deucher <alexander.deucher at amd.com>

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 13 +++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    | 11 -----------
>  2 files changed, 13 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 17cfdaf..c40e9a5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1850,6 +1850,19 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>  	if (r)
>  		goto init_failed;
> 
> +	/*
> +	 * retired pages will be loaded from eeprom and reserved here,
> +	 * it should be called after amdgpu_device_ip_hw_init_phase2 since
> +	 * for some ASICs the RAS EEPROM code relies on SMU fully functioning
> +	 * for I2C communication which is only true at this point.
> +	 * recovery_init may fail, but it can free all resources allocated by
> +	 * itself and its failure should not stop amdgpu init process.
> +	 *
> +	 * Note: theoretically, this should be called before all vram allocations
> +	 * to protect retired page from abusing
> +	 */
> +	amdgpu_ras_recovery_init(adev);
> +
>  	if (adev->gmc.xgmi.num_physical_nodes > 1)
>  		amdgpu_xgmi_add_device(adev);
>  	amdgpu_amdkfd_device_init(adev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 2e85a51..1045c3f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1721,17 +1721,6 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
>  #endif
> 
>  	/*
> -	 * retired pages will be loaded from eeprom and reserved here,
> -	 * it should be called after ttm init since new bo may be created,
> -	 * recovery_init may fail, but it can free all resources allocated by
> -	 * itself and its failure should not stop amdgpu init process.
> -	 *
> -	 * Note: theoretically, this should be called before all vram allocations
> -	 * to protect retired page from abusing
> -	 */
> -	amdgpu_ras_recovery_init(adev);
> -
> -	/*
>  	 *The reserved vram for firmware must be pinned to the specified
>  	 *place on the VRAM, so reserve it early.
>  	 */
> --
> 2.7.4
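
For readers following the ordering argument in the commit message, here is a
minimal standalone sketch in C of why the call has to move. It is not driver
code. struct fake_dev and every fake_* / init_* function are hypothetical
stand-ins invented for illustration; the only thing taken from the patch is
the dependency itself: the RAS EEPROM read needs I2C, and on Arcturus I2C
only works once the SMU is up after hw_init_phase2.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct amdgpu_device; not the real driver type. */
struct fake_dev {
	bool smu_ready;       /* set once phase-2 hardware init has run */
	bool ras_recovery_ok; /* set if the EEPROM read could be performed */
};

/* Models amdgpu_device_ip_hw_init_phase2(): brings the SMU up. */
static int fake_hw_init_phase2(struct fake_dev *dev)
{
	assert(dev);
	dev->smu_ready = true;
	return 0;
}

/* Models amdgpu_ras_recovery_init(): the EEPROM read needs I2C via the SMU. */
static int fake_ras_recovery_init(struct fake_dev *dev)
{
	assert(dev);
	if (!dev->smu_ready)
		return -1; /* I2C not usable yet, nothing can be loaded */
	dev->ras_recovery_ok = true;
	return 0;
}

/* Old order: recovery init runs (from TTM init) before phase 2. */
static bool init_old_order(void)
{
	struct fake_dev dev = { 0 };

	(void)fake_ras_recovery_init(&dev); /* failure is non-fatal by design */
	if (fake_hw_init_phase2(&dev))
		return false;
	return dev.ras_recovery_ok;
}

/* New order: recovery init runs right after phase 2, as in this patch. */
static bool init_new_order(void)
{
	struct fake_dev dev = { 0 };

	if (fake_hw_init_phase2(&dev))
		return false;
	(void)fake_ras_recovery_init(&dev); /* failure is still non-fatal */
	return dev.ras_recovery_ok;
}
```

In this model init_old_order() returns false and init_new_order() returns
true. Because the recovery-init failure is deliberately ignored in both
orders, the old ordering would not abort device init; it would simply leave
the retired-page list unloaded.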