[PATCH] drm/amdgpu: check recovery status of xgmi hive in ras_reset_error_count

Yang, Stanley Stanley.Yang at amd.com
Tue Oct 31 11:00:33 UTC 2023


[AMD Official Use Only - General]

Reviewed-by: Stanley.Yang <Stanley.Yang at amd.com>

Regards,
Stanley
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Tao Zhou
> Sent: Tuesday, October 31, 2023 3:13 PM
> To: amd-gfx at lists.freedesktop.org
> Cc: Zhou1, Tao <Tao.Zhou1 at amd.com>; Zhang, Hawking
> <Hawking.Zhang at amd.com>
> Subject: [PATCH] drm/amdgpu: check recovery status of xgmi hive in
> ras_reset_error_count
>
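> Also skip the RAS error count reset while the device's xgmi hive is in
> RAS recovery, in addition to the existing per-device reset and recovery
> checks.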
>
> Suggested-by: Hawking Zhang <Hawking.Zhang at amd.com>
> Signed-off-by: Tao Zhou <tao.zhou1 at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 753260745554..0093c28f4343 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -1226,6 +1226,8 @@ int amdgpu_ras_reset_error_count(struct amdgpu_device *adev,
>       struct amdgpu_ras_block_object *block_obj = amdgpu_ras_get_ras_block(adev, block, 0);
>       struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
>       const struct amdgpu_mca_smu_funcs *mca_funcs = adev->mca.mca_funcs;
> +     struct amdgpu_hive_info *hive;
> +     int hive_ras_recovery = 0;
>
>       if (!block_obj || !block_obj->hw_ops) {
>               dev_dbg_once(adev->dev, "%s doesn't config RAS function\n",
> @@ -1237,8 +1239,15 @@ int amdgpu_ras_reset_error_count(struct amdgpu_device *adev,
>           !amdgpu_ras_get_mca_debug_mode(adev))
>               return -EOPNOTSUPP;
>
> +     hive = amdgpu_get_xgmi_hive(adev);
> +     if (hive) {
> +             hive_ras_recovery = atomic_read(&hive->ras_recovery);
> +             amdgpu_put_xgmi_hive(hive);
> +     }
> +
>       /* skip ras error reset in gpu reset */
> -     if ((amdgpu_in_reset(adev) || atomic_read(&ras->in_recovery)) &&
> +     if ((amdgpu_in_reset(adev) || atomic_read(&ras->in_recovery) ||
> +         hive_ras_recovery) &&
>           mca_funcs && mca_funcs->mca_set_debug_mode)
>               return -EOPNOTSUPP;
>
> --
> 2.35.1
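
For anyone reading this in the archive, the check the patch adds can be
modelled outside the kernel roughly as below. This is only a userspace
sketch under stated assumptions: struct hive_info, struct device,
get_hive(), put_hive() and reset_error_count() are hypothetical stand-ins
for amdgpu_hive_info, amdgpu_device, amdgpu_get_xgmi_hive(),
amdgpu_put_xgmi_hive() and amdgpu_ras_reset_error_count(), and C11 atomics
stand in for the kernel's atomic_t. The earlier early-return checks in the
real function are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real amdgpu structures. */
struct hive_info {
	atomic_int ras_recovery; /* non-zero while the hive is in RAS recovery */
	atomic_int refcount;     /* models the get/put reference pairing */
};

struct device {
	struct hive_info *hive;   /* NULL when the device is not part of a hive */
	atomic_int in_reset;      /* models amdgpu_in_reset(adev) */
	atomic_int in_recovery;   /* models ras->in_recovery */
	bool has_debug_mode_hook; /* models mca_funcs && mca_funcs->mca_set_debug_mode */
};

/* Take a reference on the device's hive, if it has one. */
static struct hive_info *get_hive(struct device *dev)
{
	if (dev->hive)
		atomic_fetch_add(&dev->hive->refcount, 1);
	return dev->hive;
}

/* Drop a reference taken by get_hive(). */
static void put_hive(struct hive_info *hive)
{
	int old = atomic_fetch_sub(&hive->refcount, 1);

	assert(old > 0); /* every put must match an earlier get */
}

/* Returns -EOPNOTSUPP when the reset must be skipped, 0 otherwise. */
static int reset_error_count(struct device *dev)
{
	int hive_ras_recovery = 0;
	struct hive_info *hive = get_hive(dev);

	/* Sample the hive state, then release the reference right away. */
	if (hive) {
		hive_ras_recovery = atomic_load(&hive->ras_recovery);
		put_hive(hive);
	}

	/* Skip the reset during device reset, device recovery or hive recovery. */
	if ((atomic_load(&dev->in_reset) || atomic_load(&dev->in_recovery) ||
	     hive_ras_recovery) && dev->has_debug_mode_hook)
		return -EOPNOTSUPP;

	return 0; /* the real function would go on to clear the counters */
}
```

The hive flag is copied into a local before the reference is dropped, so
the later condition never touches the hive after the put. A device that is
not in a hive gets a NULL pointer and falls through to the existing checks.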


