[PATCH] drm/amdgpu: check recovery status of xgmi hive in ras_reset_error_count
Yang, Stanley
Stanley.Yang at amd.com
Tue Oct 31 11:00:33 UTC 2023
[AMD Official Use Only - General]
Reviewed-by: Stanley.Yang <Stanley.Yang at amd.com>
Regards,
Stanley
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Tao
> Zhou
> Sent: Tuesday, October 31, 2023 3:13 PM
> To: amd-gfx at lists.freedesktop.org
> Cc: Zhou1, Tao <Tao.Zhou1 at amd.com>; Zhang, Hawking
> <Hawking.Zhang at amd.com>
> Subject: [PATCH] drm/amdgpu: check recovery status of xgmi hive in
> ras_reset_error_count
>
> Handle the xgmi hive case: also skip the RAS error count reset while the
> device's xgmi hive is in RAS recovery.
>
> Suggested-by: Hawking Zhang <Hawking.Zhang at amd.com>
> Signed-off-by: Tao Zhou <tao.zhou1 at amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 753260745554..0093c28f4343 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -1226,6 +1226,8 @@ int amdgpu_ras_reset_error_count(struct amdgpu_device *adev,
> struct amdgpu_ras_block_object *block_obj = amdgpu_ras_get_ras_block(adev, block, 0);
> struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
> const struct amdgpu_mca_smu_funcs *mca_funcs = adev->mca.mca_funcs;
> + struct amdgpu_hive_info *hive;
> + int hive_ras_recovery = 0;
>
> if (!block_obj || !block_obj->hw_ops) {
> dev_dbg_once(adev->dev, "%s doesn't config RAS function\n",
> @@ -1237,8 +1239,15 @@ int amdgpu_ras_reset_error_count(struct amdgpu_device *adev,
> !amdgpu_ras_get_mca_debug_mode(adev))
> return -EOPNOTSUPP;
>
> + hive = amdgpu_get_xgmi_hive(adev);
> + if (hive) {
> + hive_ras_recovery = atomic_read(&hive->ras_recovery);
> + amdgpu_put_xgmi_hive(hive);
> + }
> +
> /* skip ras error reset in gpu reset */
> - if ((amdgpu_in_reset(adev) || atomic_read(&ras->in_recovery)) &&
> + if ((amdgpu_in_reset(adev) || atomic_read(&ras->in_recovery) ||
> + hive_ras_recovery) &&
> mca_funcs && mca_funcs->mca_set_debug_mode)
> return -EOPNOTSUPP;
>
> --
> 2.35.1
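
For readers who want to experiment with the control flow outside the kernel, here is a minimal userspace sketch of the check the patch adds. All names with a demo_ prefix are hypothetical stand-ins, not the real amdgpu types or helpers; the hive get/put pair is modelled as a plain reference count, and has_mca_debug_mode stands in for the "mca_funcs && mca_funcs->mca_set_debug_mode" test. It is an illustration under those assumptions, not the driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures (not the amdgpu types). */
struct demo_hive {
	atomic_int ras_recovery;	/* nonzero while the hive is in RAS recovery */
	atomic_int refcount;		/* models the get/put reference on the hive */
};

struct demo_device {
	bool in_reset;			/* models amdgpu_in_reset(adev) */
	atomic_int ras_in_recovery;	/* models ras->in_recovery */
	bool has_mca_debug_mode;	/* models mca_funcs && mca_funcs->mca_set_debug_mode */
	struct demo_hive *hive;		/* NULL when the device is not part of a hive */
};

/* Take a reference on the device's hive, or return NULL if there is none. */
static struct demo_hive *demo_get_hive(struct demo_device *dev)
{
	if (dev->hive)
		atomic_fetch_add(&dev->hive->refcount, 1);
	return dev->hive;
}

/* Drop the reference taken by demo_get_hive(). */
static void demo_put_hive(struct demo_hive *hive)
{
	int old = atomic_fetch_sub(&hive->refcount, 1);

	assert(old > 0);
	(void)old;
}

/* Returns -EOPNOTSUPP when the reset must be skipped, 0 otherwise. */
static int demo_reset_error_count(struct demo_device *dev)
{
	struct demo_hive *hive;
	int hive_ras_recovery = 0;

	/* Sample the hive state once, then release the reference. */
	hive = demo_get_hive(dev);
	if (hive) {
		hive_ras_recovery = atomic_load(&hive->ras_recovery);
		demo_put_hive(hive);
	}

	/* Skip the reset during device reset, RAS recovery, or hive recovery. */
	if ((dev->in_reset || atomic_load(&dev->ras_in_recovery) ||
	     hive_ras_recovery) && dev->has_mca_debug_mode)
		return -EOPNOTSUPP;

	return 0;	/* the real function would go on to reset the counters */
}
```

Note that the hive state is copied into a local before the reference is dropped, so the later condition never touches the hive pointer after the put.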