[PATCH 2/2] update check condition of query for ras page retire

Zhou1, Tao Tao.Zhou1 at amd.com
Thu Jan 18 01:35:46 UTC 2024


[AMD Official Use Only - General]

Sure, I will revert the related patch in the next version.

Regards,
Tao
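
For reference, the fallback Hawking describes in the quoted message below (the wrapper returns -EOPNOTSUPP when no callback is set) can be sketched as follows. This is only a minimal illustration, not the actual swsmu code: `struct demo_funcs`, `demo_get_ecc_info` and `demo_ecc_ok` are hypothetical stand-ins for the real function table and wrapper.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-ASIC function table (not struct pptable_funcs). */
struct demo_funcs {
	int (*get_ecc_info)(void *table);
};

/* Hypothetical callback that reports ECC info as supported. */
static int demo_ecc_ok(void *table)
{
	(void)table;
	return 0;
}

/*
 * Hypothetical wrapper: when the callback is not set, report
 * -EOPNOTSUPP, so a stub that only returns -EOPNOTSUPP adds nothing.
 */
static int demo_get_ecc_info(const struct demo_funcs *funcs, void *table)
{
	if (!funcs || !funcs->get_ecc_info)
		return -EOPNOTSUPP;

	return funcs->get_ecc_info(table);
}
```

With a wrapper of this shape, leaving the callback unset and keeping a stub that returns -EOPNOTSUPP look the same to the caller.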

> -----Original Message-----
> From: Zhang, Hawking <Hawking.Zhang at amd.com>
> Sent: Wednesday, January 17, 2024 8:09 PM
> To: Zhou1, Tao <Tao.Zhou1 at amd.com>; amd-gfx at lists.freedesktop.org
> Cc: Zhou1, Tao <Tao.Zhou1 at amd.com>
> Subject: RE: [PATCH 2/2] update check condition of query for ras page retire
>
> [AMD Official Use Only - General]
>
> static ssize_t smu_v13_0_6_get_ecc_info(struct smu_context *smu,
>                         void *table)
>  {
> -       /* Support ecc info by default */
> -       return 0;
> +       /* we use debug mode flag instead of this interface */
> +       return -EOPNOTSUPP;
>  }
>
> Shall we just drop the callback implementation? smu_get_ecc_info will return
> -EOPNOTSUPP if the callback is not supported.
>
> Regards,
> Hawking
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Tao Zhou
> Sent: Wednesday, January 17, 2024 17:15
> To: amd-gfx at lists.freedesktop.org
> Cc: Zhou1, Tao <Tao.Zhou1 at amd.com>
> Subject: [PATCH 2/2] update check condition of query for ras page retire
>
> Support page retirement handling in debug mode.
>
> Signed-off-by: Tao Zhou <tao.zhou1 at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c              | 9 +++++++--
>  drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 4 ++--
>  2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> index 41139bac7643..6df32f0afd89 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> @@ -90,12 +90,16 @@ static void amdgpu_umc_handle_bad_pages(struct amdgpu_device *adev,
>  {
>         struct ras_err_data *err_data = (struct ras_err_data *)ras_error_status;
>         struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
> +       unsigned int error_query_mode;
>         int ret = 0;
>
> +       amdgpu_ras_get_error_query_mode(adev, &error_query_mode);
> +
>         mutex_lock(&con->page_retirement_lock);
>
>         ret = amdgpu_dpm_get_ecc_info(adev, (void *)&(con->umc_ecc));
> -       if (ret == -EOPNOTSUPP) {
> +       if (ret == -EOPNOTSUPP &&
> +           error_query_mode == AMDGPU_RAS_DIRECT_ERROR_QUERY) {
>                 if (adev->umc.ras && adev->umc.ras->ras_block.hw_ops &&
>                     adev->umc.ras->ras_block.hw_ops->query_ras_error_count)
>                     adev->umc.ras->ras_block.hw_ops->query_ras_error_count(adev, ras_error_status);
> @@ -119,7 +123,8 @@ static void amdgpu_umc_handle_bad_pages(struct amdgpu_device *adev,
>                          */
>                         adev->umc.ras->ras_block.hw_ops->query_ras_error_address(adev, ras_error_status);
>                 }
> -       } else if (!ret) {
> +       } else if (error_query_mode == AMDGPU_RAS_FIRMWARE_ERROR_QUERY ||
> +           (!ret && error_query_mode == AMDGPU_RAS_DIRECT_ERROR_QUERY)) {
>                 if (adev->umc.ras &&
>                     adev->umc.ras->ecc_info_query_ras_error_count)
>                     adev->umc.ras->ecc_info_query_ras_error_count(adev, ras_error_status);
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
> index c560f4af214d..d86c9e7fc64b 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c
> @@ -2909,8 +2909,8 @@ static int smu_v13_0_6_select_xgmi_plpd_policy(struct smu_context *smu,
>  static ssize_t smu_v13_0_6_get_ecc_info(struct smu_context *smu,
>                         void *table)
>  {
> -       /* Support ecc info by default */
> -       return 0;
> +       /* we use debug mode flag instead of this interface */
> +       return -EOPNOTSUPP;
>  }
>
>  static const struct pptable_funcs smu_v13_0_6_ppt_funcs = {
> --
> 2.35.1
>
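
The branch selection introduced by the amdgpu_umc.c hunks above can be read as a small decision function. The sketch below assumes only the two query modes named in the patch; the `demo_*` enums and `demo_select_path` are hypothetical names used for illustration and do not exist in the driver.

```c
#include <errno.h>

/* Hypothetical mirrors of the two query modes named in the patch. */
enum demo_query_mode {
	DEMO_DIRECT_ERROR_QUERY,
	DEMO_FIRMWARE_ERROR_QUERY,
};

/* Hypothetical labels for the branch that would be taken. */
enum demo_path {
	DEMO_PATH_HW_OPS,   /* hw_ops->query_ras_error_count/address */
	DEMO_PATH_ECC_INFO, /* ecc_info_query_ras_error_count */
	DEMO_PATH_NONE,     /* neither branch runs */
};

/* ret is the value returned by the get_ecc_info call. */
static enum demo_path demo_select_path(int ret, enum demo_query_mode mode)
{
	if (ret == -EOPNOTSUPP && mode == DEMO_DIRECT_ERROR_QUERY)
		return DEMO_PATH_HW_OPS;

	if (mode == DEMO_FIRMWARE_ERROR_QUERY ||
	    (!ret && mode == DEMO_DIRECT_ERROR_QUERY))
		return DEMO_PATH_ECC_INFO;

	return DEMO_PATH_NONE;
}
```

Under this reading, firmware query mode takes the ecc_info path whatever get_ecc_info returns, while direct query mode keeps the behavior it had before the patch.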


