[PATCH v2] drm/amdgpu: Fix error handling in amdgpu_ras_add_bad_pages
Zhou1, Tao
Tao.Zhou1 at amd.com
Thu Jan 2 09:32:18 UTC 2025
> -----Original Message-----
> From: SHANMUGAM, SRINIVASAN <SRINIVASAN.SHANMUGAM at amd.com>
> Sent: Tuesday, December 17, 2024 5:39 PM
> To: Koenig, Christian <Christian.Koenig at amd.com>; Deucher, Alexander
> <Alexander.Deucher at amd.com>; Chai, Thomas <YiPeng.Chai at amd.com>
> Cc: amd-gfx at lists.freedesktop.org; SHANMUGAM, SRINIVASAN
> <SRINIVASAN.SHANMUGAM at amd.com>; Dan Carpenter
> <dan.carpenter at linaro.org>; Zhou1, Tao <Tao.Zhou1 at amd.com>; Zhang, Hawking
> <Hawking.Zhang at amd.com>
> Subject: [PATCH v2] drm/amdgpu: Fix error handling in amdgpu_ras_add_bad_pages
>
> This patch ensures that appropriate error codes are returned when an error
> condition is detected.
>
> This fixes the following smatch warnings:
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2849 amdgpu_ras_add_bad_pages()
> warn: missing error code here? 'amdgpu_umc_pages_in_a_row()' failed.
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2884 amdgpu_ras_add_bad_pages()
> warn: missing error code here? 'amdgpu_ras_mca2pa()' failed.
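Both warnings describe the same shape: a helper fails, the function jumps to its free cleanup label, and ret still holds its initial 0, so the caller sees success. Below is a minimal user-space sketch of the bug and of the fix applied here, assuming ret starts at 0 as the warnings imply. helper(), add_pages_buggy() and add_pages_fixed() are hypothetical stand-ins, not driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: returns nonzero on failure, in the role that
 * amdgpu_umc_pages_in_a_row() plays in the patch context. */
static int helper(int fail)
{
	return fail ? -EINVAL : 0;
}

/* Shape flagged by smatch: the helper fails, but ret is never updated,
 * so the function reports 0 (success) to its caller. */
static int add_pages_buggy(int fail)
{
	int ret = 0;
	int *buf = malloc(sizeof(*buf));

	if (!buf)
		return -ENOMEM;

	if (helper(fail))
		goto free;	/* ret is still 0 here */

	/* ... normal work would go here ... */
free:
	free(buf);	/* labels live in their own namespace in C */
	return ret;
}

/* Fixed shape, as in this patch: set an error code before the goto. */
static int add_pages_fixed(int fail)
{
	int ret = 0;
	int *buf = malloc(sizeof(*buf));

	if (!buf)
		return -ENOMEM;

	if (helper(fail)) {
		ret = -EINVAL;
		goto free;
	}

	/* ... normal work would go here ... */
free:
	free(buf);
	return ret;
}
```

Here add_pages_buggy(1) returns 0 even though the helper failed, while add_pages_fixed(1) returns -EINVAL.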
>
> Fixes: 9fe61c21405a ("drm/amdgpu: parse legacy RAS bad page mixed with new data in various NPS modes")
> Reported-by: Dan Carpenter <dan.carpenter at linaro.org>
> Cc: YiPeng Chai <yipeng.chai at amd.com>
> Cc: Tao Zhou <tao.zhou1 at amd.com>
> Cc: Hawking Zhang <Hawking.Zhang at amd.com>
> Cc: Christian König <christian.koenig at amd.com>
> Cc: Alex Deucher <alexander.deucher at amd.com>
> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam at amd.com>
> ---
> v2:
> - s/-EIO/-EINVAL/: use -EINVAL, matching what amdgpu_umc_pages_in_a_row()
> and amdgpu_ras_mca2pa_by_idx() already return when the RAS context is
> not initialized or the convert_ras_err_addr function is unavailable.
> (Thomas)
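The v2 note keeps -EINVAL because the two helpers already return it in the cases listed. A common kernel alternative, shown here only for comparison and not what this patch does, is to store the helper's return value in ret and jump to the cleanup label when it is nonzero. helper() and add_pages_propagate() are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for amdgpu_umc_pages_in_a_row() or
 * amdgpu_ras_mca2pa_by_idx(): it simply returns the code it is given. */
static int helper(int simulated_ret)
{
	return simulated_ret;
}

/* Alternative idiom, NOT what the patch does: keep the helper's own
 * return value in ret so the caller reports exactly that code. */
static int add_pages_propagate(int simulated_ret)
{
	int ret;

	ret = helper(simulated_ret);
	if (ret)
		goto free;

	/* ... normal work would go here ... */
free:
	/* cleanup (unlock, kfree) would go here */
	return ret;
}
```

The trade-off: propagating passes through any code a helper may return, while the hardcoded -EINVAL in the patch gives the caller one fixed code for these failures.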
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 21 ++++++++++++++++-----
> 1 file changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 01c947066a2e..f1371d1f8421 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2832,8 +2832,10 @@ int amdgpu_ras_add_bad_pages(struct amdgpu_device *adev,
>
> mutex_lock(&con->recovery_lock);
> data = con->eh_data;
> - if (!data)
> + if (!data) {
> + ret = -EINVAL;
[Tao] This case (no eh_data) is acceptable, and returning 0 is enough. With this fixed, the patch is:
Reviewed-by: Tao Zhou <tao.zhou1 at amd.com>
> goto free;
> + }
>
> for (i = 0; i < pages; i++) {
> if (from_rom &&
> @@ -2845,26 +2847,34 @@ int amdgpu_ras_add_bad_pages(struct amdgpu_device *adev,
> * one row
> */
> if (amdgpu_umc_pages_in_a_row(adev, &err_data,
> - bps[i].retired_page << AMDGPU_GPU_PAGE_SHIFT))
> + bps[i].retired_page <<
> + AMDGPU_GPU_PAGE_SHIFT)) {
> + ret = -EINVAL;
> goto free;
> - else
> + } else {
> find_pages_per_pa = true;
> + }
> } else {
> /* unsupported cases */
> + ret = -EOPNOTSUPP;
> goto free;
> }
> }
> } else {
> if (amdgpu_umc_pages_in_a_row(adev, &err_data,
> - bps[i].retired_page << AMDGPU_GPU_PAGE_SHIFT))
> + bps[i].retired_page << AMDGPU_GPU_PAGE_SHIFT)) {
> + ret = -EINVAL;
> goto free;
> + }
> }
> } else {
> if (from_rom && !find_pages_per_pa) {
> if (bps[i].retired_page & UMC_CHANNEL_IDX_V2) {
> /* bad page in any NPS mode in eeprom */
> - if (amdgpu_ras_mca2pa_by_idx(adev, &bps[i], &err_data))
> + if (amdgpu_ras_mca2pa_by_idx(adev, &bps[i], &err_data)) {
> + ret = -EINVAL;
> goto free;
> + }
> } else {
> /* legacy bad page in eeprom, generated only in
> * NPS1 mode
> @@ -2881,6 +2891,7 @@ int amdgpu_ras_add_bad_pages(struct amdgpu_device *adev,
> /* non-nps1 mode, old RAS TA
> * can't support it
> */
> + ret = -EOPNOTSUPP;
> goto free;
> }
> }
> --
> 2.34.1
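With Tao's review comment applied, the exits of amdgpu_ras_add_bad_pages() touched by this patch fall into three classes: no eh_data returns 0, a helper failure returns -EINVAL, and an unsupported configuration returns -EOPNOTSUPP. The skeleton below models only that mapping. Its parameters are hypothetical switches that simulate each case, the check order is simplified compared with the real branch structure, and locking is indicated only in comments.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical skeleton of the exits in amdgpu_ras_add_bad_pages() after
 * this patch plus the review fix. Parameters simulate each case; the real
 * page handling and locking are replaced by comments.
 */
static int add_bad_pages_sketch(const void *eh_data, bool supported,
				int helper_ret)
{
	int ret = 0;

	/* driver: mutex_lock(&con->recovery_lock); */
	if (!eh_data)
		goto free;		/* review: acceptable, ret stays 0 */

	if (!supported) {
		ret = -EOPNOTSUPP;	/* "unsupported cases", old RAS TA */
		goto free;
	}

	if (helper_ret) {		/* pages_in_a_row / mca2pa_by_idx failed */
		ret = -EINVAL;
		goto free;
	}

	/* ... record the bad pages ... */
free:
	/* driver: cleanup, including releasing recovery_lock */
	return ret;
}
```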