[PATCH] drm/amdgpu: refine usage of amdgpu_bad_page_threshold
Zhou1, Tao
Tao.Zhou1 at amd.com
Fri Jun 13 04:00:40 UTC 2025
[AMD Official Use Only - AMD Internal Distribution Only]
> -----Original Message-----
> From: Xie, Patrick <Gangliang.Xie at amd.com>
> Sent: Friday, June 13, 2025 11:07 AM
> To: amd-gfx at lists.freedesktop.org
> Cc: Zhang, Hawking <Hawking.Zhang at amd.com>; Zhou1, Tao
> <Tao.Zhou1 at amd.com>; Xie, Patrick <Gangliang.Xie at amd.com>
> Subject: [PATCH] drm/amdgpu: refine usage of amdgpu_bad_page_threshold
>
> When amdgpu_bad_page_threshold == -1 or -2, the driver issues a warning message
> when the threshold is reached and continues runtime services.
>
> Signed-off-by: ganglxie <ganglxie at amd.com>
> ---
> .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 21 +++++++++----------
> 1 file changed, 10 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> index 2ddedf476542..a9246c53bde9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> @@ -763,18 +763,17 @@ amdgpu_ras_eeprom_update_header(struct amdgpu_ras_eeprom_control *control)
> dev_warn(adev->dev,
> "Saved bad pages %d reaches threshold value %d\n",
> control->ras_num_bad_pages, ras->bad_page_cnt_threshold);
> - control->tbl_hdr.header = RAS_TABLE_HDR_BAD;
> - if (control->tbl_hdr.version >= RAS_TABLE_VER_V2_1) {
> - control->tbl_rai.rma_status = GPU_RETIRED__ECC_REACH_THRESHOLD;
> - control->tbl_rai.health_percent = 0;
> - }
> -
> if ((amdgpu_bad_page_threshold != -1) &&
> - (amdgpu_bad_page_threshold != -2))
> + (amdgpu_bad_page_threshold != -2)) {
> + control->tbl_hdr.header = RAS_TABLE_HDR_BAD;
> + if (control->tbl_hdr.version >= RAS_TABLE_VER_V2_1) {
> + control->tbl_rai.rma_status = GPU_RETIRED__ECC_REACH_THRESHOLD;
> + control->tbl_rai.health_percent = 0;
> + }
> ras->is_rma = true;
> -
> - /* ignore the -ENOTSUPP return value */
> - amdgpu_dpm_send_rma_reason(adev);
> + /* ignore the -ENOTSUPP return value */
> + amdgpu_dpm_send_rma_reason(adev);
> + }
> }
>
> if (control->tbl_hdr.version >= RAS_TABLE_VER_V2_1)
> @@ -1509,7 +1508,7 @@ int amdgpu_ras_eeprom_check(struct amdgpu_ras_eeprom_control *control)
> "RAS records:%d exceed threshold:%d\n",
> control->ras_num_bad_pages, ras->bad_page_cnt_threshold);
> if ((amdgpu_bad_page_threshold == -1) ||
> - (amdgpu_bad_page_threshold == -2)) {
> + (amdgpu_bad_page_threshold == -2)) {
[Tao] The replacement is unnecessary. With this fixed, the patch is:
Reviewed-by: Tao Zhou <tao.zhou1 at amd.com>
> res = 0;
> dev_warn(adev->dev,
> "Please consult AMD Service Action Guide (SAG) for appropriate service procedures\n");
> --
> 2.34.1
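
For readers following the thread, the control flow that the first hunk
produces can be sketched as standalone C. This is only an illustrative
sketch, not the driver code: struct ras_state, enum hdr_state,
threshold_is_warn_only() and handle_threshold_reached() are hypothetical
stand-ins for control->tbl_hdr.header, ras->is_rma and the
amdgpu_dpm_send_rma_reason() call, and the RAS_TABLE_VER_V2_1 version
check around rma_status/health_percent is left out for brevity.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the driver's state; not the real amdgpu types. */
enum hdr_state { HDR_OK, HDR_BAD };

struct ras_state {
	enum hdr_state header;  /* models control->tbl_hdr.header */
	bool is_rma;            /* models ras->is_rma */
	bool rma_reason_sent;   /* models calling amdgpu_dpm_send_rma_reason() */
};

/* True when the module parameter value is -1 or -2 (warn and continue). */
static bool threshold_is_warn_only(int bad_page_threshold)
{
	return bad_page_threshold == -1 || bad_page_threshold == -2;
}

/*
 * Models the branch taken once the saved bad page count has reached the
 * threshold: the warning is always printed, but the header is marked bad
 * and the RMA state is set only when the parameter is neither -1 nor -2.
 */
static void handle_threshold_reached(struct ras_state *s,
				     int bad_page_threshold,
				     int num_bad_pages,
				     int cnt_threshold)
{
	fprintf(stderr, "Saved bad pages %d reaches threshold value %d\n",
		num_bad_pages, cnt_threshold);

	if (threshold_is_warn_only(bad_page_threshold))
		return; /* warning only, runtime services continue */

	s->header = HDR_BAD;
	s->is_rma = true;
	s->rma_reason_sent = true;
}
```

Calling handle_threshold_reached() with a parameter value of -1 or -2
leaves the state untouched apart from the warning. Any other value marks
the header bad and sets the RMA flags, which is the grouping the patch
moves inside the if block.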