[PATCH 3/3] drm/amdgpu: Implement bad_page_threshold = -2 case
Russell, Kent
Kent.Russell at amd.com
Thu Oct 21 13:57:32 UTC 2021
[AMD Official Use Only]
> -----Original Message-----
> From: Tuikov, Luben <Luben.Tuikov at amd.com>
> Sent: Wednesday, October 20, 2021 6:01 PM
> To: Kuehling, Felix <Felix.Kuehling at amd.com>; Russell, Kent <Kent.Russell at amd.com>;
> amd-gfx at lists.freedesktop.org
> Cc: Joshi, Mukul <Mukul.Joshi at amd.com>
> Subject: Re: [PATCH 3/3] drm/amdgpu: Implement bad_page_threshold = -2 case
>
> On 2021-10-20 17:54, Felix Kuehling wrote:
> > On 2021-10-20 12:35 p.m., Kent Russell wrote:
> >> If the bad_page_threshold kernel parameter is set to -2,
> >> continue to post the GPU. Print a warning to dmesg that this action has
> >> been done, and that page retirement will obviously not work for said GPU
> > I'd squash patch 2 and 3. The squashed patch is
> >
> > Acked-by: Felix Kuehling <Felix.Kuehling at amd.com>
>
> I was just thinking the same thing. Keep the title and text of patch 2 and add the description
> of 3 to 2. With that done:
>
> Reviewed-by: Luben Tuikov <luben.tuikov at amd.com>
Sounds good, thanks. I was on the fence about combining them back when I had the separate kernel param, and it was easier to squash them at review time than to split them apart. I still need to work on patch #1, but thanks for the reviews here!
Kent
>
> Regards,
> Luben
>
> >
> >
> >> Cc: Luben Tuikov <luben.tuikov at amd.com>
> >> Cc: Mukul Joshi <Mukul.Joshi at amd.com>
> >> Signed-off-by: Kent Russell <kent.russell at amd.com>
> >> ---
> >> drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 13 +++++++++----
> >> 1 file changed, 9 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> >> index 1ede0f0d6f55..31852330c1db 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
> >> @@ -1115,11 +1115,16 @@ int amdgpu_ras_eeprom_init(struct amdgpu_ras_eeprom_control *control,
> >> res = amdgpu_ras_eeprom_correct_header_tag(control,
> >> RAS_TABLE_HDR_VAL);
> >> } else {
> >> - *exceed_err_limit = true;
> >> - dev_err(adev->dev,
> >> - "RAS records:%d exceed threshold:%d, "
> >> - "GPU will not be initialized. Replace this GPU or increase the threshold",
> >> + dev_err(adev->dev, "RAS records:%d exceed threshold:%d",
> >> control->ras_num_recs, ras->bad_page_cnt_threshold);
> >> + if (amdgpu_bad_page_threshold == -2) {
> >> + dev_warn(adev->dev, "GPU will be initialized due to bad_page_threshold = -2.");
> >> + dev_warn(adev->dev, "Page retirement will not work for this GPU in this state.");
> >> + res = 0;
> >> + } else {
> >> + *exceed_err_limit = true;
> >> + dev_err(adev->dev, "GPU will not be initialized. Replace this GPU or increase the threshold.");
> >> + }
> >> }
> >> } else {
> >> DRM_INFO("Creating a new EEPROM table");
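
For anyone reading the hunk above without the surrounding function, here is a rough standalone sketch of the decision it introduces. It is not the driver code: on_threshold_exceeded(), enum init_decision and the plain int arguments are hypothetical stand-ins for amdgpu_ras_eeprom_init(), its "res" return value, and the amdgpu_bad_page_threshold parameter, and fprintf() stands in for dev_err()/dev_warn(). It models only the branch where the stored record count has already been found to exceed the threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative result codes; the driver itself returns an int "res". */
enum init_decision { INIT_CONTINUE, INIT_ABORT };

/*
 * Hypothetical helper modelling only the branch the hunk changes: the
 * stored RAS record count has already been found to exceed the threshold.
 * bad_page_threshold stands in for the amdgpu_bad_page_threshold parameter.
 */
static enum init_decision on_threshold_exceeded(int bad_page_threshold,
                                                int num_recs,
                                                int cnt_threshold,
                                                bool *exceed_err_limit)
{
    assert(exceed_err_limit != NULL);

    /* Reported in both cases, as in the patch. */
    fprintf(stderr, "RAS records:%d exceed threshold:%d\n",
            num_recs, cnt_threshold);

    if (bad_page_threshold == -2) {
        /* Override: warn, leave *exceed_err_limit untouched, keep going. */
        fprintf(stderr, "GPU will be initialized due to bad_page_threshold = -2.\n");
        fprintf(stderr, "Page retirement will not work for this GPU in this state.\n");
        return INIT_CONTINUE;
    }

    /* Any other setting: flag the condition so the caller can stop init. */
    *exceed_err_limit = true;
    fprintf(stderr, "GPU will not be initialized. Replace this GPU or increase the threshold.\n");
    return INIT_ABORT;
}
```

With the flag initialized to false by the caller, passing -2 yields INIT_CONTINUE and leaves the flag false; any other value sets the flag and yields INIT_ABORT, which matches the "else" arm of the patch.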