[PATCH] drm/amdgpu: report bad status in GPU recovery
Zhou1, Tao
Tao.Zhou1 at amd.com
Thu Aug 1 03:47:40 UTC 2024
[AMD Official Use Only - AMD Internal Distribution Only]
> -----Original Message-----
> From: Lazar, Lijo <Lijo.Lazar at amd.com>
> Sent: Wednesday, July 31, 2024 9:31 PM
> To: Zhou1, Tao <Tao.Zhou1 at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: report bad status in GPU recovery
>
>
>
> On 7/31/2024 3:35 PM, Tao Zhou wrote:
> > Instead of printing GPU reset failed.
> >
> > Signed-off-by: Tao Zhou <tao.zhou1 at amd.com>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +++++++--
> > 1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 355c2478c4b6..b7c967779b4b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -5933,8 +5933,13 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
> > tmp_adev->asic_reset_res = 0;
> >
> > if (r) {
> > - /* bad news, how to tell it to userspace ? */
> > - dev_info(tmp_adev->dev, "GPU reset(%d) failed\n", atomic_read(&tmp_adev->gpu_reset_counter));
> > + /* bad news, how to tell it to userspace ?
> > + * for ras error, we should report GPU bad status instead of
> > + * reset failure
> > + */
> > + if (!amdgpu_ras_eeprom_check_err_threshold(tmp_adev))
> > + dev_info(tmp_adev->dev, "GPU reset(%d) failed\n",
> > + atomic_read(&tmp_adev->gpu_reset_counter));
>
> Better to check reset_context.src == AMDGPU_RESET_SRC_RAS to confirm that
> the reset is indeed triggered due to ras error.
[Tao] It seems AMDGPU_RESET_SRC_RAS is not used currently; I will set it before using the flag.
>
> Thanks,
> Lijo
>
> > amdgpu_vf_error_put(tmp_adev, AMDGIM_ERROR_VF_GPU_RESET_FAIL, 0, r);
> > } else {
> > dev_info(tmp_adev->dev, "GPU reset(%d) succeeded!\n",
> > atomic_read(&tmp_adev->gpu_reset_counter));
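
For illustration only, the combined condition discussed above (suppress the
generic "GPU reset failed" message only when the reset was triggered by RAS
and the bad-status threshold check fires) could be sketched as a standalone
mock. This is not driver code: the demo_* names are hypothetical stand-ins,
only reset_context.src and AMDGPU_RESET_SRC_RAS come from the thread, and the
sketch assumes the threshold check returns true when bad status has been
reported.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's reset source enum; DEMO_RESET_SRC_RAS
 * mirrors AMDGPU_RESET_SRC_RAS from the review comment. */
enum demo_reset_src {
	DEMO_RESET_SRC_UNKNOWN,
	DEMO_RESET_SRC_RAS,
};

/* Hypothetical stand-in for the reset context; only the src field is modelled. */
struct demo_reset_context {
	enum demo_reset_src src;
};

/*
 * Decide whether the generic "GPU reset(%d) failed" message should be printed.
 *
 * ras_threshold_exceeded models the result of the RAS EEPROM threshold check
 * (assumed here to be true when bad status has already been reported).
 * The message is suppressed only when both conditions hold, so a non-RAS
 * reset failure is still reported as a reset failure.
 */
static bool demo_should_print_reset_failed(const struct demo_reset_context *ctx,
					   bool ras_threshold_exceeded)
{
	if (ctx->src == DEMO_RESET_SRC_RAS && ras_threshold_exceeded)
		return false;

	return true;
}
```

With this shape, a failed reset from any other source still prints the
failure message even if the threshold check happens to be true.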