[PATCH] drm/amdgpu: Register bad page handler for Aldebaran
Alex Deucher
alexdeucher at gmail.com
Thu May 13 15:02:02 UTC 2021
On Thu, May 13, 2021 at 10:57 AM Borislav Petkov <bp at alien8.de> wrote:
>
> On Thu, May 13, 2021 at 10:32:45AM -0400, Alex Deucher wrote:
> > Right. The sys admin can query the bad page count and decide when to
> > retire the card.
>
> Yap, although the driver should actively "tell" the sysadmin when some
> critical counts of retired VRAM pages are reached because I doubt all
> admins would go look at those counts on their own.

I think we print something in the log as well when we hit the
threshold. I need to double check the code.

>
> Btw, you say "admin" - am I to understand that those are some high end
> GPU cards with ECC memory? If consumer grade stuff has this too, then
> the driver should very much warn on such levels on its own because
> normal users won't know what and where to look.
>
Currently it's only available on workstation and datacenter boards.

> Other than that, the big picture sounds good to me.

Thanks!

Alex

>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
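
A minimal sketch of the admin-side check discussed in this thread (query the
bad page count, then decide whether the card should be retired). Everything
specific here is an assumption and not taken from the thread: the sysfs path,
the "pfn : size : flag" line format, and the threshold value are illustrative
and should be checked against the amdgpu RAS documentation for the running
kernel. The helper names are hypothetical.

```python
import re
from pathlib import Path

# ASSUMPTION: sysfs location of the retired-page list; the card index
# differs between systems and the file may not exist on all boards.
BAD_PAGES_PATH = Path("/sys/class/drm/card0/device/ras/gpu_vram_bad_pages")

# ASSUMPTION: one record per line, "<pfn hex> : <size hex> : <flag>",
# where the flag is a single letter (R, P or F).
LINE_RE = re.compile(
    r"^\s*(0x[0-9a-fA-F]+)\s*:\s*(0x[0-9a-fA-F]+)\s*:\s*([RPF])\s*$"
)


def parse_bad_pages(text):
    """Return a list of (pfn, size, flag) tuples; unparsable lines are skipped."""
    pages = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            pfn = int(match.group(1), 16)
            size = int(match.group(2), 16)
            pages.append((pfn, size, match.group(3)))
    return pages


def should_warn(pages, threshold):
    """True once the number of recorded bad pages reaches the threshold."""
    return len(pages) >= threshold


def check_card(path=BAD_PAGES_PATH, threshold=100):
    """Read the sysfs file and report whether the (hypothetical) limit is hit."""
    pages = parse_bad_pages(path.read_text())
    return len(pages), should_warn(pages, threshold)
```

An admin could call check_card() from a cron job or monitoring agent and raise
an alert when the second return value is True. This only complements, and does
not replace, any warning the driver itself prints to the kernel log.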