[PATCH] drm/amdgpu: Register bad page handler for Aldebaran

Alex Deucher alexdeucher at gmail.com
Thu May 13 15:02:02 UTC 2021


On Thu, May 13, 2021 at 10:57 AM Borislav Petkov <bp at alien8.de> wrote:
>
> On Thu, May 13, 2021 at 10:32:45AM -0400, Alex Deucher wrote:
> > Right.  The sys admin can query the bad page count and decide when to
> > retire the card.
>
> Yap, although the driver should actively "tell" the sysadmin when some
> critical counts of retired VRAM pages are reached because I doubt all
> admins would go look at those counts on their own.

I think we print something in the log as well when we hit the
threshold.  I need to double check the code.

>
> Btw, you say "admin" - am I to understand that those are some high end
> GPU cards with ECC memory? If consumer grade stuff has this too, then
> the driver should very much warn on such levels on its own because
> normal users won't know what and where to look.
>

Currently it's only available on workstation and datacenter boards.

> Other than that, the big picture sounds good to me.

Thanks!

Alex

>
> Thx.
>
> --
> Regards/Gruss,
>     Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
