[PATCH] drm/amdgpu: Register bad page handler for Aldebaran
Alex Deucher
alexdeucher at gmail.com
Thu May 13 14:32:45 UTC 2021
On Thu, May 13, 2021 at 10:30 AM Borislav Petkov <bp at alien8.de> wrote:
>
> On Thu, May 13, 2021 at 10:17:47AM -0400, Alex Deucher wrote:
> > The bad pages are stored in an EEPROM on the board and the next time
> > the driver loads it reads the EEPROM so that it can reserve the bad
> > pages at init time so they don't get used again.
>
> And that works automagically on the next boot? Because that sounds like
> the right thing to do.
Yes, or driver reload, suspend/resume, etc.
>
> So practically, what happens to a GPU in such a case where the VRAM
> starts going bad? It might get exhausted eventually and the driver will
> say something along the lines of:
>
> "VRAM bad pages: 80%, consider replacing the GPU. It is currently
> operating with degraded performance."
>
> or so?
Right. The sys admin can query the bad page count and decide when to
retire the card.
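
As a rough illustration of that query, here is a minimal sketch in
Python. It rests on assumptions that are not stated in this thread: that
the bad page list is exposed as a text file at the sysfs path below,
with one "<page frame number> : <size> : <flag>" line per page. The
path, the line format, and the helper names parse_bad_pages and
should_retire are all illustrative, and the retirement threshold is
something the admin would choose.

```python
import re
from pathlib import Path

# Assumed sysfs location (hypothetical for this sketch); the card index and
# the presence of this file depend on the system and on RAS being enabled.
BAD_PAGES_PATH = Path("/sys/class/drm/card0/device/ras/gpu_vram_bad_pages")

# Assumed line format: "<page frame number> : <size> : <flag>", hex values.
LINE_RE = re.compile(
    r"^\s*(0x[0-9a-fA-F]+)\s*:\s*(0x[0-9a-fA-F]+)\s*:\s*(\w)\s*$"
)


def parse_bad_pages(text):
    """Return a list of (pfn, size, flag) tuples; unmatched lines are skipped."""
    pages = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            pages.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return pages


def should_retire(pages, threshold):
    """True once the recorded bad page count reaches the admin's threshold."""
    return len(pages) >= threshold


if __name__ == "__main__":
    if BAD_PAGES_PATH.exists():
        pages = parse_bad_pages(BAD_PAGES_PATH.read_text())
        print(f"{len(pages)} bad pages recorded")
    else:
        print("no bad page file found at", BAD_PAGES_PATH)
```

Keeping the parsing separate from the file read means the same check can
run against text captured from several machines.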
>
> Yap, from a RAS perspective, that makes good sense as you're prolonging
> the life of the component while it remains as operational as it can,
> and the only user interaction you need is replacing it.
>
> Sounds good.
Yes. That's the idea.
Alex
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette