[Nouveau] [PATCH 0/5] Improve Robust Channel (RC) recovery for Turing
Karol Herbst
kherbst at redhat.com
Fri Oct 30 12:49:34 UTC 2020
On Fri, Oct 30, 2020 at 3:37 AM Alistair Popple <apopple at nvidia.com> wrote:
>
> This is an initial series of patches to improve channel recovery on Turing GPUs
> with the goal of improving reliability enough to eventually enable SVM for
> Turing. Follow-up patches will likely be required to fully address problems
> with less trivial workloads than I have been able to test thus far.
>
> This series primarily addresses a number of hardware changes to interrupt layout
> and channel recovery for Turing and, for simple cases, improves the handling
> and reliability of recovery.
>
> I have been testing trivial OpenCL workloads and, with this series, have been
> able to recover from while(1) style GPU loops and bad pointer dereferences on a
> Turing GPU. However, if there are less trivial tests available that are known
> to have caused problems with channel recovery in the past, let me know and I'll
> start testing those as well.
>
Thanks for working on this! I occasionally hit fatal errors when
working on OpenCL with the official CTS, but that's on Pascal. I could
give your patches a go once I move my main development machine over to
Turing and report whether I still trigger problems that nouveau isn't
able to recover from.
But yeah, in general the CTS is able to trigger bigger issues, for me at least.
> Alistair Popple (5):
> drm/nouveau: Fix MMU fault interrupts on Turing
> drm/nouveau: Remove Turing interrupt hack
> drm/nouveau: Move Turing specific FIFO functions
> drm/nouveau: FIFO interrupt fixes for Turing
> drm/nouveau: Turing channel preemption fix
>
> .../gpu/drm/nouveau/nvkm/engine/fifo/gk104.c | 46 +--
> .../gpu/drm/nouveau/nvkm/engine/fifo/gk104.h | 32 ++
> .../gpu/drm/nouveau/nvkm/engine/fifo/tu102.c | 364 +++++++++++++++++-
> .../gpu/drm/nouveau/nvkm/subdev/fault/tu102.c | 21 +-
> drivers/gpu/drm/nouveau/nvkm/subdev/mc/base.c | 3 -
> drivers/gpu/drm/nouveau/nvkm/subdev/mc/priv.h | 1 -
> .../gpu/drm/nouveau/nvkm/subdev/mc/tu102.c | 113 +++++-
> 7 files changed, 529 insertions(+), 51 deletions(-)
>
> --
> 2.20.1
>