[Nouveau] [PATCH 0/5] Improve Robust Channel (RC) recovery for Turing

Karol Herbst kherbst at redhat.com
Fri Oct 30 12:49:34 UTC 2020


On Fri, Oct 30, 2020 at 3:37 AM Alistair Popple <apopple at nvidia.com> wrote:
>
> This is an initial series of patches to improve channel recovery on Turing GPUs
> with the goal of improving reliability enough to eventually enable SVM for
> Turing. It's likely that follow-up patches will be required to fully address
> problems with less trivial workloads than what I have been able to test thus far.
>
> This series primarily addresses a number of hardware changes to interrupt layout
> and channel recovery for Turing and, for simple cases, improves handling and
> reliability of recovery.
>
> I have been testing trivial OpenCL workloads and with this series have been able
> to recover from while(1) style GPU loops and bad pointer dereferences on a
> Turing GPU. However, if there are less trivial tests available that have been
> known to cause problems with channel recovery in the past, let me know and I'll
> start testing those as well.
>

Thanks for working on this! I occasionally hit fatal errors when
working on OpenCL with the official CTS, but that's on Pascal. I could
give your patches a go once I move my main development machine over to
Turing and report whether I still trigger problems that nouveau isn't
able to recover from.

But yeah, in general the CTS is able to cause bigger issues, for me at least.

> Alistair Popple (5):
>   drm/nouveau: Fix MMU fault interrupts on Turing
>   drm/nouveau: Remove Turing interrupt hack
>   drm/nouveau: Move Turing specific FIFO functions
>   drm/nouveau: FIFO interrupt fixes for Turing
>   drm/nouveau: Turing channel preemption fix
>
>  .../gpu/drm/nouveau/nvkm/engine/fifo/gk104.c  |  46 +--
>  .../gpu/drm/nouveau/nvkm/engine/fifo/gk104.h  |  32 ++
>  .../gpu/drm/nouveau/nvkm/engine/fifo/tu102.c  | 364 +++++++++++++++++-
>  .../gpu/drm/nouveau/nvkm/subdev/fault/tu102.c |  21 +-
>  drivers/gpu/drm/nouveau/nvkm/subdev/mc/base.c |   3 -
>  drivers/gpu/drm/nouveau/nvkm/subdev/mc/priv.h |   1 -
>  .../gpu/drm/nouveau/nvkm/subdev/mc/tu102.c    | 113 +++++-
>  7 files changed, 529 insertions(+), 51 deletions(-)
>
> --
> 2.20.1
>


