[Nouveau] [PATCH 0/5] Improve Robust Channel (RC) recovery for Turing

Alistair Popple apopple at nvidia.com
Fri Oct 30 02:36:40 UTC 2020


This is an initial series of patches to improve channel recovery on Turing GPUs,
with the goal of making recovery reliable enough to eventually enable SVM on
Turing. Follow-up patches will likely be required to fully address problems with
less trivial workloads than I have been able to test so far.

This series mainly adapts the driver to Turing's hardware changes in interrupt
layout and channel recovery. For simple cases it improves both the handling and
the reliability of recovery.

I have been testing trivial OpenCL workloads, and with this series applied I
have been able to recover from while(1)-style GPU loops and bad pointer
dereferences on a Turing GPU. If there are less trivial tests that are known to
have caused problems with channel recovery in the past, let me know and I'll
start testing those as well.
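
For illustration only, the two failure modes mentioned above could be provoked
with kernels along the following lines. This is a minimal sketch and not the
actual test code used for this series: the kernel names (spin_forever,
bad_deref), the rc_test_kernel table and the rc_find_kernel() helper are all
hypothetical. The sources are plain OpenCL C held in C strings; building and
launching them through the usual OpenCL host API is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table of fault-injection kernels; names are assumptions. */
struct rc_test_kernel {
	const char *name;	/* kernel entry point */
	const char *source;	/* OpenCL C source text */
};

static const struct rc_test_kernel rc_test_kernels[] = {
	{
		/* Never returns: the channel has to be preempted and torn down. */
		"spin_forever",
		"__kernel void spin_forever(__global volatile int *flag)\n"
		"{\n"
		"    while (1)\n"
		"        flag[0] += 1;\n"
		"}\n",
	},
	{
		/*
		 * Writes far outside the buffer when the host passes a huge
		 * idx, which is expected to raise an MMU fault.
		 */
		"bad_deref",
		"__kernel void bad_deref(__global int *out, ulong idx)\n"
		"{\n"
		"    out[idx] = 1;\n"
		"}\n",
	},
};

/* Look up a kernel by name; returns NULL if it is not in the table. */
static const struct rc_test_kernel *rc_find_kernel(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(rc_test_kernels) / sizeof(rc_test_kernels[0]); i++)
		if (strcmp(rc_test_kernels[i].name, name) == 0)
			return &rc_test_kernels[i];
	return NULL;
}
```

The volatile qualifier in spin_forever is there so the compiler is less likely
to optimise the loop away. After launching either kernel, a following trivial
kernel on a fresh context should still run if recovery worked.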

Alistair Popple (5):
  drm/nouveau: Fix MMU fault interrupts on Turing
  drm/nouveau: Remove Turing interrupt hack
  drm/nouveau: Move Turing specific FIFO functions
  drm/nouveau: FIFO interrupt fixes for Turing
  drm/nouveau: Turing channel preemption fix

 .../gpu/drm/nouveau/nvkm/engine/fifo/gk104.c  |  46 +--
 .../gpu/drm/nouveau/nvkm/engine/fifo/gk104.h  |  32 ++
 .../gpu/drm/nouveau/nvkm/engine/fifo/tu102.c  | 364 +++++++++++++++++-
 .../gpu/drm/nouveau/nvkm/subdev/fault/tu102.c |  21 +-
 drivers/gpu/drm/nouveau/nvkm/subdev/mc/base.c |   3 -
 drivers/gpu/drm/nouveau/nvkm/subdev/mc/priv.h |   1 -
 .../gpu/drm/nouveau/nvkm/subdev/mc/tu102.c    | 113 +++++-
 7 files changed, 529 insertions(+), 51 deletions(-)

-- 
2.20.1


