[Nouveau] [PATCH 0/3] New instmem implementation for Tegra

Wed Sep 2 23:42:43 PDT 2015

Tegra systems lack the implicit synchronization between CPU and GPU that PCI
provides for dGPUs. For this reason, the instmem implementation of GK20A/GM20B
relied on the slow, legacy PRAMIN so that CPU accesses used the same path as
the GPU, ensuring that both ends see the same data.

The recent Nouveau refactoring introduced acquire/release functions on instmem
that help us understand the bounds of accesses to instobjs. We can now use these
to perform any needed synchronization, and use direct CPU accesses. For Tegra,
provided that we use a write-combined CPU mapping, we just need to make sure the
GPU L2 cache is flushed before we access instobjs and invalidated after.
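
To illustrate the ordering described above, here is a minimal userspace sketch.
It is a toy model, not Nouveau code: every name in it (toy_gpu, toy_acquire,
toy_release, ...) is hypothetical, and the "L2" is just an array that shadows
memory. It only shows why the flush has to come before the CPU access and the
invalidate after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_WORDS 4

/* Hypothetical toy model; none of these names exist in Nouveau. */
struct toy_gpu {
	uint32_t mem[TOY_WORDS];  /* memory, as seen through the CPU mapping */
	uint32_t l2[TOY_WORDS];   /* data currently held in the GPU L2 */
	int valid[TOY_WORDS];     /* L2 holds a copy of this word */
	int dirty[TOY_WORDS];     /* L2 copy has not reached memory yet */
};

/* GPU writes land in the L2 first and stay there until a flush. */
static void toy_gpu_write(struct toy_gpu *g, size_t i, uint32_t v)
{
	g->l2[i] = v;
	g->valid[i] = 1;
	g->dirty[i] = 1;
}

/* GPU reads hit the L2 if it holds a copy, else they fetch from memory. */
static uint32_t toy_gpu_read(struct toy_gpu *g, size_t i)
{
	if (!g->valid[i]) {
		g->l2[i] = g->mem[i];
		g->valid[i] = 1;
	}
	return g->l2[i];
}

/* Write every dirty L2 word back to memory. */
static void toy_l2_flush(struct toy_gpu *g)
{
	for (size_t i = 0; i < TOY_WORDS; i++) {
		if (g->dirty[i]) {
			g->mem[i] = g->l2[i];
			g->dirty[i] = 0;
		}
	}
}

/* Drop every L2 copy so the next GPU read goes back to memory. */
static void toy_l2_invalidate(struct toy_gpu *g)
{
	for (size_t i = 0; i < TOY_WORDS; i++) {
		g->valid[i] = 0;
		g->dirty[i] = 0;
	}
}

/* acquire: flush first, so the CPU sees what the GPU last wrote. */
static uint32_t *toy_acquire(struct toy_gpu *g)
{
	toy_l2_flush(g);
	return g->mem;
}

/* release: invalidate, so the GPU sees what the CPU just wrote. */
static void toy_release(struct toy_gpu *g)
{
	toy_l2_invalidate(g);
}

/* Walks through one GPU write, one CPU access window, one GPU read. */
static int toy_demo(void)
{
	struct toy_gpu g = { 0 };
	uint32_t *map;

	toy_gpu_write(&g, 0, 0x11);
	map = toy_acquire(&g);
	assert(map[0] == 0x11);                 /* flush made it visible */
	map[0] = 0x22;
	assert(toy_gpu_read(&g, 0) == 0x11);    /* stale until release */
	toy_release(&g);
	assert(toy_gpu_read(&g, 0) == 0x22);    /* invalidate made it visible */
	return 0;
}
```

In this model, skipping the flush would let the CPU read the old memory
contents, and skipping the invalidate would let the GPU keep serving its stale
L2 copy.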

This patch series first introduces flush/invalidate hooks on LTC so we can
manage the L2 ourselves, then implements them for GF100+ GPUs. The last patch
changes GK20A's instmem implementation to use CPU mappings and manage the
GPU L2 in the new acquire/release hooks.
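
As a rough picture of what "hooks on LTC" can look like, the sketch below uses
a per-chip function table with optional flush/invalidate entries and thin
wrappers that callers such as instmem would use. All names are hypothetical
(toy_ltc, toy_ltc_func, ...), and the choice to silently do nothing when a chip
provides no hook is an assumption made for this sketch, not a description of
the patches themselves.

```c
#include <assert.h>
#include <stddef.h>

struct toy_ltc;

/* Hypothetical per-chip function table; both hooks are optional. */
struct toy_ltc_func {
	void (*invalidate)(struct toy_ltc *ltc);
	void (*flush)(struct toy_ltc *ltc);
};

struct toy_ltc {
	const struct toy_ltc_func *func;
	int invalidates;  /* counters stand in for real register accesses */
	int flushes;
};

/* Wrappers for callers: no-ops if the chip has no hook (sketch assumption). */
static void toy_ltc_invalidate(struct toy_ltc *ltc)
{
	if (ltc->func->invalidate)
		ltc->func->invalidate(ltc);
}

static void toy_ltc_flush(struct toy_ltc *ltc)
{
	if (ltc->func->flush)
		ltc->func->flush(ltc);
}

/* Stand-in chip implementation; a real one would program the hardware. */
static void toy_chip_invalidate(struct toy_ltc *ltc)
{
	ltc->invalidates++;
}

static void toy_chip_flush(struct toy_ltc *ltc)
{
	ltc->flushes++;
}

/* A chip that implements both hooks, and one that implements neither. */
static const struct toy_ltc_func toy_chip_with_hooks = {
	.invalidate = toy_chip_invalidate,
	.flush = toy_chip_flush,
};

static const struct toy_ltc_func toy_chip_without_hooks = {
	.invalidate = NULL,
	.flush = NULL,
};

/* Exercises both tables; returns 0 if the hooks were dispatched as expected. */
static int toy_ltc_demo(void)
{
	struct toy_ltc a = { &toy_chip_with_hooks, 0, 0 };
	struct toy_ltc b = { &toy_chip_without_hooks, 0, 0 };

	toy_ltc_flush(&a);
	toy_ltc_invalidate(&a);
	toy_ltc_flush(&b);
	toy_ltc_invalidate(&b);

	assert(a.flushes == 1 && a.invalidates == 1);
	assert(b.flushes == 0 && b.invalidates == 0);
	return 0;
}
```

Keeping the hooks behind wrappers means a caller does not need to know which
chip it is running on; it just asks for a flush or an invalidate.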

This results in a lot of L2 operations, but thankfully flushing/invalidating an
empty L2 returns immediately, so the net effect on performance is largely
positive. In the future it would be nice to move L2 management to a higher
level, though.

Alexandre Courbot (3):
  ltc: add hooks for invalidate and flush
  ltc/gf100: add flush/invalidate functions
  instmem/gk20a: use direct CPU writes

 drm/nouveau/include/nvkm/subdev/ltc.h   |   4 +
 drm/nouveau/nvkm/subdev/instmem/gk20a.c | 360 +++++++++++++++++++++++---------
 drm/nouveau/nvkm/subdev/ltc/base.c      |  14 ++
 drm/nouveau/nvkm/subdev/ltc/gf100.c     |  36 ++++
 drm/nouveau/nvkm/subdev/ltc/gk104.c     |   2 +
 drm/nouveau/nvkm/subdev/ltc/gm107.c     |   2 +
 drm/nouveau/nvkm/subdev/ltc/priv.h      |   5 +
 7 files changed, 326 insertions(+), 97 deletions(-)

-- 
2.5.1