[Nouveau] CUDA fixed VA allocations and sparse mappings

C Bergström cbergstrom at pathscale.com
Tue Jul 7 16:53:13 PDT 2015


regarding
--------
Fixed address allocations weren't going to be part of that, but I see
that it makes sense for a variety of use cases.  One question I have
here is how this is intended to work where the RM needs to make some
of these allocations itself (for graphics context mapping, etc), how
should potential conflicts with user mappings be handled?
--------
As an initial implementation you can probably assume that GPU
offloading runs in "exclusive" mode, i.e. the CUDA or OpenACC code has
full ownership of the card. The Tesla cards don't even have a video
output on them. To complicate this further, some offloading code has
very long-running kernels and, even worse, may critically depend on
using all of the available GPU RAM (large matrix sizes, and soon big
Fortran arrays or complex data types).
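To make the expectation concrete, here is a minimal sketch (against the
CUDA runtime API on the proprietary stack, not Nouveau) of what a typical
offload program assumes: it checks that the card is in an exclusive
compute mode and then grabs essentially all of the free RAM. The 64 MiB
margin is an arbitrary assumption for illustration.

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int dev = 0;
    cudaSetDevice(dev);

    /* Query the compute mode to confirm the card is dedicated to us. */
    int mode = 0;
    cudaDeviceGetAttribute(&mode, cudaDevAttrComputeMode, dev);
    if (mode != cudaComputeModeExclusive &&
        mode != cudaComputeModeExclusiveProcess) {
        fprintf(stderr, "device %d is not in an exclusive compute mode\n", dev);
    }

    /* Offload workloads that fill the card: see how much RAM is free. */
    size_t free_bytes = 0, total_bytes = 0;
    cudaMemGetInfo(&free_bytes, &total_bytes);
    printf("GPU %d: %zu of %zu bytes free\n", dev, free_bytes, total_bytes);

    /* Allocate (nearly) all of it, as large-matrix codes tend to do;
     * leave a small, arbitrary margin for driver overhead. */
    size_t margin = 64u << 20;
    size_t want = free_bytes > margin ? free_bytes - margin : 0;
    void *buf = nullptr;
    if (cudaMalloc(&buf, want) != cudaSuccess) {
        fprintf(stderr, "could not allocate %zu bytes\n", want);
        return 1;
    }
    cudaFree(buf);
    return 0;
}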

Long term, direct PCIe copies between cards will be important, aka
zero-copy. It may seem crazy, but when you have 16+ GPUs in a single
workstation (Cirrascale) stuff like this is key.
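For reference, a minimal sketch of the kind of device-to-device transfer
that motivates this, using the CUDA runtime's peer-access calls. The GPU
indices and buffer size are made up for illustration.

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    /* Hypothetical two-GPU setup: move a buffer from GPU 0 to GPU 1
     * directly over PCIe instead of bouncing through host memory. */
    int src_dev = 0, dst_dev = 1, can_access = 0;
    cudaDeviceCanAccessPeer(&can_access, dst_dev, src_dev);
    if (!can_access) {
        fprintf(stderr, "peer access between GPU 0 and GPU 1 unavailable\n");
        return 1;
    }

    size_t bytes = 256u << 20;           /* 256 MiB test buffer */
    void *src = nullptr, *dst = nullptr;

    cudaSetDevice(src_dev);
    cudaMalloc(&src, bytes);

    cudaSetDevice(dst_dev);
    cudaDeviceEnablePeerAccess(src_dev, 0); /* map GPU 0's memory on GPU 1 */
    cudaMalloc(&dst, bytes);

    /* With peer access enabled the driver can route this copy directly
     * across PCIe rather than staging it through system RAM. */
    cudaMemcpyPeer(dst, dst_dev, src, src_dev, bytes);
    cudaDeviceSynchronize();

    cudaFree(dst);
    cudaSetDevice(src_dev);
    cudaFree(src);
    return 0;
}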

