[Nouveau] CUDA fixed VA allocations and sparse mappings

Andrew Chew achew at nvidia.com
Tue Jul 7 11:47:10 PDT 2015


On Tue, Jul 07, 2015 at 11:29:38AM -0400, Ilia Mirkin wrote:
> On Mon, Jul 6, 2015 at 8:42 PM, Andrew Chew <achew at nvidia.com> wrote:
> > These ioctls just call into the allocator to allocate a range of addresses,
> > resulting in a struct nvkm_vma that tracks that allocation (or releases the
> > struct nvkm_vma back into the virtual address pool in the case of the free
> > ioctl).  If NOUVEAU_AS_ALLOC_FLAGS_FIXED_OFFSET is set, offset specifies the
> > requested virtual address.  Otherwise, an arbitrary address will be
> > allocated.
> 
> Well, this can't just be an address space. You still need BOs if
> this is to work with nouveau -- it has to know when to swap things in
> and out, when they're used, etc. (and/or move between VRAM and GART
> and system/swap). I suspect that your targets here are the GK20A and
> GM20B chips, which don't have dedicated VRAM, but the ioctls need to
> work for everything.
> 
> Would it be sufficient to extend NOUVEAU_GEM_NEW or create a
> NOUVEAU_GEM_NEW_FIXED or something? IOW, why do we have to separate
> the concepts of a GEM object and a VM allocation?

You're correct.  This is for gk20a and gm20b.

What these proposed ioctls accomplish is to reserve, ahead of time, a
portion of the GPU address space.  So at this point, there really aren't
any buffer objects yet, and there's nothing to be mapped into the GMMU.
That part would come later.
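To make that concrete, here's a rough userspace sketch of how such a
reservation might look.  Everything below other than
NOUVEAU_AS_ALLOC_FLAGS_FIXED_OFFSET is illustrative -- the struct layout,
field names, and ioctl number are assumptions for the sketch, not the
actual patch:

#include <stdint.h>
#include <sys/ioctl.h>

/* Illustrative only: layout and ioctl number are made up for this sketch. */
struct drm_nouveau_as_alloc {
	uint64_t pages;		/* number of pages to reserve */
	uint32_t page_size;	/* page size, in bytes */
	uint32_t flags;		/* NOUVEAU_AS_ALLOC_FLAGS_FIXED_OFFSET, ... */
	uint64_t offset;	/* in: requested VA if FIXED_OFFSET is set;
				 * out: the VA actually reserved */
};

#define NOUVEAU_AS_ALLOC_FLAGS_FIXED_OFFSET	0x1
#define DRM_IOCTL_NOUVEAU_AS_ALLOC \
	_IOWR('d', 0x85, struct drm_nouveau_as_alloc)	/* number illustrative */

/* Reserve a fixed range of GPU VA.  No buffer objects exist yet and
 * nothing is mapped into the GMMU -- this only claims the range. */
static int reserve_fixed_va(int fd, uint64_t va, uint64_t pages,
			    uint32_t page_size)
{
	struct drm_nouveau_as_alloc args = {
		.pages     = pages,
		.page_size = page_size,
		.flags     = NOUVEAU_AS_ALLOC_FLAGS_FIXED_OFFSET,
		.offset    = va,
	};

	return ioctl(fd, DRM_IOCTL_NOUVEAU_AS_ALLOC, &args);
}

The free ioctl would be the mirror image, handing the struct nvkm_vma
back to the virtual address pool.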

> > In addition to this, a way to map/unmap buffers is needed.  Ordinarily, one
> > would just use DRM_IOCTL_PRIME_FD_TO_HANDLE to import and map a dmabuf into
> > GEM.  However, this ioctl will try to grab the virtual address range for this
> > buffer, which will fail in the CUDA case since the virtual address range
> > has been reserved ahead of time.  So perhaps we introduce a set of ioctls
> > to map/unmap buffers on top of an already existing virtual address allocation.
> 
> My suggestion above is an alternative to this, right? I think dmabufs
> tend to be used for sharing between devices. I suspect there's more
> going on here that I don't understand though -- I assume the CUDA
> use-case is similar to the HSA use-case -- being able to build up data
> structures that point to one another on the CPU and then process them
> on the GPU? Can you detail a specific use-case perhaps, including the
> interactions with the GPU and its address space?

The whole dmabuf thing is kind of a side issue.  I'll take a look at
NOUVEAU_GEM_NEW; extending it (or adding a new NOUVEAU_GEM_NEW_FIXED, as
you suggested) could be an alternative to this.  Crucially, the
NOUVEAU_GEM_NEW_FIXED operation must not try to grab a virtual address
region and then fail because a previous operation (see above) has
already reserved it.
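
Something like this is what I have in mind -- again a sketch only, with
an assumed struct layout and ioctl number, riffing on your
NOUVEAU_GEM_NEW_FIXED suggestion:

#include <stdint.h>
#include <sys/ioctl.h>

/* Illustrative only: an assumed layout for the NOUVEAU_GEM_NEW_FIXED idea. */
struct drm_nouveau_gem_new_fixed {
	uint32_t handle;	/* out: GEM handle */
	uint32_t domain;	/* in: placement (GART only on gk20a/gm20b) */
	uint64_t size;		/* in: size in bytes */
	uint64_t offset;	/* in: fixed GPU VA, inside a reserved range */
};

#define DRM_IOCTL_NOUVEAU_GEM_NEW_FIXED \
	_IOWR('d', 0x86, struct drm_nouveau_gem_new_fixed)	/* number illustrative */

/* Create a BO and bind it at 'va'.  The point is that the kernel should
 * treat the pre-reserved range as ours to map into, rather than failing
 * the VA allocation because the range is already taken. */
static int gem_new_fixed(int fd, uint64_t va, uint64_t size,
			 uint32_t domain, uint32_t *handle)
{
	struct drm_nouveau_gem_new_fixed args = {
		.domain = domain,
		.size   = size,
		.offset = va,
	};
	int ret = ioctl(fd, DRM_IOCTL_NOUVEAU_GEM_NEW_FIXED, &args);

	if (ret == 0)
		*handle = args.handle;
	return ret;
}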

The use case is exactly as you describe.  There are data structures built
up that contain CPU pointers, and those pointers need to make sense to
the GPU as well.
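
For illustration, think of something like this (the struct is made up,
but it's the shape of the problem):

/* Built by the CPU at a known VA; the GPU then walks the same list.
 * 'next' is a raw pointer, so the buffer must appear at the same
 * virtual address in the GMMU as in the CPU's page tables -- hence
 * reserving the GPU VA range up front to match. */
struct node {
	struct node *next;	/* followed by both CPU and GPU code */
	float payload;
};

If nouveau were free to pick a different GPU VA for the buffer, every
embedded pointer would dangle on the device.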

