[Nouveau] CUDA fixed VA allocations and sparse mappings

Ben Skeggs skeggsb at gmail.com
Tue Jul 7 14:09:15 PDT 2015


On 7 July 2015 at 10:42, Andrew Chew <achew at nvidia.com> wrote:
> Hello,
>
> I am currently looking into ways to support fixed virtual address allocations
> and sparse mappings in nouveau, as a step towards supporting CUDA.
Hey Andrew,

Sparse mappings were something I'd actually planned on doing in the near
future too, though I haven't yet settled on exactly how they'd be
exposed.

Fixed address allocations weren't going to be part of that, but I can see
that they make sense for a variety of use cases.  One question I have
here is how this is intended to work where the RM needs to make some
of these allocations itself (for graphics context mapping, etc.): how
should potential conflicts with user mappings be handled?

Thanks,
Ben.

>
> CUDA requires that the GPU virtual address for a given buffer match the
> CPU virtual address.  Therefore, when mapping a CUDA buffer, we need a way
> to specify a particular virtual address to map to (we would ask that the
> CPU virtual address be used).  Currently, as I understand it, the allocator
> implemented in nvkm/core/mm.c, which provisions virtual addresses, doesn't
> allow for this (though it's easy to modify the allocator slightly to do so,
> which I have done locally in my experiments).
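>
> To sketch the idea only (this is not the actual mm.c code or its data
> structures; the names below are made up for illustration), a fixed-offset
> allocation just needs to find a free node that entirely covers the
> requested range:
>
> #include <errno.h>
> #include <stdint.h>
>
> /* Illustrative only -- not nouveau's real allocator.  A fixed-offset
>  * allocation succeeds only if [offset, offset + length) lies entirely
>  * within a single free node. */
> struct va_node {
>         struct va_node *next;
>         uint64_t offset;
>         uint64_t length;
>         int free;
> };
>
> static int va_alloc_fixed(struct va_node *head, uint64_t offset,
>                           uint64_t length)
> {
>         struct va_node *n;
>
>         for (n = head; n; n = n->next) {
>                 if (!n->free)
>                         continue;
>                 if (offset < n->offset ||
>                     offset + length > n->offset + n->length)
>                         continue;
>                 /* A real allocator would split the node into up to three
>                  * pieces here; marking the whole node used keeps the
>                  * sketch short. */
>                 n->free = 0;
>                 return 0;
>         }
>         return -ENOSPC;  /* requested range is not free */
> }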
>
> In addition, the CUDA use case typically involves allocating a big chunk of
> address space ahead of time as a way to reserve that chunk for future CUDA
> use.  It then maps individual buffers into that address space as needed.
> Currently, the virtual address allocation is done during buffer mapping, so
> in order to support these sparse mappings, it seems to me that the virtual
> address allocation and buffer mapping need to be decoupled into separate
> operations.
>
> My current strawman proposal for supporting this is to introduce two new
> ioctls, DRM_IOCTL_NOUVEAU_AS_ALLOC and DRM_IOCTL_NOUVEAU_AS_FREE, which look
> roughly like this:
>
> #define NOUVEAU_AS_ALLOC_FLAGS_FIXED_OFFSET 0x1
> struct drm_nouveau_as_alloc {
>         uint64_t pages;     /* in, pages */
>         uint32_t page_size; /* in, bytes */
>         uint32_t flags;     /* in */
>         uint64_t offset;    /* in/out, byte address */
> };
>
> struct drm_nouveau_as_free {
>         uint64_t offset;    /* in, byte address */
> };
>
> These ioctls just call into the allocator to allocate a range of addresses,
> resulting in a struct nvkm_vma that tracks that allocation (or releases the
> struct nvkm_vma back into the virtual address pool in the case of the free
> ioctl).  If NOUVEAU_AS_ALLOC_FLAGS_FIXED_OFFSET is set, offset specifies the
> requested virtual address.  Otherwise, an arbitrary address will be
> allocated.
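>
> To make the intended flow concrete, userspace usage might look roughly like
> the following.  Nothing here exists yet: DRM_IOCTL_NOUVEAU_AS_ALLOC is only
> the proposed ioctl number, drmIoctl() is libdrm's generic ioctl wrapper, and
> the struct is the one defined above.
>
> #include <stdint.h>
> #include <xf86drm.h>   /* drmIoctl() */
>
> /* Reserve a fixed GPU VA range matching an existing CPU mapping.  The
>  * ioctl number is the proposed one and is not in any header yet. */
> static int reserve_cuda_va(int fd, uint64_t cpu_va, uint64_t pages,
>                            uint32_t page_size)
> {
>         struct drm_nouveau_as_alloc req = {
>                 .pages     = pages,
>                 .page_size = page_size,
>                 .flags     = NOUVEAU_AS_ALLOC_FLAGS_FIXED_OFFSET,
>                 .offset    = cpu_va,    /* ask for GPU VA == CPU VA */
>         };
>
>         return drmIoctl(fd, DRM_IOCTL_NOUVEAU_AS_ALLOC, &req);
> }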
>
> In addition to this, a way to map/unmap buffers is needed.  Ordinarily, one
> would just use DRM_IOCTL_PRIME_FD_TO_HANDLE to import and map a dmabuf into
> GEM.  However, this ioctl will try to grab the virtual address range for the
> buffer, which will fail in the CUDA case since that range has already been
> reserved ahead of time.  So perhaps we introduce a set of ioctls to map/unmap
> buffers on top of an already existing virtual address allocation.
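>
> For illustration only, such a map ioctl could take something along the lines
> of the struct below (nothing here exists; it simply mirrors the style of the
> structs above, and the field names are made up):
>
> struct drm_nouveau_vm_map {
>         uint32_t handle;    /* in, GEM handle of the buffer to map */
>         uint32_t pad;
>         uint64_t offset;    /* in, target address inside a reserved range */
>         uint64_t delta;     /* in, byte offset into the buffer */
>         uint64_t length;    /* in, bytes to map */
> };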
>
> Feedback and questions are very much appreciated.
> _______________________________________________
> Nouveau mailing list
> Nouveau at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/nouveau

