[Nouveau] CUDA fixed VA allocations and sparse mappings

Jerome Glisse j.glisse at gmail.com
Wed Jul 8 12:40:05 PDT 2015


On Wed, Jul 08, 2015 at 10:51:55AM +1000, Ben Skeggs wrote:
> On 8 July 2015 at 10:47, Andrew Chew <achew at nvidia.com> wrote:
> > On Wed, Jul 08, 2015 at 10:37:34AM +1000, Ben Skeggs wrote:
> >> On 8 July 2015 at 10:31, Andrew Chew <achew at nvidia.com> wrote:
> >> > On Wed, Jul 08, 2015 at 10:18:36AM +1000, Ben Skeggs wrote:
> >> >> > There's some minimal state that needs to be mapped into GPU address space.
> >> >> > One thing that comes to mind are pushbuffers, which are needed to submit
> >> >> > stuff to any engine.
> >> >> I guess you can probably use the start of the kernel's address space
> >> carveout for these kinds of mappings actually?  It's not like userspace
> >> >> can ever have virtual addresses there?
> >> >
> >> > Yeah.  I'm looking into it further, but to answer your original question,
> >> > I believe there is essentially an address range that nouveau would know
> >> > about, which it uses for fixed address allocations (I'm referring to how
> >> > the nvgpu driver does things...we may or may not come up with something
> >> > different for nouveau).
> >> >
> >> > Although it's dangerous, AFAIK the allocator in nouveau starts allocating
> >> > addresses at page 1, and as you suggested, one wouldn't ever get a CPU
> >> > address that low.  But having a set of addresses reserved would be much
> >> > better of course.
> >> I'm thinking more about the top of the address space.  As I understand
> >> it, the kernel already splits the CPU virtual address space into
> >> user/system areas (3GiB/1GiB for 32-bit IIUC), or something very
> >> similar to that.
> >>
> >> Perhaps, if we can get at that information, we can use those same
> >> definitions for GPU address space?
> >
> > Ah, I get what you're saying.  Sure, I think that might be okay.  Not sure
> > how we would get at that information, though, and it would be horrible to
> > just bake it in somewhere.  I'm looking into how nvgpu driver does it...
> > maybe they have good reasons to do it the way they do.  Sorry if I go
> > quiet for a little bit...
> After a very quick look, it looks like the kernel defines a
> PAGE_OFFSET macro which is the start of kernel virtual address space.

You need to be careful here. First, the hardware might not have as many
address bits as the CPU. For instance, x86-64 uses 48 bits for virtual
addresses, i.e. only the low 48 bits of an address are meaningful, while
older radeon (pre-CI IIRC) only has 40 bits on the address bus. With such
a configuration you could not move all private kernel allocations inside
the kernel zone.
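
To make that concrete, here is a rough kernel-side sketch of the check
this implies; gpu_va_bits as a per-chip parameter is my invention for
illustration, not an existing nouveau field:

#include <linux/types.h>	/* u64, bool */
#include <asm/page.h>		/* PAGE_OFFSET */

/* Can kernel CPU virtual addresses be mirrored 1:1 in the GPU
 * virtual address space?  On x86-64, PAGE_OFFSET sits in the
 * sign-extended upper canonical half, so a GPU with only 40 (or
 * even 48) usable VA bits cannot represent it directly. */
static bool gpu_can_mirror_kernel_va(unsigned int gpu_va_bits)
{
	u64 gpu_va_limit = 1ULL << gpu_va_bits;

	return (u64)PAGE_OFFSET < gpu_va_limit;
}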

The second issue is things like a 32-bit process on a 64-bit kernel, in
which case you have the usual 3GB userspace / 1GB kernel space split. So
instead of using PAGE_OFFSET you might want to use TASK_SIZE, a macro
that looks up the limit through current (the process struct pointer).
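
Something along these lines, as a hedged sketch; the helper name is made
up, only TASK_SIZE and current are real kernel symbols:

#include <linux/types.h>	/* u64 */
#include <linux/sched.h>	/* current, TASK_SIZE */

/* Reserve everything above the calling process's userspace limit
 * for kernel-private GPU mappings.  TASK_SIZE is evaluated against
 * "current", so a 32-bit compat process and a native 64-bit process
 * naturally get different limits. */
static void reserve_kernel_gpu_va(u64 *start, u64 *end, u64 gpu_va_limit)
{
	*start = TASK_SIZE;	/* first address userspace can never map */
	*end = gpu_va_limit;	/* top of the GPU virtual address space */
}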

I think the issue for nouveau is that the kernel side already handles
some virtual address allocation, while for radeon the whole virtual
address space is fully under userspace control.

Given this, you might want to use tricks on both sides (kernel and
userspace). For instance you could mmap a region with PROT_NONE to
reserve a range of virtual addresses from userspace, then tell the
driver about that range and have the driver initialize the GPU and use
that chunk for kernel-private structure allocations.
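
For illustration, a minimal userspace sketch of that mmap-then-tell-the-
driver dance; DRM_IOCTL_NOUVEAU_RESERVE_VA and struct
drm_nouveau_reserve_va are invented here, no such interface exists today:

#include <stdint.h>
#include <sys/mman.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Hypothetical ioctl and argument struct, for illustration only. */
struct drm_nouveau_reserve_va {
	uint64_t start;
	uint64_t length;
};
#define DRM_IOCTL_NOUVEAU_RESERVE_VA \
	_IOW('d', 0x80, struct drm_nouveau_reserve_va)

static int reserve_gpu_va(int drm_fd, size_t length)
{
	struct drm_nouveau_reserve_va args;
	void *p;

	/* PROT_NONE keeps the kernel from ever handing this range to
	 * malloc/mmap in this process, without committing any memory. */
	p = mmap(NULL, length, PROT_NONE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Tell the driver it may use this range on the GPU side for
	 * its kernel-private structures. */
	args.start = (uintptr_t)p;
	args.length = length;
	return ioctl(drm_fd, DRM_IOCTL_NOUVEAU_RESERVE_VA, &args);
}

The PROT_NONE mapping costs no memory; it only pins the range in the
process's CPU address space so nothing else can ever land there.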

The issue is that this is kind of an API violation for the nouveau
kernel driver. Though I am not familiar enough with it, maybe you can do
an ioctl to nouveau before nouveau initializes and allocates the
kernel-private buffers (gr and other stuff). If so then the problem is
solved, I guess. Processes that want to use CUDA would need to do the
mmap dance and the early ioctl.


Hope this helps, cheers
Jérôme

