[Nouveau] CUDA fixed VA allocations and sparse mappings

Andrew Chew achew at nvidia.com
Wed Jul 8 14:18:02 PDT 2015


> > > Ah, I get what you're saying.  Sure, I think that might be okay.  Not sure
> > > how we would get at that information, though, and it would be horrible to
> > > just bake it in somewhere.  I'm looking into how nvgpu driver does it...
> > > maybe they have good reasons to do it the way they do.  Sorry if I go
> > > quiet for a little bit...
> > After a very quick look, it looks like the kernel defines a
> > PAGE_OFFSET macro which is the start of kernel virtual address space.
> 
> You need to be careful here: first, the hardware might not have as many
> bits as the CPU. For instance, x86-64 has 48 bits for virtual addresses,
> i.e. only 48 bits of the address are meaningful, while older radeon
> (<CI iirc) only has 40 bits on the address bus. With such a configuration
> you could not move all private kernel allocations inside the kernel zone.
> 
> Second issue is things like a 32-bit process on a 64-bit kernel, in which
> case you have the usual 3GB userspace / 1GB kernel space split. So instead
> of using PAGE_OFFSET you might want to use TASK_SIZE, which is a macro
> that will look up the limit using current (the process struct pointer).
> 
> I think the issue for nouveau is that kernel space already handles some
> allocation of virtual addresses, while for radeon the whole virtual
> address space is fully under userspace control.
> 
> Given this, you might want to use a trick on both sides (kernel and user
> space). For instance, you could mmap a region with PROT_NONE to reserve
> a range of virtual addresses from userspace, then tell the driver about
> that range and have the driver initialize the GPU and use that chunk
> for kernel private structure allocations.
> 
> The issue is that it is kind of an API violation for the nouveau kernel
> driver. Though I am not familiar enough, maybe you can do an ioctl to
> nouveau before nouveau initializes and allocates the kernel private
> buffers (gr and other stuff). If so, then problem solved, I guess.
> Processes that want to use CUDA will need to do the mmap dance and the
> early ioctl.

I think we can have a nouveau ioctl to report the full address range that
the GPU supports.  Userspace can use this information to know what range
it can reserve.  The reservation part we can do with the AS_ALLOC and
AS_FREE nouveau ioctls that I originally proposed, and in the CUDA case,
this reservation should happen before any channel for a particular context
gets created.


More information about the Nouveau mailing list