[PATCH drm-next v2 00/16] [RFC] DRM GPUVA Manager & Nouveau VM_BIND UAPI

Thu Mar 9 09:48:41 UTC 2023

On Thu, 9 Mar 2023 10:12:43 +0100
Boris Brezillon <boris.brezillon at collabora.com> wrote:

> Hi Danilo,
> 
> On Fri, 17 Feb 2023 14:44:06 +0100
> Danilo Krummrich <dakr at redhat.com> wrote:
> 
> > Changes in V2:
> > ==============
> >   Nouveau:
> >     - Reworked the Nouveau VM_BIND UAPI to avoid memory allocations in fence
> >       signalling critical sections. Updates to the VA space are split up in three
> >       separate stages, where only the 2. stage executes in a fence signalling
> >       critical section:
> > 
> >         1. update the VA space, allocate new structures and page tables  
> 
> Sorry for the silly question, but I didn't find where the page tables
> pre-allocation happens. Mind pointing it to me? It's also unclear when
> this step happens. Is this at bind-job submission time, when the job is
> not necessarily ready to run, potentially waiting for other deps to be
> signaled. Or is it done when all deps are met, as an extra step before
> jumping to step 2. If that's the former, then I don't see how the VA
> space update can happen, since the bind-job might depend on other
> bind-jobs modifying the same portion of the VA space (unbind ops might
> lead to intermediate page table levels disappearing while we were
> waiting for deps). If it's the latter, I wonder why this is not
> considered as an allocation in the fence signaling path (for the
> bind-job out-fence to be signaled, you need these allocations to
> succeed, unless failing to allocate page-tables is considered like a HW
> misbehavior and the fence is signaled with an error in that case).

Ok, so I just noticed you only have one bind queue per drm_file
(cli->sched_entity), and jobs are executed in-order on a given queue,
so I guess that allows you to modify the VA space at submit time
without risking any modifications to the VA space coming from other
bind-queues targeting the same VM. And, if I'm correct, synchronous
bind/unbind ops take the same path, so no risk for those to modify the
VA space either (just wonder if it's a good thing to have to sync
bind/unbind operations waiting on async ones, but that's a different
topic).

> 
> Note that I'm not familiar at all with Nouveau or TTM, and it might
> be something that's solved by another component, or I'm just
> misunderstanding how the whole thing is supposed to work. This being
> said, I'd really like to implement a VM_BIND-like uAPI in pancsf using
> the gpuva_manager infra you're proposing here, so please bare with me
> :-).
> 
> >         2. (un-)map the requested memory bindings
> >         3. free structures and page tables
> > 
> >     - Separated generic job scheduler code from specific job implementations.
> >     - Separated the EXEC and VM_BIND implementation of the UAPI.
> >     - Reworked the locking parts of the nvkm/vmm RAW interface, such that
> >       (un-)map operations can be executed in fence signalling critical sections.
> >   
> 
> Regards,
> 
> Boris
>