[Mesa-dev] GBM and the Device Memory Allocator Proposals

James Jones jajones at nvidia.com
Thu Nov 30 05:59:30 UTC 2017


On 11/29/2017 04:09 PM, Miguel Angel Vico wrote:
> 
> 
> On Wed, 29 Nov 2017 16:28:15 -0500
> Rob Clark <robdclark at gmail.com> wrote:
> 
>> On Wed, Nov 29, 2017 at 2:41 PM, Miguel Angel Vico <mvicomoya at nvidia.com> wrote:
>>> Many of you may already know, but James is going to be out for a few
>>> weeks and I'll be taking over this in the meantime.

Sorry for the unfortunate timing.  I am indeed on paternity leave at the 
moment.  Some quick comments below.  I'll be trying to follow the 
discussion as time allows while I'm out.

>>> See inline for comments.
>>>
>>> On Wed, 29 Nov 2017 09:33:29 -0800
>>> Jason Ekstrand <jason at jlekstrand.net> wrote:
>>>   
>>>> On Sat, Nov 25, 2017 at 1:20 PM, Rob Clark <robdclark at gmail.com> wrote:
>>>>   
>>>>> On Sat, Nov 25, 2017 at 12:46 PM, Jason Ekstrand <jason at jlekstrand.net>
>>>>> wrote:
>>>>>> On November 24, 2017 09:29:43 Rob Clark <robdclark at gmail.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Nov 20, 2017 at 8:11 PM, James Jones <jajones at nvidia.com>
>>>>> wrote:
>>>>>>>>
>>>>>>>> As many here know at this point, I've been working on solving issues
>>>>>>>> related
>>>>>>>> to DMA-capable memory allocation for various devices for some time now.
>>>>>>>> I'd
>>>>>>>> like to take this opportunity to apologize for the way I handled the
>>>>> EGL
>>>>>>>> stream proposals.  I understand now that the development process
>>>>> followed
>>>>>>>> there was unacceptable to the community and likely offended many great
>>>>>>>> engineers.
>>>>>>>>
>>>>>>>> Moving forward, I attempted to reboot talks in a more constructive
>>>>> manner
>>>>>>>> with the generic allocator library proposals & discussion forum at XDC
>>>>>>>> 2016.
>>>>>>>> Some great design ideas came out of that, and I've since been
>>>>> prototyping
>>>>>>>> some code to prove them out before bringing them back as official
>>>>>>>> proposals.
>>>>>>>> Again, I understand some people are growing concerned that I've been
>>>>>>>> doing
>>>>>>>> this off on the side in a github project that has primarily NVIDIA
>>>>>>>> contributors.  My goal was only to avoid wasting everyone's time with
>>>>>>>> unproven ideas.  The intent was never to dump the prototype code as-is
>>>>> on
>>>>>>>> the community and presume acceptance. It's just a public research
>>>>>>>> project.
>>>>>>>>
>>>>>>>> Now the prototyping is nearing completion, and I'd like to renew
>>>>>>>> discussion
>>>>>>>> on whether and how the new mechanisms can be integrated with the Linux
>>>>>>>> graphics stack.
>>>>>>>>
>>>>>>>> I'd be interested to know if more work is needed to demonstrate the
>>>>>>>> usefulness of the new mechanisms, or whether people think they have
>>>>> value
>>>>>>>> at
>>>>>>>> this point.
>>>>>>>>
>>>>>>>> After talking with people on the hallway track at XDC this year, I've
>>>>>>>> heard
>>>>>>>> several proposals for incorporating the new mechanisms:
>>>>>>>>
>>>>>>>> -Include ideas from the generic allocator design into GBM.  This could
>>>>>>>> take
>>>>>>>> the form of designing a "GBM 2.0" API, or incrementally adding to the
>>>>>>>> existing GBM API.
>>>>>>>>
>>>>>>>> -Develop a library to replace GBM.  The allocator prototype code could
>>>>> be
>>>>>>>> massaged into something production worthy to jump start this process.
>>>>>>>>
>>>>>>>> -Develop a library that sits beside or on top of GBM, using GBM for
>>>>>>>> low-level graphics buffer allocation, while supporting non-graphics
>>>>>>>> kernel
>>>>>>>> APIs directly.  The additional cross-device negotiation and sorting of
>>>>>>>> capabilities would be handled in this slightly higher-level API before
>>>>>>>> handing off to GBM and other APIs for actual allocation somehow.
>>>>>>>
>>>>>>>
>>>>>>> tbh, I kinda see GBM and $new_thing sitting side by side.. GBM is
>>>>>>> still the "winsys" for running on "bare metal" (ie. kms).  And we
>>>>>>> don't want to saddle $new_thing with aspects of that, but rather have
>>>>>>> it focus on being the thing that in multiple-"device"[1] scenarios
>>>>>>> figures out what sort of buffer can be allocated by who for sharing.
>>>>>>> Ie $new_thing should really not care about winsys level things like
>>>>>>> cursors or surfaces.. only buffers.
>>>>>>>
>>>>>>> The mesa implementation of $new_thing could sit on top of GBM,
>>>>>>> although it could also just sit on top of the same internal APIs that
>>>>>>> GBM sits on top of.  That is an implementation detail.  It could be
>>>>>>> that GBM grows an API to return an instance of $new_thing for
>>>>>>> use-cases that involve sharing a buffer with the GPU.  Or perhaps that
>>>>>>> is exposed via some sort of EGL extension.  (We probably also need a
>>>>>>> way to get an instance from libdrm (?) for display-only KMS drivers,
>>>>>>> to cover cases like etnaviv sharing a buffer with a separate display
>>>>>>> driver.)
>>>>>>>
>>>>>>> [1] where "devices" could be multiple GPUs or multiple APIs for one or
>>>>>>> more GPUs, but also includes non-GPU devices like camera, video
>>>>>>> decoder, "image processor" (which may or may not be part of camera),
>>>>>>> etc, etc
>>>>>>
>>>>>>
>>>>>> I'm not quite sure what I think about this.  I think I would like to
>>>>>> see $new_thing at least replace the guts of GBM. Whether GBM becomes a
>>>>>> wrapper around $new_thing or $new_thing implements the GBM API, I'm not
>>>>>> sure.  What I don't think I want is to see GBM development continuing on
>>>>>> its own so we have two competing solutions.
>>>>>
>>>>> I don't really view them as competing.. there is *some* overlap, ie.
>>>>> allocating a buffer.. but even if you are using GBM w/out $new_thing
>>>>> you could allocate a buffer externally and import it.  I don't see
>>>>> $new_thing as that much different from GBM PoV.
>>>>>
>>>>> But things like surfaces (aka swap chains) seem a bit out of place
>>>>> when you are thinking about implementing $new_thing for non-gpu
>>>>> devices.  Plus EGL<->GBM tie-ins that seem out of place when talking
>>>>> about a (for ex.) camera.  I kinda don't want to throw out the baby
>>>>> with the bathwater here.
>>>>>   
>>>>
>>>> Agreed.  GBM is very EGLish and we don't want the new allocator to be that.
>>>>
>>>>   
>>>>> *maybe* GBM could be partially implemented on top of $new_thing.  I
>>>>> don't quite see how that would work.  Possibly we could deprecate
>>>>> parts of GBM that are no longer needed?  idk..  Either way, I fully
>>>>> expect that GBM and mesa's implementation of $new_thing could perhaps
>>>>> sit on top of some of the same set of internal APIs.  The public
>>>>> interface can be decoupled from the internal implementation.
>>>>>   
>>>>
>>>> Maybe I should restate things a bit.  My real point was that modifiers +
>>>> $new_thing + Kernel blob should be a complete and more powerful replacement
>>>> for GBM.  I don't know that we really can implement GBM on top of it
>>>> because GBM has lots of wishy-washy concepts such as "cursor plane" which
>>>> may not map well, at least not without querying the kernel about specific
>>>> display planes.  In particular, I don't want someone to feel like they need
>>>> to use $new_thing and GBM at the same time or together.  Ideally, I'd like
>>>> them to never do that unless we decide gbm_bo is a useful abstraction for
>>>> $new_thing.
>>>>   
>>>
>>> I'm not really familiar with GBM guts, so I don't know how easy it
>>> would be to make GBM rely on the allocator for buffer allocations.
>>> Maybe that's something worth exploring. What I wouldn't like is
>>> $new_thing to fall short because we are trying to shove it under GBM's
>>> hood.
>>>   
>>
>> yeah, I think we should consider functionality of $new_thing
>> independent of GBM.. how to go from individual buffers allocated via
>> $new_thing to EGL surface/swapchain is I think out of scope for
>> $new_thing.
>>
>>> It seems to me that $new_thing should grow as a separate thing whether
>>> it ends up replacing GBM or GBM internals are somewhat rewritten on top
>>> of it. If I'm reading you both correctly, you agree with that, so in
>>> order to move forward, should we go ahead and create a project in fd.o?
>>>
>>> Before filing the new project request though, we should find an
>>> appropriate name for $new_thing. Creativity isn't one of my strengths,
>>> but I'll go ahead and start the bikeshedding with "Generic Device
>>> Memory Allocator" or "Generic Device Memory Manager".
>>
>> liballoc - Generic Device Memory Allocator ... seems reasonable to me..
> 
> Cool. If there aren't better suggestions, we can go with that. We
> should also namespace all APIs and structures. Is 'galloc' distinctive
> enough to be used as a namespace? Being only an 'r' away from gralloc,
> maybe it's a bit confusing?
> 
>>
>> I think it is reasonable to live on github until we figure out how
>> transitions work.. or in particular are there any thread restrictions
>> or interactions w/ gl context if transitions are done on the gpu or
>> anything like that?  Or can we just make it more vulkan like w/
>> explicit ctx ptr, and pass around fence fd's to synchronize everyone??
>>   I haven't thought about the transition part too much but I guess we
>> should have a reasonable idea for how that should work before we start
>> getting too many non-toy users, lest we find big API changes are
>> needed..
> 
> Seems fine, but I would like to get people other than NVIDIANs
> involved in giving feedback on the design as we move forward with the
> prototype.
> 
> For lack of a better list, is it okay to start sending patches to
> mesa-dev? If that's too broad an audience, should I just CC specific
> individuals that have somewhat contributed to the project?
> 
>>
>> Do we need to define both in-place and copy transitions?  Ie. what if
>> GPU is still reading a tiled or compressed texture (ie. sampling from
>> previous frame for some reason), but we need to untile/uncompress for
>> display.. or maybe there are some other cases like that we should
>> think about..
>>
>> Maybe you already have some thoughts about that?
> 
> This is the next thing I'll be working on. I haven't given it much
> thought myself so far, but I think James might have had some insights.
> I'll read through some of his notes to double-check.

A couple of notes on usage transitions:

While chatting about transitions, a few assertions were made by others 
that I've come to accept, despite the fact that they reduce the 
generality of the allocator mechanisms:

-GPUs are the only engines that actually need usage transitions, as far 
as I know thus far.  Other engines either share the GPU's representation 
of the data or use more limited representations; the latter case is 
precisely why transitions to non-GPU usages are useful at all.

-It's reasonable to assume that a GPU is required to perform a usage 
transition.  This follows from the above postulate.  If only GPUs are 
using more advanced representations, you don't need any transitions 
unless you have a GPU available.

From that, I derived the rough API proposal for transitions presented 
on my XDC 2017 slides.  Transition "metadata" is queried from the 
allocator given a pair of usages (which may refer to more than one 
device), but the realization of the transition is left to existing GPU 
APIs.  I think I put Vulkan-like pseudo-code in the slides, but the GL 
external objects extensions (GL_EXT_memory_object and GL_EXT_semaphore) 
would work as well.
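
To make that concrete, the rough shape I have in mind is something like 
the sketch below.  All of the names are hypothetical, invented for this 
email rather than taken from the prototype; the point is just that the 
allocator hands back an opaque blob describing the required work, and an 
existing GPU API executes it:

    /*
     * Hypothetical sketch only; none of these names exist in the
     * prototype.  The allocator reports what work is needed to move an
     * allocation between two sets of usages, and the caller records
     * that work with an existing GPU API (Vulkan, or GL via
     * GL_EXT_memory_object + GL_EXT_semaphore), synchronizing with
     * fence fds.
     */
    #include <stddef.h>
    #include <stdint.h>

    typedef struct allocation allocation_t;  /* opaque allocator handle */
    typedef struct usage      usage_t;       /* a (device, usage) pair  */

    typedef struct transition_metadata {
        size_t      size;   /* 0 means no transition is required        */
        const void *blob;   /* opaque, driver-defined description       */
    } transition_metadata_t;

    /* Assumed allocator entry point: given source and destination
     * usage lists (possibly spanning multiple devices), report what a
     * GPU must do to the allocation to move it between them. */
    int allocator_get_transition_metadata(allocation_t *alloc,
                                          const usage_t *src,
                                          uint32_t n_src,
                                          const usage_t *dst,
                                          uint32_t n_dst,
                                          transition_metadata_t *out);

    /* Assumed GPU-API hook: record the transition into a command
     * stream and return a fence fd the next user can wait on.  In
     * Vulkan this might be an extension struct chained into a barrier;
     * in GL it could ride along with EXT_semaphore wait/signal ops. */
    int gpu_record_transition(const transition_metadata_t *meta,
                              int wait_fence_fd,
                              int *signal_fence_fd);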

Regarding in-place vs. copy: To me, a transition is something that 
happens in-place, at least semantically.  If you need to make copies, 
that's a format conversion blit, not a transition, and graphics APIs are 
already capable of expressing that without any special transitions or 
help from the allocator.  However, I understand some chipsets perform 
transitions using something that looks kind of like a blit using on-chip 
caches and constrained usage semantics.  There's probably some work to 
do to see whether those need to be accommodated as conversion blits or 
usage transitions.

For our hardware's purposes, transitions are just various levels of 
decompression or compression reconfiguration and potentially cache 
flushing/invalidation, so our transition metadata will just be some bits 
signaling which compression operation is needed, if any.  That's the 
sort of operation I modeled the API around, so if things are much more 
exotic than that for others, it will probably require some adjustments.
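
As a purely illustrative example of how small that metadata could be on 
hardware like ours, the blob might reduce to a handful of flag bits 
(names invented here, not taken from the prototype):

    typedef enum transition_op_bits {
        TRANSITION_OP_NONE             = 0,
        TRANSITION_OP_DECOMPRESS       = 1 << 0, /* resolve to plain data  */
        TRANSITION_OP_RECOMPRESS       = 1 << 1, /* change compression cfg */
        TRANSITION_OP_CACHE_FLUSH      = 1 << 2,
        TRANSITION_OP_CACHE_INVALIDATE = 1 << 3,
    } transition_op_bits_t;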

> Thanks,
> Miguel.
> 
>>
>>> Once we agree upon something, I can take care of filing the request,
>>> but I'm unclear on what the initial list of approvers should be.
>>> Looking at the main contributors to both the initial draft of
>>> $new_thing and the git repository, does the following list of people seem
>>> reasonable?
>>>
>>>   * Rob Clark
>>>   * Jason Ekstrand
>>>   * James Jones
>>>   * Chad Versace
>>>   * Miguel A Vico
>>>
>>> I never started a project in fd.o, so any useful advice will be
>>> appreciated.
>>
>> fwiw, https://www.freedesktop.org/wiki/NewProject/
>>
>> BR,
>> -R
>>
>>> Thanks,
>>> Miguel.
>>>   
>>>>   
>>>>>> I *think* I like the idea of having $new_thing implement GBM as a
>>>>> deprecated
>>>>>> legacy API.  Whether that means we start by pulling GBM out into its own
>>>>>> project or we start over, I don't know.  My feeling is that the current
>>>>>> dri_interface is *not* what we want which is why starting with GBM makes
>>>>> me
>>>>>> nervous.
>>>>>
>>>>> /me expects if we pull GBM out of mesa, the interface between GBM and
>>>>> mesa (or other GL drivers) is 'struct gbm_device'.. so "GBM the
>>>>> project" is just a thin shim plus some 'struct gbm_device' versioning.
>>>>>
>>>>> BR,
>>>>> -R
>>>>>   
>>>>>> I need to go read through your code before I can provide a stronger or
>>>>> more
>>>>>> nuanced opinion.  That's not going to happen before the end of the year.

I hope you and others, especially those of you who seem to already have 
some well-formed ideas about end-goals for this project, do get a chance 
to go through the prototype code and simple kmscube example at some 
point.  A code review is worth a thousand high-level design discussions 
IMHO, and it really isn't that much code at this point.  Of course, I 
understand everyone's busy this time of year.

>>>>>>>> -I have also heard some general comments that regardless of the
>>>>>>>> relationship
>>>>>>>> between GBM and the new allocator mechanisms, it might be time to move
>>>>>>>> GBM
>>>>>>>> out of Mesa so it can be developed as a stand-alone project.  I'd be
>>>>>>>> interested what others think about that, as it would be something worth
>>>>>>>> coordinating with any other new development based on or inside of GBM.
>>>>>>>
>>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> We already have at least a couple different non-mesa implementations
>>>>>>> of GBM (which afaict tend to lag behind mesa's GBM and cause
>>>>>>> headaches).
>>>>>>>
>>>>>>> The extracted part probably isn't much more than a header and shim.
>>>>>>> But probably does need to grow some versioning for the backend to know
>>>>>>> if, for example, gbm->bo_map() is supported.. at least it could
>>>>>>> provide stubs that return an error, rather than failing at link time
>>>>>>> when building something w/ $vendor's old gbm implementation.
>>>>>>>   
>>>>>>>> And of course I'm open to any other ideas for integration.  Beyond just
>>>>>>>> where this code would live, there is much to debate about the
>>>>> mechanisms
>>>>>>>> themselves and all the implementation details.  I was just hoping to
>>>>> kick
>>>>>>>> things off with something high level to start.
>>>>>>>
>>>>>>>
>>>>>>> My $0.02 is that the place where devel happens and the place to go for
>>>>>>> releases could be different.  Either way, I would like to see git tree
>>>>>>> for tagged release versions live on fd.o and use the common release
>>>>>>> process[2] for generating/uploading release tarballs that distros can
>>>>>>> use.
>>>>>>
>>>>>>
>>>>>> Agreed.  I think fd.o is the right place for such a project to live.  We
>>>>> can
>>>>>> have mirrors on GitHub and other places but fd.o is where Linux graphics
>>>>>> stack development currently happens.
>>>>>>   
>>>>>>> [2] https://cgit.freedesktop.org/xorg/util/modular/tree/release.sh
>>>>>>>   
>>>>>>>> For reference, the code Miguel and I have been developing for the
>>>>>>>> prototype
>>>>>>>> is here:
>>>>>>>>
>>>>>>>>     https://github.com/cubanismo/allocator
>>>>>>>>
>>>>>>>> And we've posted a port of kmscube that uses the new interfaces as a
>>>>>>>> demonstration here:
>>>>>>>>
>>>>>>>>     https://github.com/cubanismo/kmscube
>>>>>>>>
>>>>>>>> There are still some proposed mechanisms (usage transitions mainly)
>>>>> that
>>>>>>>> aren't prototyped, but I think it makes sense to start discussing
>>>>>>>> integration while prototyping continues.
>>>>>>>
>>>>>>>
>>>>>>> btw, I think a nice end goal would be a gralloc implementation using
>>>>>>> this new API for sharing buffers in various use-cases.  That could
>>>>>>> mean converting gbm-gralloc, or perhaps it means something new.
>>>>>>>
>>>>>>> AOSP has support for mesa + upstream kernel for some devices which
>>>>>>> also have upstream camera and/or video decoder in addition to just
>>>>>>> GPU.. and this is where you start hitting the limits of a GBM based
>>>>>>> gralloc.  In a lot of ways, I view $new_thing as what gralloc *should*
>>>>>>> have been, but at least it provides a way to implement a generic
>>>>>>> gralloc.
>>>>>>
>>>>>>
>>>>>> +100
>>>>>>
>>>>>>   
>>>>>>> Maybe that is getting a step ahead, there is a lot we can prototype
>>>>>>> with kmscube.  But gralloc gets us into interesting real-world
>>>>>>> use-cases that involve more than just GPUs.  Possibly this would be
>>>>>>> something that linaro might be interested in getting involved with?

Gralloc-on-$new_thing, as well as hwcomposer-on-$new_thing, is one of my 
primary goals.  However, it's a pretty heavy thing to prototype.  If 
someone has the time though, I think it would be a great experiment.  It 
would help flesh out the paltry list of usages, constraints, and 
capabilities in the existing prototype codebase.  The kmscube example 
really should have added at least a "render" usage, but I got lazy and 
just re-used texture for now.  That won't actually work on our HW in all 
cases, but it's good enough for kmscube.
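
For anyone who does pick that up, the kind of multi-usage request I have 
in mind looks roughly like the sketch below.  All names are invented for 
illustration; the actual interfaces in the prototype on github may 
differ:

    #include <stddef.h>
    #include <stdint.h>

    /* Invented types/functions for illustration only; see
     * https://github.com/cubanismo/allocator for the real interfaces. */
    typedef struct device         device_t;
    typedef struct capability_set capability_set_t;
    typedef struct allocation     allocation_t;

    typedef enum {
        USAGE_GPU_RENDER_TARGET,
        USAGE_GPU_TEXTURE,
        USAGE_DISPLAY_SCANOUT,
    } usage_kind_t;

    typedef struct { device_t *dev; usage_kind_t usage; } usage_t;

    int allocator_negotiate_capabilities(const usage_t *usages,
                                         uint32_t count,
                                         capability_set_t **out_caps);
    allocation_t *allocator_allocate(const capability_set_t *caps,
                                     uint32_t width, uint32_t height);

    /* A gralloc-style user would list every intended usage up front so
     * the allocator can intersect the devices' capabilities before
     * allocating anything. */
    static allocation_t *
    alloc_render_texture_scanout(device_t *gpu, device_t *display,
                                 uint32_t w, uint32_t h)
    {
        const usage_t usages[] = {
            { gpu,     USAGE_GPU_RENDER_TARGET },
            { gpu,     USAGE_GPU_TEXTURE       },
            { display, USAGE_DISPLAY_SCANOUT   },
        };
        capability_set_t *caps = NULL;

        if (allocator_negotiate_capabilities(usages, 3, &caps) != 0)
            return NULL;

        return allocator_allocate(caps, w, h);
    }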

Thanks,
-James

>>>>>>> BR,
>>>>>>> -R
>>>>>>>   
>>>>>>>> In addition, I'd like to note that NVIDIA is committed to providing
>>>>> open
>>>>>>>> source driver implementations of these mechanisms for our hardware, in
>>>>>>>> addition to support in our proprietary drivers.  In other words,
>>>>> wherever
>>>>>>>> modifications to the nouveau kernel & userspace drivers are needed to
>>>>>>>> implement the improved allocator mechanisms, we'll be contributing
>>>>>>>> patches
>>>>>>>> if no one beats us to it.
>>>>>>>>
>>>>>>>> Thanks in advance for any feedback!
>>>>>>>>
>>>>>>>> -James Jones
>>>>>>>> _______________________________________________
>>>>>>>> mesa-dev mailing list
>>>>>>>> mesa-dev at lists.freedesktop.org
>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> mesa-dev mailing list
>>>>>>> mesa-dev at lists.freedesktop.org
>>>>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>>>>>
>>>>>>
>>>>>>   
>>>>>   
>>>
>>>
>>> --
>>> Miguel
>>>
>>>   
> 
> 

