[Mesa-dev] GBM and the Device Memory Allocator Proposals

Rob Clark robdclark at gmail.com
Fri Dec 1 15:06:53 UTC 2017


On Thu, Nov 30, 2017 at 5:43 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
> Hi,
>
> I've had a chance to look a bit more closely at the allocator prototype
> repository now. There's a whole bunch of low-level API design feedback, but
> for now let's focus on the high-level stuff first.
>
> Going by the 4.5 major object types (as also seen on slide 5 of your
> presentation [0]), assertions and usages make sense to me.
>
> Capabilities and capability sets should be cleaned up in my opinion, as the
> status quo is overly obfuscating things. What capability sets really
> represent, as far as I understand them, is *memory layouts*, and so that's
> what they should be called.
>
> This conceptually simplifies `derive_capabilities` significantly without any
> loss of expressiveness as far as I can see. Given two lists of memory
> layouts, we simply look for which memory layouts appear in both lists, and
> then merge their constraints and capabilities.
>
> Merging constraints looks good to me.
>
> Capabilities need some more thought. The prototype removes capabilities when
> merging layouts, but I'd argue that that is often undesirable. (In fact, I
> cannot think of capabilities which we'd always want to remove.)
>
> A typical example for this is compression (i.e. DCC in our case). For
> rendering usage, we'd return something like:
>
> Memory layout: AMD/tiled; constraints(alignment=64k); caps(AMD/DCC)
>
> For display usage, we might return (depending on hardware):
>
> Memory layout: AMD/tiled; constraints(alignment=64k); caps(none)
>
> Merging these in the prototype would remove the DCC capability, even though
> it might well make sense to keep it there for rendering. Dealing with the
> fact that display usage does not have this capability is precisely one of
> the two things that transitions are about! The other thing that transitions
> are about is caches.
>
> I think this is kind of what Rob was saying in one of his mails.

Perhaps "layout" is a better name than "caps".. either way I think of
both AMD/tiled and AMD/DCC as the same type of "thing".. the
difference between AMD/tiled and AMD/DCC is that a transition can be
provided for AMD/DCC.  Other than that they are both things describing
the layout.
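
To illustrate (just a sketch with made-up types, not the prototype's
actual API), I picture a capability as describing some aspect of the
layout, optionally carrying a transition that can remove it:

  #include <stdint.h>

  struct transition;   /* vendor-provided blit/flush/etc. (hypothetical) */

  struct capability {
      uint32_t vendor;              /* e.g. FOO or AMD */
      uint32_t id;                  /* e.g. TILED, CC/DCC, CACHED */
      const struct transition *out; /* NULL if the cap can never be
                                       transitioned away (e.g. tiling) */
  };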

So let's say you have a setup where both the display and the GPU
support FOO/tiled, but only the GPU supports compressed (FOO/CC) and
cached (FOO/cached).  But the GPU supports the following transitions
(a rough code sketch of how the merge could be derived follows the
example below):

  trans_a: FOO/CC -> null
  trans_b: FOO/cached -> null

Then the sets for each device (in order of preference):

GPU:
  1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=32k)
  2: caps(FOO/tiled, FOO/CC); constraints(alignment=32k)
  3: caps(FOO/tiled); constraints(alignment=32k)

Display:
  1: caps(FOO/tiled); constraints(alignment=64k)

Merged Result:
  1: caps(FOO/tiled, FOO/CC, FOO/cached); constraints(alignment=64k);
     transition(GPU->display: trans_a, trans_b; display->GPU: none)
  2: caps(FOO/tiled, FOO/CC); constraints(alignment=64k);
     transition(GPU->display: trans_a; display->GPU: none)
  3: caps(FOO/tiled); constraints(alignment=64k);
     transition(GPU->display: none; display->GPU: none)
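
The derivation of that merged result could look roughly like this in
code (made-up types and helpers, reusing the hypothetical struct
capability from above; constraints such as alignment would be merged
separately, e.g. by taking the stricter value):

  #include <stdbool.h>
  #include <stddef.h>

  struct capability_set {
      const struct capability *caps;
      size_t count;
  };

  static bool set_has(const struct capability_set *s,
                      const struct capability *c)
  {
      for (size_t i = 0; i < s->count; i++)
          if (s->caps[i].vendor == c->vendor && s->caps[i].id == c->id)
              return true;
      return false;
  }

  /* A cap from device A survives the merge if device B also has it,
   * or if A can transition it away (c->out != NULL) before handing
   * the buffer over; in the latter case the transition is recorded
   * in the A->B direction of the merged set. */
  static bool cap_survives(const struct capability *c,
                           const struct capability_set *other)
  {
      return set_has(other, c) || c->out != NULL;
  }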

> Two interesting questions:
>
> 1. If we query for multiple usages on the same device, can we get a
> capability which can only be used for a subset of those usages?

I think the original idea was "no"..  perhaps that restriction could
be lifted if transitions were part of the result.  Or maybe you just
query the same device independently for multiple different usages,
and then merge the resulting cap-sets.
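
In code that might look something like this (the function names are
made up, not the prototype's actual entry points):

  struct device;
  struct capability_set;

  /* Hypothetical entry points: query the same device once per usage
   * and merge the results, instead of one query with multiple usage
   * bits OR'd together. */
  struct capability_set *query_caps(struct device *dev, unsigned usage);
  struct capability_set *merge_cap_sets(const struct capability_set *a,
                                        const struct capability_set *b);

  static struct capability_set *
  caps_for_render_and_scanout(struct device *gpu)
  {
      struct capability_set *render  = query_caps(gpu, 1 /* render  */);
      struct capability_set *scanout = query_caps(gpu, 2 /* display */);

      return merge_cap_sets(render, scanout);
  }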

(Do we need to care about intra-device transitions?  Or can we just
let the driver care about that, same as it always has?)

> 2. What happens when we merge memory layouts with sets of capabilities where
> neither is a subset of the other?

I think this is a case where no zero-copy sharing is possible, right?

> As for the actual transition API, I accept that some metadata may be
> required, and the metadata probably needs to depend on the memory layout,
> which is often vendor-specific. But even linear layouts need some
> transitions for caches. We probably need at least some generic "off-device
> usage" bit.

I've started thinking of cached as a capability with a transition..
I think that helps.  Maybe it needs to somehow be more specific
(i.e. if you have two devices, each with their own cache and no
coherency between the two).
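
Something along these lines, purely as a sketch (the cache-domain tag
is made up):

  #include <stdint.h>

  struct transition;   /* here: a cache flush/clean operation */

  /* "Cached" as a capability whose transition is a flush, tagged with
   * which cache it refers to, so two devices with separate,
   * non-coherent caches end up with distinct caps rather than one
   * shared FOO/cached. */
  struct cached_capability {
      uint32_t vendor;                /* e.g. FOO */
      uint32_t cache_domain;          /* e.g. GPU cache vs. camera ISP cache */
      const struct transition *flush; /* transitioning out = flush/clean */
  };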

BR,
-R

>
> Cheers,
> Nicolai
>
> [0] https://www.x.org/wiki/Events/XDC2017/jones_allocator.pdf
>
>
> On 21.11.2017 02:11, James Jones wrote:
>>
>> As many here know at this point, I've been working on solving issues
>> related to DMA-capable memory allocation for various devices for some time
>> now.  I'd like to take this opportunity to apologize for the way I handled
>> the EGL stream proposals.  I understand now that the development process
>> followed there was unacceptable to the community and likely offended many
>> great engineers.
>>
>> Moving forward, I attempted to reboot talks in a more constructive manner
>> with the generic allocator library proposals & discussion forum at XDC 2016.
>> Some great design ideas came out of that, and I've since been prototyping
>> some code to prove them out before bringing them back as official proposals.
>> Again, I understand some people are growing concerned that I've been doing
>> this off on the side in a github project that has primarily NVIDIA
>> contributors.  My goal was only to avoid wasting everyone's time with
>> unproven ideas.  The intent was never to dump the prototype code as-is on
>> the community and presume acceptance. It's just a public research project.
>>
>> Now the prototyping is nearing completion, and I'd like to renew
>> discussion on whether and how the new mechanisms can be integrated with the
>> Linux graphics stack.
>>
>> I'd be interested to know if more work is needed to demonstrate the
>> usefulness of the new mechanisms, or whether people think they have value at
>> this point.
>>
>> After talking with people on the hallway track at XDC this year, I've
>> heard several proposals for incorporating the new mechanisms:
>>
>> -Include ideas from the generic allocator design into GBM.  This could
>> take the form of designing a "GBM 2.0" API, or incrementally adding to the
>> existing GBM API.
>>
>> -Develop a library to replace GBM.  The allocator prototype code could be
>> massaged into something production worthy to jump start this process.
>>
>> -Develop a library that sits beside or on top of GBM, using GBM for
>> low-level graphics buffer allocation, while supporting non-graphics kernel
>> APIs directly.  The additional cross-device negotiation and sorting of
>> capabilities would be handled in this slightly higher-level API before
>> handing off to GBM and other APIs for actual allocation somehow.
>>
>> -I have also heard some general comments that regardless of the
>> relationship between GBM and the new allocator mechanisms, it might be time
>> to move GBM out of Mesa so it can be developed as a stand-alone project.
>> I'd be interested what others think about that, as it would be something
>> worth coordinating with any other new development based on or inside of GBM.
>>
>> And of course I'm open to any other ideas for integration.  Beyond just
>> where this code would live, there is much to debate about the mechanisms
>> themselves and all the implementation details.  I was just hoping to kick
>> things off with something high level to start.
>>
>> For reference, the code Miguel and I have been developing for the
>> prototype is here:
>>
>>     https://github.com/cubanismo/allocator
>>
>> And we've posted a port of kmscube that uses the new interfaces as a
>> demonstration here:
>>
>>     https://github.com/cubanismo/kmscube
>>
>> There are still some proposed mechanisms (usage transitions mainly) that
>> aren't prototyped, but I think it makes sense to start discussing
>> integration while prototyping continues.
>>
>> In addition, I'd like to note that NVIDIA is committed to providing open
>> source driver implementations of these mechanisms for our hardware, in
>> addition to support in our proprietary drivers.  In other words, wherever
>> modifications to the nouveau kernel & userspace drivers are needed to
>> implement the improved allocator mechanisms, we'll be contributing patches
>> if no one beats us to it.
>>
>> Thanks in advance for any feedback!
>>
>> -James Jones
>
>
>
> --
> Learn what the world is really like,
> but never forget how it ought to be.

