[Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces

Thu Dec 21 08:05:32 UTC 2017

On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen
<hoegsberg at google.com> wrote:
> On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico <mvicomoya at nvidia.com>
> wrote:
>>
>> Inline.
>>
>> On Wed, 20 Dec 2017 11:54:10 -0800
>> Kristian Høgsberg <hoegsberg at gmail.com> wrote:
>>
>> > On Wed, Dec 20, 2017 at 11:51 AM, Daniel Vetter <daniel at ffwll.ch> wrote:
>> > > Since this also involves the kernel let's add dri-devel ...
>>
>> Yeah, I forgot. Thanks Daniel!
>>
>> > >
>> > > On Wed, Dec 20, 2017 at 5:51 PM, Miguel Angel Vico
>> > > <mvicomoya at nvidia.com> wrote:
>> > >> Hi all,
>> > >>
>> > >> As many of you already know, I've been working with James Jones on
>> > >> the
>> > >> Generic Device Allocator project lately. He started a discussion
>> > >> thread
>> > >> some weeks ago seeking feedback on the current prototype of the
>> > >> library
>> > >> and advice on how to move all this forward, from a prototype stage to
>> > >> production. For further reference, see:
>> > >>
>> > >>
>> > >> https://lists.freedesktop.org/archives/mesa-dev/2017-November/177632.html
>> > >>
>> > >> From the thread above, we came up with very interesting high level
>> > >> design ideas for one of the currently missing parts in the library:
>> > >> Usage transitions. That's something I'll personally work on during
>> > >> the
>> > >> following weeks.
>> > >>
>> > >>
>> > >> In the meantime, I've been working on putting together an open source
>> > >> implementation of the allocator mechanisms using the Nouveau driver
>> > >> for
>> > >> all to be able to play with.
>> > >>
>> > >> Below I'm seeking feedback on a bunch of changes I had to make to
>> > >> different components of the graphics stack:
>> > >>
>> > >> ** Allocator **
>> > >>
>> > >>   An allocator driver implementation on top of Nouveau. The current
>> > >>   implementation only handles pitch linear layouts, but that's enough
>> > >>   to have the kmscube port working using the allocator and Nouveau
>> > >>   drivers.
>> > >>
>> > >>   You can pull these changes from
>> > >>
>> > >>
>> > >> https://github.com/mvicomoya/allocator/tree/wip/mvicomoya/nouveau-driver
>> > >>
>> > >> ** Mesa **
>> > >>
>> > >>   James's kmscube port to use the allocator relies on the
>> > >>   EXT_external_objects extension to import allocator allocations to
>> > >>   OpenGL as a texture object. However, the Nouveau implementation of
>> > >>   these mechanisms is missing in Mesa, so I went ahead and added
>> > >> them.
>> > >>
>> > >>   You can pull these changes from
>> > >>
>> > >>
>> > >> https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/EXT_external_objects-nouveau
>> > >>
>> > >>   Also, James's kmscube port uses the NVX_unix_allocator_import
>> > >>   extension to attach allocator metadata to texture objects so the
>> > >>   driver knows how to deal with the imported memory.
>> > >>
>> > >>   Note that there isn't a formal spec for this extension yet. For
>> > >> now,
>> > >>   it just serves as an experimental mechanism to import allocator
>> > >>   memory in OpenGL, and attach metadata to texture objects.
>> > >>
>> > >>   You can pull these changes (written on top of the above) from:
>> > >>
>> > >>
>> > >> https://github.com/mvicomoya/mesa/tree/wip/mvicomoya/NVX_unix_allocator_import
>> > >>
>> > >> ** kmscube **
>> > >>
>> > >>   Mostly minor fixes and improvements on top of James's port to use
>> > >> the
>> > >>   allocator. Main thing is the allocator initialization path will use
>> > >>   EGL_MESA_platform_surfaceless if EGLDevice platform isn't supported
>> > >>   by the underlying EGL implementation.
>> > >>
>> > >>   You can pull these changes from:
>> > >>
>> > >>
>> > >> https://github.com/mvicomoya/kmscube/tree/wip/mvicomoya/allocator-nouveau
>> > >>
>> > >>
>> > >> With all the above you should be able to get kmscube working using
>> > >> the
>> > >> allocator on top of the Nouveau driver.
>> > >>
>> > >>
>> > >> Another of the missing pieces before we can move this to production
>> > >> is
>> > >> importing allocations to DRM FB objects. This is probably one of the
>> > >> most sensitive parts of the project as it requires
>> > >> modification/addition
>> > >> of kernel driver interfaces.
>> > >>
>> > >> At XDC2017, James had several hallway conversations with several
>> > >> people
>> > >> about this, all having different opinions. I'd like to take this
>> > >> opportunity to also start a discussion about what's the best option
>> > >> to
>> > >> create a path to get allocator allocations added as DRM FB objects.
>> > >>
>> > >> These are the few options we've considered to start with:
>> > >>
>> > >>   A) Have vendor-private ioctls to set properties on GEM objects that
>> > >>      are inherited by the FB objects. This is how our (NVIDIA)
>> > >> desktop
>> > >>      DRM driver currently works. This would require every vendor to
>> > >> add
>> > >>      their own ioctl to process allocator metadata, but the metadata
>> > >> is
>> > >>      actually a vendor-agnostic object more like DRM modifiers. We'd
>> > >>      like to come up with a vendor-agnostic solution that can be
>> > >>      integrated to core DRM.
>> > >>
>> > >>   B) Add a new drmModeAddFBWithMetadata() command that takes
>> > >> allocator
>> > >>      metadata blobs for each plane of the FB. Some people in the
>> > >>      community have mentioned this is their preferred design. This,
>> > >>      however, means we'd have to go through the exercise of adding
>> > >>      another metadata mechanism to the whole graphics stack.
>> > >>
>> > >>   C) Shove allocator metadata into DRM by defining it to be a
>> > >> separate
>> > >>      plane in the image, and using the existing DRM modifiers
>> > >> mechanism
>> > >>      to indicate there is another plane for each "real" plane added.
>> > >> It
>> > >>      isn't clear how this scales to surfaces that already need
>> > >> several
>> > >>      planes, but there are some people that see this as the only way
>> > >>      forward. Also, we would have to create a separate GEM buffer for
>> > >>      the metadata itself, which seems excessive.
>> > >>
>> > >> We personally like option (B) better, and have already started to
>> > >> prototype the new path (which is actually very similar to the
>> > >> drmModeAddFB2() one). You can take a look at the new interfaces here:
>> > >>
>> > >>
>> > >> https://github.com/mvicomoya/linux/tree/wip/mvicomoya/drm_addfb_with_metadata__4.14-rc8
>> > >>
>> > >> There may be other options that haven't been explored yet that could
>> > >> be
>> > >> a better choice than the above, so any suggestion will be greatly
>> > >> appreciated.
>> > >
>> > > What kind of metadata are we talking about here? Addfb has tons of
>> > > stuff already that's "metadata". The only thing I've spotted is
>> > > PITCH_ALIGNMENT, which is maybe something we want drm drivers to tell
>> > > userspace, but definitely not something addfb ever needs. addfb only
>> > > needs the resulting pitch that we actually allocated (and might decide
>> > > it doesn't like that, but that's a different issue).
>>
>> Sorry I failed to make it clearer. Metadata here refers to all
>> allocation parameters the generic allocator was given to allocate
>> memory. That currently means the final capability set used for
>> the allocation, including all constraints (such as memory alignment,
>> pitch alignment, and others) and capabilities, describing allocation
>> properties like tiling formats, compression, and such.

Yeah, that part was all clear. I'd want more details of what exact
kind of metadata. fast-clear colors? tiling layouts? aux data for the
compressor? hiz (or whatever you folks call it) tree?

As you say, we've discussed massive amounts of different variants on
this, and there's different answers for different questions. Consensus
seems to be that bigger stuff (compression data, hiz, clear colors,
...) should be stored in aux planes, while the exact layout and what
kind of aux planes you have are encoded in the modifier.
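
To make that split concrete, here is a rough sketch, assuming a
framebuffer description with up to four planes in the style of the
addfb2 arguments. The struct, the EXAMPLE_MOD_* values and the helper
are all invented for illustration; a real driver would define its own
modifiers and its own rules for which aux planes go with them.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_MAX_PLANES 4

/* Hypothetical modifier values: the modifier names the layout and tells
 * the driver whether each color plane comes with an aux plane. */
#define EXAMPLE_MOD_TILED            UINT64_C(1)
#define EXAMPLE_MOD_TILED_COMPRESSED UINT64_C(2)

/* Hypothetical framebuffer description, one entry per plane. */
struct example_fb {
    uint32_t handles[EXAMPLE_MAX_PLANES];
    uint32_t pitches[EXAMPLE_MAX_PLANES];
    uint32_t offsets[EXAMPLE_MAX_PLANES];
    uint64_t modifier;
};

/* Number of planes userspace has to fill in for `color_planes` color
 * planes: the bulky data (compression state and the like) lives in the
 * extra planes, the modifier only says that they exist. */
static unsigned example_plane_count(const struct example_fb *fb,
                                    unsigned color_planes)
{
    unsigned total = color_planes;

    if (fb->modifier == EXAMPLE_MOD_TILED_COMPRESSED)
        total *= 2; /* one aux plane per color plane */

    assert(total <= EXAMPLE_MAX_PLANES);
    return total;
}
```

With this made-up scheme a single-plane compressed buffer needs two
planes and a plain tiled one needs one, without any new blob interface.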

>> > >
>> > > And since there's no patches for nouveau itself I can't really say
>> > > anything beyond that.
>>
>> I can work on implementing these interfaces for nouveau, maybe
>> partially, if that's going to help. I just thought it'd be better to
>> first start a discussion on what would be the right way to pass
>> allocator metadata to display drivers before starting to seriously
>> implement any of the proposed options.

It's not so much wiring down the interfaces, but actually implementing
the features. "We need more than the 56 bits of modifier" is a lot more
plausible when you have the full stack showing that you do actually
need it. Or well, not a full stack but at least a demo that shows what
you want to pull off but can't do right now.
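
For reference, a minimal sketch of the bit budget in question, assuming
the drm_fourcc.h convention of an 8-bit vendor ID in the top byte and 56
vendor-defined bits below it. The names here (mod_code, mod_vendor,
layout_fits_in_modifier) are made up for illustration and are not the
kernel macros.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: mirrors the 8-bit vendor / 56-bit payload split of
 * a 64-bit DRM format modifier. All names are hypothetical. */
#define MOD_VENDOR_SHIFT 56
#define MOD_VALUE_MASK   ((UINT64_C(1) << MOD_VENDOR_SHIFT) - 1)

/* Pack a vendor ID and a vendor-defined layout value into one modifier. */
static uint64_t mod_code(uint8_t vendor, uint64_t value)
{
    /* The value has to fit in the low 56 bits. */
    assert((value & ~MOD_VALUE_MASK) == 0);
    return ((uint64_t)vendor << MOD_VENDOR_SHIFT) | value;
}

static uint8_t mod_vendor(uint64_t modifier)
{
    return (uint8_t)(modifier >> MOD_VENDOR_SHIFT);
}

static uint64_t mod_value(uint64_t modifier)
{
    return modifier & MOD_VALUE_MASK;
}

/* Returns 1 if a layout description that needs `bits` bits can be
 * encoded directly in the vendor-defined part of a modifier. */
static int layout_fits_in_modifier(unsigned bits)
{
    return bits <= MOD_VENDOR_SHIFT;
}
```

A demo that actually trips the layout_fits_in_modifier() limit with
layouts the hardware needs to share is the kind of evidence I mean.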

>> > I'd like to see concrete examples of actual display controllers
>> > supporting more format layouts than what can be specified with a 64
>> > bit modifier.
>>
>> The main problem is our tiling and other metadata parameters can't
>> generally fit in a modifier, so we find passing a blob of metadata a
>> more suitable mechanism.
>
>
> I understand that you may have n knobs with a total of more than
> 56 bits that configure your tiling/swizzling for color buffers. What I don't
> buy is that you need all those combinations when passing buffers around
> between codecs, cameras and display controllers. Even if you're sharing
> between the same 3D drivers in different processes, I expect just locking
> down, say, 64 different combinations (you can add more over time) and
> assigning each a modifier would be sufficient. I doubt you'd extract
> meaningful performance gains from going all the way to a blob.

Tegra just redesigned its modifier space from an ungodly amount of
bits to just a few layouts. Not even just the ones in use, but simply
limiting to the ones that make sense (there are dependencies, apparently).
Also note that the modifier alone doesn't need to describe the layout
precisely, it only makes sense together with a specific pixel format
and size. E.g. a bunch of the i915 layouts change layout depending
upon bpp.
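
A small sketch of that last point, assuming tiles that are defined by a
fixed size in bytes so that the footprint in pixels follows from the
pixel format. The layout names and byte dimensions below are
illustrative placeholders, not values copied from any driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layouts; each one would be named by a single modifier. */
enum example_layout {
    EXAMPLE_LINEAR,
    EXAMPLE_TILED_A, /* assumed: 512 bytes wide, 8 rows */
    EXAMPLE_TILED_B, /* assumed: 128 bytes wide, 32 rows */
};

struct tile_dims {
    uint32_t width_px;
    uint32_t height_rows;
};

/* The modifier picks the tile shape in bytes; the pixel format supplies
 * bytes_per_pixel. Only the two together give the tile size in pixels. */
static struct tile_dims example_tile_dims(enum example_layout layout,
                                          uint32_t bytes_per_pixel)
{
    struct tile_dims d = { 1, 1 }; /* linear: no tiling */

    assert(bytes_per_pixel != 0);

    switch (layout) {
    case EXAMPLE_TILED_A:
        d.width_px = 512 / bytes_per_pixel;
        d.height_rows = 8;
        break;
    case EXAMPLE_TILED_B:
        d.width_px = 128 / bytes_per_pixel;
        d.height_rows = 32;
        break;
    case EXAMPLE_LINEAR:
    default:
        break;
    }
    return d;
}
```

So one modifier value covers a whole family of concrete layouts, and the
format and size passed alongside it do the rest.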

> If you want us to redesign KMS and the rest of the ecosystem around blobs
> instead of the modifiers that are now moderately pervasive, you have to
> justify it a little better than just "we didn't find it suitable".

Given that this involves the kernel and hence the kernel's userspace
requirements for merging stuff (assuming of course you want to
establish this as an upstream interface), then I'd say a sufficient
demonstration would be actually running out of bits in nouveau
(kernel+mesa).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch