[Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces

James Jones jajones at nvidia.com
Thu Dec 21 08:06:36 UTC 2017


On 12/20/2017 01:58 PM, Daniel Stone wrote:
> Hi Miguel,
> 
> On 20 December 2017 at 16:51, Miguel Angel Vico <mvicomoya at nvidia.com> wrote:
>> In the meantime, I've been working on putting together an open source
>> implementation of the allocator mechanisms using the Nouveau driver for
>> all to be able to play with.
> 
> Thanks for taking a look at this! I'm still winding out my to-do list
> for the year, but hoping to get to this more seriously soon.
> 
> As a general comment, now that modifiers are a first-class concept in
> many places (KMS FBs, KMS plane format advertisement, V4L2 buffers,
> EGL/Vulkan image import/export, Wayland buffer import, etc), I'd like
> to see them included as a first-class concept in the allocator. I
> understand one of the primary reservations against using them was that
> QNX didn't have such a concept, but just specifying them to be ignored
> on non-Linux platforms would probably work fine.

The allocator mechanisms and format modifiers are orthogonal, though. 
Either capability sets can be represented using format modifiers (the 
direction one part of this thread is suggesting, which I think is a bad 
idea), or format modifiers could easily be included as a vendor-agnostic 
capability, similar to pitch layout.  There are no "first-class 
citizens" in the allocator mechanism itself.  That's the whole idea: 
apps don't need to care about things like how the OS represents its 
surface metadata beyond some truly universal things like width and 
height (assertions).  The rest is abstracted away such that the apps are 
portable, even if the drivers/backends aren't.  Even if the solution 
within Linux is "just use format modifiers", there's still some benefit 
to making the kernel ABI use something slightly higher level that 
translates to DRM format modifiers inside the kernel, just to keep the 
apps OS-agnostic.
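
To make that concrete, here's a rough sketch of what I mean.  None of 
these names come from the actual allocator API; the struct layout and 
type IDs are invented purely for illustration:

#include <stdint.h>

/* Hypothetical header shared by every capability (illustrative names). */
typedef struct {
    uint32_t vendor;   /* namespace defining the capability; 0 = generic */
    uint32_t type;     /* capability type within that namespace */
    uint32_t length;   /* payload length in bytes */
} cap_header_t;

/* A DRM format modifier expressed as just another vendor-agnostic
 * capability, alongside things like pitch layout.  Non-Linux platforms
 * would simply never report or request it. */
#define CAP_TYPE_FORMAT_MODIFIER 2   /* assumed type ID for this example */

typedef struct {
    cap_header_t header;
    uint64_t     drm_format_modifier;
} cap_format_modifier_t;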

>> Another of the missing pieces before we can move this to production is
>> importing allocations to DRM FB objects. This is probably one of the
>> most sensitive parts of the project as it requires modification/addition
>> of kernel driver interfaces.
>>
>> At XDC2017, James had several hallway conversations with several people
>> about this, all having different opinions. I'd like to take this
>> opportunity to also start a discussion about what's the best option to
>> create a path to get allocator allocations added as DRM FB objects.
>>
>> These are the few options we've considered to start with:
>>
>>    A) Have vendor-private ioctls to set properties on GEM objects that
>>       are inherited by the FB objects. This is how our (NVIDIA) desktop
>>       DRM driver currently works. This would require every vendor to add
>>       their own ioctl to process allocator metadata, but the metadata is
>>       actually a vendor-agnostic object more like DRM modifiers. We'd
>>       like to come up with a vendor-agnostic solution that can be
>>       integrated into core DRM.
> 
> This worries me. If the data is static for the lifetime of the buffer
> - describing the tiling layout, for instance - then it would form
> effective ABI for all the consumers/producers using that buffer type.
> If it is dynamic, you also have a world of synchronisation problems
> when multiple users race each other with different uses of that buffer
> (and presumably you would need to reload the metadata on every use?).
> Either way, anyone using this would need to have a very well-developed
> compatibility story, given that you can mix and match kernel and
> userspace versions.

I think the metadata is static.  The surface meta-state is not, but that 
would be a commit-time thing if anything, not a GEM or FB object thing. 
Still, attaching metadata to GEM objects, which seem to be opaque blobs 
of memory in the general case, rather than to the FBs mapped onto those 
GEM objects, has always felt architecturally wrong to me.  You can have 
multiple FBs in one GEM object, for example, and there's no reason to 
assume they would share the same format, let alone the same tiling 
layout.
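
To illustrate with the existing libdrm entry points (a contrived but 
self-contained sketch; the sizes and format are arbitrary), here are two 
FBs carved out of a single, sufficiently large GEM buffer:

#include <stdint.h>
#include <xf86drmMode.h>
#include <drm_fourcc.h>

/* Two framebuffers backed by one GEM object at different offsets.  If
 * layout metadata lived on the GEM object, both FBs would be forced to
 * share it, even though nothing requires them to use the same format or
 * tiling. */
static int add_two_fbs(int fd, uint32_t gem_handle, uint32_t pitch)
{
    uint32_t handles[4] = { gem_handle };
    uint32_t pitches[4] = { pitch };
    uint32_t offsets[4] = { 0 };
    uint32_t fb0, fb1;
    int ret;

    ret = drmModeAddFB2(fd, 1920, 1080, DRM_FORMAT_XRGB8888,
                        handles, pitches, offsets, &fb0, 0);
    if (ret)
        return ret;

    /* Second FB further into the same GEM buffer. */
    offsets[0] = pitch * 1080;
    return drmModeAddFB2(fd, 1920, 1080, DRM_FORMAT_XRGB8888,
                         handles, pitches, offsets, &fb1, 0);
}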

>>    B) Add a new drmModeAddFBWithMetadata() command that takes allocator
>>       metadata blobs for each plane of the FB. Some people in the
>>       community have mentioned this is their preferred design. This,
>>       however, means we'd have to go through the exercise of adding
>>       another metadata mechanism to the whole graphics stack.
> 
> Similarly, this seems to be missing either a 'mandatory' flag so
> userspace can inform the kernel it must fail if it does not understand
> certain capabilities, or a way for the kernel to inform userspace
> which capabilities it does/doesn't understand.

I think that will fall out of the discussion over exactly what 
capability sets look like.  Regardless, yes, the kernel must fail if it 
can't support a given capability set, just as it would fail if it 
couldn't support a given DRM format modifier.  As with format 
modifiers, the userspace allocator driver would have queried the DRM 
kernel driver when reporting supported capability sets for any usage 
that requires creating FBs, so reaching such a state would always be a 
user error.  There would be no ambiguity as to whether a given set or 
individual capability was supported by the kernel driver at FB creation 
time.
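
For illustration, here is one possible shape for option (B).  To be 
clear, drmModeAddFBWithMetadata() doesn't exist today, so the signature 
below is nothing more than a sketch of the idea:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-plane metadata blob: an opaque capability-set
 * encoding produced by the userspace allocator driver. */
struct drm_fb_plane_metadata {
    const void *blob;
    size_t      size;
};

/* Sketch only -- not an existing libdrm entry point.  The kernel driver
 * would validate each blob against the capability sets it advertised to
 * the allocator and fail the call for anything it doesn't support, just
 * as drmModeAddFB2WithModifiers() fails for an unsupported modifier. */
int drmModeAddFBWithMetadata(int fd, uint32_t width, uint32_t height,
                             uint32_t pixel_format,
                             const uint32_t handles[4],
                             const uint32_t pitches[4],
                             const uint32_t offsets[4],
                             const struct drm_fb_plane_metadata metadata[4],
                             uint32_t *fb_id, uint32_t flags);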

> The capabilities in the example are also very oddly chosen. Address
> alignment, pitch alignment, and maximum pitch are superfluous: the KMS
> driver is the single source of truth for these values for FBs, so it
> isn't useful for userspace to provide them. Specifically for pitch
> alignment and maximum pitch, the pitch values are already given in the
> same ioctl, so all you can check with these values (before the driver
> does its own check again) is that userspace is self-consistent. These
> three capabilities all relate to BO allocation rather than FB
> creation: if a BO is rendered into with the wrong pitch, or allocated
> at the wrong base address, we've already lost because the allocation
> was incorrect.
> 
> Did you have some other capabilities in mind which would be more
> relevant to FBs?

We should probably create an example using some vendor-specific 
capabilities.  Tiling parameters are the quintessential vendor-specific 
example.
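
Something along these lines, say, reusing the hypothetical cap_header_t 
from the sketch earlier in this mail; the vendor ID, type ID, and fields 
are all made up:

/* Illustrative vendor-specific tiling capability. */
#define CAP_VENDOR_EXAMPLE_GPU 0x10de   /* assumed vendor namespace */
#define CAP_TYPE_TILING        1

typedef struct {
    cap_header_t header;       /* { CAP_VENDOR_EXAMPLE_GPU, CAP_TYPE_TILING, 8 } */
    uint32_t     tile_mode;    /* e.g. 0 = pitch/linear, 1 = block-linear */
    uint32_t     block_height; /* e.g. log2 of the block height in tiles */
} cap_example_tiling_t;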

>>    C) Shove allocator metadata into DRM by defining it to be a separate
>>       plane in the image, and using the existing DRM modifiers mechanism
>>       to indicate there is another plane for each "real" plane added. It
>>       isn't clear how this scales to surfaces that already need several
>>       planes, but there are some people that see this as the only way
>>       forward. Also, we would have to create a separate GEM buffer for
>>       the metadata itself, which seems excessive.
> 
> I also have my reservations about this one. The general idea behind
> FBs is that, if the buffers are identical but for memory addresses and
> pixel content, the parameters should be equal but the per-plane buffer
> contents different. Conversely, if the buffers differ in any way but
> the above, the parameters should be different. For instance, if
> buffers have identical layouts (tiling/swizzling/compression),
> identical pixel content once interpreted, but the only thing which
> differs is the compression status (fully resolved / not resolved), I
> would expect to see identical parameters and differing data in the
> auxiliary compression plane. We had quite a long discussion for
> framebuffer compression on Intel where we shot down the concept of
> expressing compression status as a plane property for basically this
> reason.

Actual compression data is read by the GPU, though.  Specifying 
metadata (i.e., a few bits saying whether there is a compression plane 
and, if so, what its layout is) that is only ever accessed by kernel 
drivers and userspace drivers as an FB plane seems to be stretching the 
abstraction a bit.  It also means you probably have to create a 
CPU-only GEM buffer to put it in.  Not impossible, just a lot of busy 
work, and potentially bug-prone: if your kernel driver previously only 
supported creating GEM buffers that were HW-accessible, you now have to 
sprinkle "but not this type of GEM buffer" checks throughout your 
validation code for HW operations on GEM buffers.
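
For reference, here's roughly what option (C) looks like through the 
existing libdrm API.  The modifier value meaning "a metadata plane 
follows" is fictional, and metadata_handle is the extra CPU-only GEM 
buffer discussed above:

#include <stdint.h>
#include <xf86drmMode.h>
#include <drm_fourcc.h>

/* Hypothetical modifier signalling that plane 1 holds driver metadata
 * rather than pixel data; not a real value. */
#define EXAMPLE_MOD_HAS_METADATA_PLANE 0x0123456789abcdefULL

static int add_fb_option_c(int fd, uint32_t width, uint32_t height,
                           uint32_t pixel_handle, uint32_t pixel_pitch,
                           uint32_t metadata_handle, uint32_t metadata_size)
{
    uint32_t handles[4] = { pixel_handle, metadata_handle };
    uint32_t pitches[4] = { pixel_pitch,  metadata_size   };
    uint32_t offsets[4] = { 0,            0               };
    uint64_t mods[4]    = { EXAMPLE_MOD_HAS_METADATA_PLANE,
                            EXAMPLE_MOD_HAS_METADATA_PLANE };
    uint32_t fb_id;

    return drmModeAddFB2WithModifiers(fd, width, height, DRM_FORMAT_XRGB8888,
                                      handles, pitches, offsets, mods,
                                      &fb_id, DRM_MODE_FB_MODIFIERS);
}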

> Drivers which were created in the DRI2 era where the only metadata
> transited was handle/pitch/format, worked around that by adding
> auxiliary data hanging off the buffer to describe their actual layout,
> but this is a mistake we're trying to move away from. So I reflexively
> get a bit itchy when I see the kernel being used to transit magic
> blobs of data which are supplied by userspace, and only interpreted by
> different userspace. Having tiling formats hidden away means that
> we've had real-world bugs in AMD hardware, where we end up displaying
> garbage because we cannot generically reason about the buffer
> attributes.

Agreed.  For those not following, and to verify my own understanding, 
the paragraph above is basically the same argument I made earlier 
against using Miguel's option (A), correct?

> As with Kristian, I'd also like to hear any examples of metadata which
> wouldn't fit inside 56 bits. A quick finger count says that if you
> have 128 different possibilities for all of: tiling layout, micro-tile
> size, macro-tile size, supermacro-tile size, swizzling/addressing
> mode, and compression, this only uses 42 of the 56 bits available to
> you, still leaving two free 128-value axes. Is your concern about the
> lack of space along these axes I've identified, or that you need more
> axes, or ... ?

Your worst-case analysis above isn't far off from our HW, give or take 
some bits and axes here and there.  We've started an internal discussion 
about how to lay out all the bits we need.  It's hard to even enumerate 
them all, though, without a complete understanding of what capability 
sets are going to include, a fully-optimized implementation of the 
mechanism on our HW, and lots of test scenarios.
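
For concreteness, the arithmetic behind the finger count above:

/* A format modifier is 64 bits, 8 of which hold the vendor ID, leaving
 * 56 bits of payload.  Six axes of 128 values each need 6 * 7 = 42
 * bits, leaving 14 bits -- i.e. two more 7-bit axes -- of headroom. */
#define MODIFIER_TOTAL_BITS    64
#define MODIFIER_VENDOR_BITS    8
#define MODIFIER_PAYLOAD_BITS  (MODIFIER_TOTAL_BITS - MODIFIER_VENDOR_BITS) /* 56 */
#define AXES                    6
#define BITS_PER_AXIS           7   /* 2^7 = 128 values per axis */
#define BITS_USED              (AXES * BITS_PER_AXIS)                       /* 42 */
#define BITS_SPARE             (MODIFIER_PAYLOAD_BITS - BITS_USED)          /* 14 */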

However, making some assumptions, I suspect it will come down to this: 
yes, we can fit what we need in some number of bits marginally less 
than 56 today, with the current use cases and hardware.  But we're very 
concerned about extensibility, given that the number of bits has only 
ever grown in our HW, is uncomfortably close to the limit if it isn't 
over it already, and it has been demonstrated that it takes a monumental 
effort to change the mechanism if it isn't extensible.  While it's hard 
to change the mechanism one more time now, it's better to change it to 
something truly extensible now, because it will be much, much harder to 
make such a change ~5 years from now, in a world where it's baked into 
pervasively deployed Wayland and X protocol, the EGL and Vulkan 
extensions have been defined for a few years and are in use by apps 
besides Wayland, and the allocator stuff is deployed on ~5 operating 
systems that have some derivative version of DRM modifiers to support 
it and a bunch of funky embedded apps using it.  Further, we're 
volunteering to handle the bulk of the effort needed to make the change 
now, so I hope architectural correctness and maintainability can be the 
primary points of debate.

Thanks,
-James

> Cheers,
> Daniel
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 

