[Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces

Wed Dec 20 21:58:51 UTC 2017

Hi Miguel,

On 20 December 2017 at 16:51, Miguel Angel Vico <mvicomoya at nvidia.com> wrote:
> In the meantime, I've been working on putting together an open source
> implementation of the allocator mechanisms using the Nouveau driver for
> all to be able to play with.

Thanks for taking a look at this! I'm still winding out my to-do list
for the year, but hoping to get to this more seriously soon.

As a general comment, now that modifiers are a first-class concept in
many places (KMS FBs, KMS plane format advertisement, V4L2 buffers,
EGL/Vulkan image import/export, Wayland buffer import, etc), I'd like
to see them included as a first-class concept in the allocator. I
understand one of the primary reservations against using them was that
QNX didn't have such a concept, but just specifying them to be ignored
on non-Linux platforms would probably work fine.

> Another of the missing pieces before we can move this to production is
> importing allocations to DRM FB objects. This is probably one of the
> most sensitive parts of the project as it requires modification/addition
> of kernel driver interfaces.
>
> At XDC2017, James had several hallway conversations with several people
> about this, all having different opinions. I'd like to take this
> opportunity to also start a discussion about what's the best option to
> create a path to get allocator allocations added as DRM FB objects.
>
> These are the few options we've considered to start with:
>
>   A) Have vendor-private ioctls to set properties on GEM objects that
>      are inherited by the FB objects. This is how our (NVIDIA) desktop
>      DRM driver currently works. This would require every vendor to add
>      their own ioctl to process allocator metadata, but the metadata is
>      actually a vendor-agnostic object more like DRM modifiers. We'd
>      like to come up with a vendor-agnostic solutions that can be
>      integrated to core DRM.

This worries me. If the data is static for the lifetime of the buffer
- describing the tiling layout, for instance - then it would form
effective ABI for all the consumers/producers using that buffer type.
If it is dynamic, you also have a world of synchronisation problems
when multiple users race each other with different uses of that buffer
(and presumably you would need to reload the metadata on every use?).
Either way, anyone using this would need to have a very well-developed
compatibility story, given that you can mix and match kernel and
userspace versions.

>   B) Add a new drmModeAddFBWithMetadata() command that takes allocator
>      metadata blobs for each plane of the FB. Some people in the
>      community have mentioned this is their preferred design. This,
>      however, means we'd have to go through the exercise of adding
>      another metadata mechanism to the whole graphics stack.

Similarly, this seems to be missing either a 'mandatory' flag so
userspace can inform the kernel it must fail if it does not understand
certain capabilities, or a way for the kernel to inform userspace
which capabilities it does/doesn't understand.

The capabilities in the example are also very oddly chosen. Address
alignment, pitch alignment, and maximum pitch are superfluous: the KMS
driver is the single source of truth for these values for FBs, so it
isn't useful for userspace to provide it. Specifically for pitch
alignment and maximum pitch, the pitch values are already given in the
same ioctl, so all you can check with these values (before the driver
does its own check again) is that userspace is self-consistent. These
three capabilities all relate to BO allocation rather than FB
creation: if a BO is rendered into with the wrong pitch, or allocated
at the wrong base address, we've already lost because the allocation
was incorrect.

Did you have some other capabilities in mind which would be more
relevant to FBs?

>   C) Shove allocator metadata into DRM by defining it to be a separate
>      plane in the image, and using the existing DRM modifiers mechanism
>      to indicate there is another plane for each "real" plane added. It
>      isn't clear how this scales to surfaces that already need several
>      planes, but there are some people that see this as the only way
>      forward. Also, we would have to create a separate GEM buffer for
>      the metadatada itself, which seems excessive.

I also have my reservations about this one. The general idea behind
FBs is that, if the buffers are identical but for memory addresses and
pixel content, the parameters should be equal but the per-plane buffer
contents different. Conversely, if the buffers differ in any way but
the above, the parameters should be different. For instance, if
buffers have identical layouts (tiling/swizzling/compression),
identical pixel content once interpreted, but the only thing which
differs is the compression status (fully resolved / not resolved), I
would expect to see identical parameters and differing data in the
auxiliary compression plane. We had quite a long discussion for
framebuffer compression on Intel where we shot down the concept of
expressing compression status as a plane property for basically this
reason.

Drivers which were created in the DRI2 era where the only metadata
transited was handle/pitch/format, worked around that by adding
auxiliary data hanging off the buffer to describe their actual layout,
but this is a mistake we're trying to move away from. So I reflexively
get a bit itchy when I see the kernel being used to transit magic
blobs of data which are supplied by userspace, and only interpreted by
different userspace. Having tiling formats hidden away means that
we've had real-world bugs in AMD hardware, where we end up displaying
garbage because we cannot generically reason about the buffer
attributes.

As with Kristian, I'd also like to hear any examples of metadata which
wouldn't fit inside 56 bits. A quick finger count says that if you
have 128 different possibilities for all of: tiling layout, micro-tile
size, macro-tile size, supermacro-tile size, swizzling/addressing
mode, and compression, this only uses 42 of the 56 bytes available to
you, still leaving two free 128-value axes. Is your concern about the
lack of space along these axes I've identified, or that you need more
axes, or ... ?

Cheers,
Daniel