[Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces
Rob Clark
robdclark at gmail.com
Wed Jan 3 14:53:06 UTC 2018
On Thu, Dec 28, 2017 at 1:24 PM, Miguel Angel Vico <mvicomoya at nvidia.com> wrote:
> (Adding dri-devel back, and trying to respond to some comments from
> the different forks)
>
> James Jones wrote:
>
>> Your worst case analysis above isn't far off from our HW, give or take
>> some bits and axes here and there. We've started an internal discussion
>> about how to lay out all the bits we need. It's hard to even enumerate
>> them all without having a complete understanding of what capability sets
>> are going to include, a fully-optimized implementation of the mechanism
>> on our HW, and lots of test scenarios, though.
>
> (thanks James for most of the info below)
>
> To elaborate a bit, if we want to share an allocation across GPUs for 3D
> rendering, it seems we would need 12 bits to express our
> swizzling/tiling memory layouts for Fermi+. In addition to that,
> Maxwell uses 3 more bits for this, and we need an extra bit to identify
> pre-Fermi representations.
>
> We also need one bit to differentiate between Tegra and desktop, and
> another one to indicate whether the layout is otherwise linear.
>
> Then there's whether compression is used (one more bit), and we can
> probably get by with 3 bits for the type of compression if we are
> creative. However, it'd be way easier to just track arch + page kind,
> which would be like 32 bits on its own.
>
> Whether Z-culling and/or zero-bandwidth-clears are used may be another 3
> bits.
>
> If device-local properties are included, we might need a couple more
> bits for caching.
>
> We may also need to express locality information, which may take at
> least another 2 or 3 bits.
>
> If we want to share array textures too, we also need to pass the array
> pitch. Is it supposed to be encoded in a modifier too? That's 64 bits on
> its own.
>
> So yes, as James mentioned, with some effort, we could technically fit
> our current allocation parameters in a modifier, but I'm still not
> convinced this is as future-proof as it could be as our hardware grows
> in capabilities.
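
Just to make that bit budget concrete, here is a purely hypothetical
packing of those parameters into the 56 vendor-defined bits of a format
modifier, using the fourcc_mod_code() helper from drm_fourcc.h. The
field names, widths, and positions are invented for illustration and
are not the actual NVIDIA encoding:

#include <stdint.h>
#include <drm_fourcc.h>  /* fourcc_mod_code(), DRM_FORMAT_MOD_VENDOR_NVIDIA */

/* Hypothetical field layout; widths are illustrative only. */
static inline uint64_t hypothetical_nv_modifier(int tegra, int linear,
                                                uint64_t page_kind,
                                                int compression_type,
                                                int zcull_zbc, int caching,
                                                int locality)
{
    uint64_t bits = 0;

    bits |= (uint64_t)(tegra & 0x1)            << 0;   /* Tegra vs. desktop    */
    bits |= (uint64_t)(linear & 0x1)           << 1;   /* otherwise-linear     */
    bits |= (page_kind & 0xffff)               << 2;   /* swizzle/tiling kind  */
    bits |= (uint64_t)(compression_type & 0x7) << 18;  /* compression type     */
    bits |= (uint64_t)(zcull_zbc & 0x7)        << 21;  /* Z-cull / ZBC usage   */
    bits |= (uint64_t)(caching & 0x3)          << 24;  /* device-local caching */
    bits |= (uint64_t)(locality & 0x7)         << 26;  /* locality information */

    return fourcc_mod_code(NVIDIA, bits);
}

Things like a 64-bit array pitch obviously don't fit in such a scheme,
which is the concern above.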
>
>
> Daniel Stone wrote:
>
>> So I reflexively
>> get a bit itchy when I see the kernel being used to transit magic
>> blobs of data which are supplied by userspace, and only interpreted by
>> different userspace. Having tiling formats hidden away means that
>> we've had real-world bugs in AMD hardware, where we end up displaying
>> garbage because we cannot generically reason about the buffer
>> attributes.
>
> I'm a bit confused. Can't modifiers be specified by vendors and only
> interpreted by drivers? My understanding was that modifiers could
> actually be treated as opaque 64-bit data, in which case they would
> qualify as "magic blobs of data". Otherwise, it seems this wouldn't be
> scalable. What am I missing?
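
For what it's worth, that is more or less how DRM modifiers already
work: drm_fourcc.h reserves the top byte of the 64-bit value for a
vendor ID and leaves the remaining 56 bits to be defined by that
vendor, so everything outside the vendor's own drivers can treat the
value as opaque and just pass it through (generic code only ever
compares modifiers for equality):

/* From drm_fourcc.h (kernel uapi / libdrm); integer types abbreviated. */
#define fourcc_mod_code(vendor, val) \
    ((((uint64_t)DRM_FORMAT_MOD_VENDOR_ ## vendor) << 56) | \
     ((val) & 0x00ffffffffffffffULL))

/* Current vendors include NONE, INTEL, AMD, NVIDIA, SAMSUNG, QCOM,
 * VIVANTE, and BROADCOM. */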
>
>
> Daniel Vetter wrote:
>
>> I think in the interim figuring out how to expose kms capabilities
>> better (and necessarily standardizing at least some of them which
>> matter at the compositor level, like size limits of framebuffers)
>> feels like the place to push the ecosystem forward. In some way
>> Miguel's proposal looks a bit backwards, since it adds the pitch
>> capabilities to addfb, but at addfb time you've allocated everything
>> already, so way too late to fix things up. With modifiers we've added
>> a very simple per-plane property to list which modifiers can be
>> combined with which pixel formats. Tiny start, but obviously very far
>> from all that we'll need.
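
(For reference, that per-plane property is IN_FORMATS. A compositor can
fetch the blob and walk the format/modifier pairs roughly like this;
property lookup and error handling are omitted:)

#include <stdio.h>
#include <stdint.h>
#include <xf86drmMode.h>
#include <drm_mode.h>  /* struct drm_format_modifier_blob */

static void dump_in_formats(const drmModePropertyBlobRes *blob)
{
    const struct drm_format_modifier_blob *hdr = blob->data;
    const uint32_t *fmts =
        (const uint32_t *)((const char *)hdr + hdr->formats_offset);
    const struct drm_format_modifier *mods =
        (const struct drm_format_modifier *)
        ((const char *)hdr + hdr->modifiers_offset);

    for (uint32_t i = 0; i < hdr->count_modifiers; i++)
        for (uint32_t j = mods[i].offset;
             j < hdr->count_formats && j < mods[i].offset + 64; j++)
            if (mods[i].formats & (1ull << (j - mods[i].offset)))
                printf("format %.4s supports modifier 0x%llx\n",
                       (const char *)&fmts[j],
                       (unsigned long long)mods[i].modifier);
}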
>
> Not sure whether I might be misunderstanding your statement, but one of
> the allocator's main features is negotiation of nearly optimal allocation
> parameters, given a set of uses on different devices/engines, via the
> capability merge operation. A client is expected to query what every
> device/engine is capable of for the given uses, find the optimal set of
> capabilities, and use it to allocate a buffer. By the time these
> parameters are given to KMS, they are expected to be good. If they
> aren't, the client didn't do things right.
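
In rough pseudo-C (the names here are placeholders, not the actual
allocator library API), the flow being described is something like:

#include <stdint.h>

typedef struct cap_set   cap_set_t;    /* opaque capability set       */
typedef struct alloc_dev alloc_dev_t;  /* per-device allocator handle */

/* Assumed entry points, for illustration only: */
int        alloc_get_capabilities(alloc_dev_t *dev, const uint32_t *uses,
                                  unsigned n_uses, cap_set_t **out);
cap_set_t *alloc_merge_capabilities(const cap_set_t *a, const cap_set_t *b);
int        alloc_create_buffer(alloc_dev_t *dev, const cap_set_t *caps,
                               uint32_t width, uint32_t height, int *out_fd);

static int negotiate_and_allocate(alloc_dev_t *gpu, alloc_dev_t *display,
                                  const uint32_t *uses, unsigned n_uses,
                                  int *out_fd)
{
    cap_set_t *gpu_caps, *disp_caps, *merged;

    /* 1. Query what each device/engine can do for the given uses. */
    alloc_get_capabilities(gpu, uses, n_uses, &gpu_caps);
    alloc_get_capabilities(display, uses, n_uses, &disp_caps);

    /* 2. Merge the sets to find parameters every consumer supports. */
    merged = alloc_merge_capabilities(gpu_caps, disp_caps);

    /* 3. Allocate with the merged set; by the time these parameters
     *    reach KMS they are expected to be valid. */
    return alloc_create_buffer(gpu, merged, 1920, 1080, out_fd);
}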
>
>
> Rob Clark wrote:
>
>> It does seem like, if possible, starting out with modifiers for now at
>> the kernel interface would make life easier, vs trying to reinvent
>> both kernel and userspace APIs at the same time. Userspace APIs are
>> easier to change or throw away. Presumably by the time we get to the
>> point of changing kernel uabi, we are already using, and pretty happy
>> with, serialized liballoc data over the wire in userspace so it is
>> only a matter of changing the kernel interface.
>
> I guess we can indeed start with modifiers for now, if that's what it
> takes to get the allocator mechanisms rolling. However, it seems to me
> that we won't always be able to encode with modifiers the same type of
> information that capability sets include. For instance, if we end
> up encoding usage transition information in capability sets, how would
> that translate to modifiers?
>
> I assume display doesn't really care about a lot of the data capability
> sets may encode, but is it correct to think of modifiers as things only
> display needs? If we are to treat modifiers as a first-class citizen, I
> would expect to use them beyond that.
>
btw, the places where modifiers are currently used are limited to 2D
textures without mipmap levels: basically scanout buffers, winsys
buffers, decoded video frames, and that sort of thing. I think we
can keep it that way, which avoids needing to encode additional info
(layer pitch, Z tiling info for 3D textures, or whatever else).
So we just need to have something in userspace that translates the
relevant subset of capability set info to modifiers.
Maybe down the road, if capability sets are ubiquitous, we can
"promote" that mechanism to kernel uabi... although tbh I am not
entirely sure I can envision a use-case where the kernel needs to know
about a cubemap array texture.
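
As a very rough sketch (the struct and helper here are invented purely
for illustration), that userspace translation might look something
like:

#include <stdbool.h>
#include <stdint.h>
#include <drm_fourcc.h>

/* Illustrative subset of a capability set; not a real structure. */
struct cap_subset {
    bool     linear;
    uint64_t vendor_layout_bits;  /* vendor-specific tiling description */
};

static uint64_t caps_to_modifier(const struct cap_subset *c)
{
    if (c->linear)
        return DRM_FORMAT_MOD_LINEAR;

    /* Otherwise hand back whatever vendor-namespaced encoding the
     * driver's capability set maps to. */
    return fourcc_mod_code(NVIDIA, c->vendor_layout_bits);
}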
BR,
-R
>
> Kristian Kristensen wrote:
>
>> I agree, and let me elaborate a bit. The problem we're seeing isn't that we
>> need more than 2^56 modifiers for a future GPU. The problem is that flags
>> like USE_SCANOUT (which your allocator proposal essentially keeps) are
>> inadequate. The available tiling and compression formats vary with which
>> (in KMS terms) CRTC you want to use, which plane you're on, whether you want
>> rotation or not, and how much you want to scale, etc. It's not realistic to
>> think that we could model this in a centralized allocator library that's
>> detached from the display driver. To be fair, this is not a point about
>> blobs vs modifiers, it's saying that the use flags don't belong in the
>> allocator, they belong in the APIs that will be using the buffer - and not
>> as literal use flags, but as a way to discover supported modifiers for a
>> given use case.
>
> Why detached from the display driver? I don't see why there couldn't be
> an allocator driver with access to display capabilities that can be
> used in the negotiation step to find the optimal set of allocation
> parameters.
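
For context, one example of that "discover the supported modifiers for
a given use" model already exists on the EGL side:
EGL_EXT_image_dma_buf_import_modifiers lets an application ask which
modifiers the implementation can import for a given format. A rough
sketch, with error handling omitted:

#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <drm_fourcc.h>

static EGLint query_xrgb8888_modifiers(EGLDisplay dpy,
                                       EGLuint64KHR *mods, EGLint max)
{
    PFNEGLQUERYDMABUFMODIFIERSEXTPROC query =
        (PFNEGLQUERYDMABUFMODIFIERSEXTPROC)
            eglGetProcAddress("eglQueryDmaBufModifiersEXT");
    EGLint count = 0;

    if (query)
        query(dpy, DRM_FORMAT_XRGB8888, max, mods, NULL, &count);

    return count;
}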
>
>
> Kristian Kristensen wrote:
>
>> I understand that you may have n knobs with a total of more than
>> 56 bits that configure your tiling/swizzling for color buffers. What I don't
>> buy is that you need all those combinations when passing buffers around
>> between codecs, cameras and display controllers. Even if you're sharing
>> between the same 3D drivers in different processes, I expect just locking
>> down, say, 64 different combinations (you can add more over time) and
>> assigning each a modifier would be sufficient. I doubt you'd extract
>> meaningful performance gains from going all the way to a blob.
>
> If someone has N knobs available, I don't understand why there
> shouldn't be a mechanism that allows making use of them all, regardless
> of performance numbers.
>
>
> Daniel Vetter wrote:
>
>> Yeah, that part was all clear. I'd want more details of what exact
>> kind of metadata. fast-clear colors? tiling layouts? aux data for the
>> compressor? hiz (or whatever you folks call it) tree?
>>
>> As you say, we've discussed massive amounts of different variants on
>> this, and there's different answers for different questions. Consensus
>> seems to be that bigger stuff (compression data, hiz, clear colors,
>> ...) should be stored in aux planes, while the exact layout and what
>> kind of aux planes you have are encoded in the modifier.
>
> My understanding is that capability sets may include all metadata you
> mentioned. Besides tiling/swizzling layout and compression parameters,
> things like zero-bandwidth-clears (I guess the same or similar to
> fast-clear colors?), hiz-like data, device-local properties such as
> caches, or locality information could/will also be included in a
> capability set. We are even considering encoding some sort of usage
> transition information in the capability set itself.
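
That consensus also maps fairly directly onto what KMS already accepts
today: drmModeAddFB2WithModifiers() carries up to four planes, so
compression/hiz-style metadata can ride along as an extra plane while
the modifier names the layout. A rough sketch (handles, pitches, and
the modifier value are placeholders):

#include <stdint.h>
#include <xf86drmMode.h>
#include <drm_fourcc.h>

static int add_fb_with_aux(int fd, uint32_t w, uint32_t h,
                           uint32_t color_bo, uint32_t aux_bo,
                           uint64_t modifier, uint32_t *fb_id)
{
    uint32_t handles[4]   = { color_bo, aux_bo };
    uint32_t pitches[4]   = { w * 4, w / 16 };  /* illustrative values */
    uint32_t offsets[4]   = { 0, 0 };
    uint64_t modifiers[4] = { modifier, modifier };

    return drmModeAddFB2WithModifiers(fd, w, h, DRM_FORMAT_XRGB8888,
                                      handles, pitches, offsets,
                                      modifiers, fb_id,
                                      DRM_MODE_FB_MODIFIERS);
}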
>
>
> Thanks,
> Miguel.