[Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces

Tue Feb 27 06:10:36 UTC 2018

On 02/22/2018 01:16 PM, Alex Deucher wrote:
> On Thu, Feb 22, 2018 at 1:49 PM, Bas Nieuwenhuizen
> <bas at basnieuwenhuizen.nl> wrote:
>> On Thu, Feb 22, 2018 at 7:04 PM, Kristian Høgsberg <hoegsberg at gmail.com> wrote:
>>> On Wed, Feb 21, 2018 at 4:00 PM Alex Deucher <alexdeucher at gmail.com> wrote:
>>>
>>>> On Wed, Feb 21, 2018 at 1:14 AM, Chad Versace <chadversary at chromium.org> wrote:
>>>>> On Thu 21 Dec 2017, Daniel Vetter wrote:
>>>>>> On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen <hoegsberg at google.com> wrote:
>>>>>>> On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico <mvicomoya at nvidia.com> wrote:
>>>>>>>> On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg <hoegsberg at gmail.com> wrote:
>>>>>>>>> I'd like to see concrete examples of actual display controllers
>>>>>>>>> supporting more format layouts than what can be specified with a 64
>>>>>>>>> bit modifier.
>>>>>>>>
>>>>>>>> The main problem is our tiling and other metadata parameters can't
>>>>>>>> generally fit in a modifier, so we find passing a blob of metadata a
>>>>>>>> more suitable mechanism.
>>>>>>>
>>>>>>> I understand that you may have n knobs with a total of more than
>>>>>>> 56 bits that configure your tiling/swizzling for color buffers. What
>>>>>>> I don't buy is that you need all those combinations when passing
>>>>>>> buffers around between codecs, cameras and display controllers. Even
>>>>>>> if you're sharing between the same 3D drivers in different processes,
>>>>>>> I expect just locking down, say, 64 different combinations (you can
>>>>>>> add more over time) and assigning each a modifier would be
>>>>>>> sufficient. I doubt you'd extract meaningful performance gains from
>>>>>>> going all the way to a blob.
>>>>>
>>>>> I agree with Kristian above. In my opinion, choosing to encode in
>>>>> modifiers a precise description of every possible tiling/compression
>>>>> layout is not technically incorrect, but I believe it misses the point.
>>>>> The intention behind modifiers is not to exhaustively describe all
>>>>> possibilities.
>>>>>
>>>>> I summarized this opinion in VK_EXT_image_drm_format_modifier,
>>>>> where I wrote an "introduction to modifiers" section. Here's an excerpt:
>>>>>
>>>>>      One goal of modifiers in the Linux ecosystem is to enumerate for
>>>>>      each vendor a reasonably sized set of tiling formats that are
>>>>>      appropriate for images shared across processes, APIs, and/or
>>>>>      devices, where each participating component may possibly be from
>>>>>      different vendors. A non-goal is to enumerate all tiling formats
>>>>>      supported by all vendors. Some tiling formats used internally by
>>>>>      vendors are inappropriate for sharing; no modifiers should be
>>>>>      assigned to such tiling formats.
>>>
>>>> Where it gets tricky is how to select that subset?  Our tiling modes
>>>> are defined more by the asic specific constraints than the tiling mode
>>>> itself.  At a high level we have basically 3 tiling modes (out of 16
>>>> possible) that would be the minimum we'd want to expose for gfx6-8.
>>>> gfx9 uses a completely new scheme.
>>>> 1. Linear (per asic stride requirements, not usable by many hw blocks)
>>>> 2. 1D Thin (5 layouts, displayable, depth, thin, rotated, thick)
>>>> 3. 2D Thin (1D tiling constraints, plus pipe config (18 possible),
>>>> tile split (7 possible), sample split (4 possible), num banks (4
>>>> possible), bank width (4 possible), bank height (4 possible), macro
>>>> tile aspect (4 possible) all of which are asic config specific)
>>>
>>>> I guess we could do something like:
>>>> AMD_GFX6_LINEAR_ALIGNED_64B
>>>> AMD_GFX6_LINEAR_ALIGNED_256B
>>>> AMD_GFX6_LINEAR_ALIGNED_512B
>>>> AMD_GFX6_1D_THIN_DISPLAY
>>>> AMD_GFX6_1D_THIN_DEPTH
>>>> AMD_GFX6_1D_THIN_ROTATED
>>>> AMD_GFX6_1D_THIN_THIN
>>>> AMD_GFX6_1D_THIN_THICK
>>>
>>> AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>>
>>> AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>>
>>> AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>>
>>> AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>>
>>> AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>>
>>> AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>>
>>> AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>>
>>> AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>>
>>> AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>>
>>> AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
>>>> etc.
>>>
>>>> We probably only need 40 bits to encode all of the tiling parameters,
>>>> so we could do family plus tiling encoding, but that still seems
>>>> unwieldy to deal with from an application perspective.  All of the
>>>> parameters affect the alignment requirements.
>>>
>>> We discussed this earlier in the thread, here's what I said:
>>>
>>> Another point here is that the modifier doesn't need to encode everything
>>> you have to communicate to the HW. For a given width, height, format,
>>> compression type and maybe a few other high-level parameters, I'm skeptical
>>> that the remaining tile parameters aren't just mechanically derivable using
>>> a fixed table or formula. So instead of thinking of the modifiers as
>>> something you can just memcpy into a state packet, it identifies a family
>>> of configurations - enough information to deterministically derive the full
>>> exact configuration. The formula may change, for example for different
>>> hardware or if it's determined to not be optimal, and in that case, we can
>>> use a new modifier to represent the new formula.
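As a rough illustration of the "family of configurations" idea, the sketch below treats the value carried in the modifier as a key into a fixed table and derives the exact parameters from it plus one high-level parameter. The family IDs, the table contents, and every name here are invented for the example; real hardware would have more inputs and a real table or formula.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names and values below are invented for illustration. */
enum { EXAMPLE_FAMILY_2D_V1 = 1, EXAMPLE_FAMILY_2D_V2 = 2 };

struct example_full_config {
    unsigned tile_split_bytes;
    unsigned num_banks;
};

struct example_row {
    uint64_t family;                /* value carried in the modifier      */
    unsigned bytes_per_pixel;       /* high-level input known to everyone */
    struct example_full_config cfg; /* exact configuration derived        */
};

static const struct example_row example_table[] = {
    { EXAMPLE_FAMILY_2D_V1, 1, {  64, 2 } },
    { EXAMPLE_FAMILY_2D_V1, 4, { 256, 4 } },
    /* A revised formula gets a new family value; V1 rows never change. */
    { EXAMPLE_FAMILY_2D_V2, 4, { 512, 8 } },
};

/* Returns 0 and fills *out on success, -1 if the pair is not in the table. */
static int example_derive(uint64_t family, unsigned bytes_per_pixel,
                          struct example_full_config *out)
{
    size_t count = sizeof(example_table) / sizeof(example_table[0]);

    assert(out != NULL);
    for (size_t i = 0; i < count; i++) {
        if (example_table[i].family == family &&
            example_table[i].bytes_per_pixel == bytes_per_pixel) {
            *out = example_table[i].cfg;
            return 0;
        }
    }
    return -1;
}
```

This only works if every importer and exporter carries the same table for a given family value, which is the assumption the replies below question.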
>>
>> I think this is not so much about being able to dump it in a state
>> packet, but about sharing between different AMD GPUs. We have
>> basically only a few interesting tiling modes if you look at a single
>> GPU, but checking if those are equal depends on the other bits, which
>> may or may not be different per chip for the same conceptual tiling
>> mode. We could just put a chip identifier in, but that would preclude
>> any sharing while I think we can do some.
> 
> Right.  And the 2D ones, while they are the most complicated, are also
> the most interesting from a performance perspective so ideally you'd
> find a match on one of those.  If you don't expose the 2D modes,
> there's not much point in supporting modifiers at all.

This is essentially the problem I keep running into when trying to work 
up something based on the suggestions here as well.  Yes, for a given 
build of our driver on a single device, we can re-derive exactly the 
same tiling parameters given a few manageable constraints.  That was the 
essence of the design of the Vulkan external objects framework, and it 
comes with all the limitations I'm trying to avoid by introducing the 
more complex allocator framework:

-We want to share across GPUs.

-We potentially want to share across non-version-locked driver
components, even potentially between Nouveau-driven/Tegra-DRM-driven
GPUs and GPUs driven by the NVIDIA proprietary driver.  There's no way
we can ensure the drivers use the same algorithm there.

Taking it further than even I would like to, in a discussion over DRM 
format modifier usage in Vulkan, it was recently proposed that DRM 
format modifiers be used to serialize data in a pre-tiled format.  I 
personally don't think DRM format modifiers should be used for this at 
all, but something like extended allocator metadata might be appropriate.

At this point I've heard engineers from Intel, AMD, and of course myself 
at NVIDIA saying that while DRM format modifiers solve many more cases 
than assuming pitch-linear or doing magic to pass around metadata, they 
don't solve all the cases necessary to make optimal use of any of our HW 
in at least some interesting cases.  Hence it seems reasonable to 
continue to improve the design of these mechanisms.

Responding to some earlier points that fell off my mail retention limit 
while I was on paternity leave:

> I understand that it's an incomplete example, but even so I don't think
> this duplication is feasible. It's not a matter of how many use cases we
> have to duplicate at this point in time, it's that all these APIs are live,
> evolving APIs and keeping the allocator up to date as various APIs grow new
> corner cases doesn't seem practical. Further, it's not orthogonal or
> composable - the allocator has to know about all producers and consumers
> and if I add a new piece of hardware I have to extend the allocator to
> understand its new use cases. With the modifier model, I just ask the new
> driver which modifiers it supports for the use case I'm interested in and
> feed those modifiers to the allocator.

There are currently 3 complete modern low-level 3D graphics APIs along 
with some slightly longer in the tooth higher-level alternatives being 
actively maintained at more or less the same feature level, countless 
video decode/encode APIs with more or less equivalent functionality, and 
more mode setting APIs than anyone wants.  If that much total duplicated 
effort is possible, it seems feasible to maintain a list of layouts and 
related properties, most of which will see some re-use between all these 
APIs.

Further, the central library doesn't need to be burdened by all of these 
use cases unless they become cross-vendor.  The usage itself is 
vendor-extensible, so if AMD had wanted to add a bunch of Mantle-only 
usage bits, they could have done so without cluttering the shared 
library code or namespace.

> Vulkan isn't expected to know about video encode usage. You ask the video
> codec about supported modifiers for encode and you ask Vulkan for supported
> modifiers for, say optimal render usage. The allocator determines the
> optimal lowest common denominator and allocates the buffer. Maybe that's
> linear, or if you've designed both parts, maybe there's a simple shared
> tiled format that the encoder can source from.

It was determined early on in attempts to design this mechanism that 
such LCD intersection doesn't produce the optimal result.  Only 
considering the usage holistically can produce optimal layouts.
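
For reference, a minimal sketch of the intersection model described in the quoted text, assuming each component reports a list of 64-bit modifiers in its own preference order. The function name and the ordering convention are hypothetical and not taken from any existing allocator API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write to `out` every modifier present in both lists, keeping the order
 * of `a` (taken here to be the first component's preference order).
 * Returns the number of entries written, at most `out_cap`.
 */
static size_t example_intersect(const uint64_t *a, size_t na,
                                const uint64_t *b, size_t nb,
                                uint64_t *out, size_t out_cap)
{
    size_t n = 0;

    assert(out != NULL || out_cap == 0);
    for (size_t i = 0; i < na && n < out_cap; i++) {
        for (size_t j = 0; j < nb; j++) {
            if (a[i] == b[j]) {
                out[n++] = a[i]; /* common to both lists */
                break;
            }
        }
    }
    return n;
}
```

The result depends only on what the two lists happen to share. Nothing in it reflects the combined usage, which is the limitation I'm pointing at.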

> For modifiers and liballocator as well, the meta data is copied by value
> (and passed through IPC) and as such can't model shared mutable
> information. That means fast clear colors, compression aux buffers and such
> have to be in a shared BO plane.

Again, this is making large design assumptions.  Fast clear color data,
for example, would be a very reasonable thing to include in static
metadata given our driver+HW architecture.

Thanks,
-James

> Alex