[Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces

Daniel Vetter daniel at ffwll.ch
Wed Mar 7 17:23:39 UTC 2018


On Thu, Feb 22, 2018 at 04:16:52PM -0500, Alex Deucher wrote:
> On Thu, Feb 22, 2018 at 1:49 PM, Bas Nieuwenhuizen
> <bas at basnieuwenhuizen.nl> wrote:
> > On Thu, Feb 22, 2018 at 7:04 PM, Kristian Høgsberg <hoegsberg at gmail.com> wrote:
> >> On Wed, Feb 21, 2018 at 4:00 PM Alex Deucher <alexdeucher at gmail.com> wrote:
> >>
> >>> On Wed, Feb 21, 2018 at 1:14 AM, Chad Versace <chadversary at chromium.org> wrote:
> >>> > On Thu 21 Dec 2017, Daniel Vetter wrote:
> >>> >> On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen <hoegsberg at google.com> wrote:
> >>> >>> On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico <mvicomoya at nvidia.com> wrote:
> >>> >>>> On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg <hoegsberg at gmail.com> wrote:
> >>> >>>>> I'd like to see concrete examples of actual display controllers
> >>> >>>>> supporting more format layouts than what can be specified with a 64
> >>> >>>>> bit modifier.
> >>> >>>>
> >>> >>>> The main problem is our tiling and other metadata parameters can't
> >>> >>>> generally fit in a modifier, so we find passing a blob of metadata a
> >>> >>>> more suitable mechanism.
> >>> >>>
> >>> >>> I understand that you may have n knobs with a total of more than 56
> >>> >>> bits that configure your tiling/swizzling for color buffers. What I
> >>> >>> don't buy is that you need all those combinations when passing buffers
> >>> >>> around between codecs, cameras and display controllers. Even if you're
> >>> >>> sharing between the same 3D drivers in different processes, I expect
> >>> >>> just locking down, say, 64 different combinations (you can add more
> >>> >>> over time) and assigning each a modifier would be sufficient. I doubt
> >>> >>> you'd extract meaningful performance gains from going all the way to a
> >>> >>> blob.
> >>> >
> >>> > I agree with Kristian above. In my opinion, choosing to encode in
> >>> > modifiers a precise description of every possible tiling/compression
> >>> > layout is not technically incorrect, but I believe it misses the point.
> >>> > The intention behind modifiers is not to exhaustively describe all
> >>> > possibilities.
> >>> >
> >>> > I summarized this opinion in VK_EXT_image_drm_format_modifier,
> >>> > where I wrote an "introduction to modifiers" section. Here's an excerpt:
> >>> >
> >>> >     One goal of modifiers in the Linux ecosystem is to enumerate for each
> >>> >     vendor a reasonably sized set of tiling formats that are appropriate
> >>> >     for images shared across processes, APIs, and/or devices, where each
> >>> >     participating component may possibly be from different vendors.
> >>> >     A non-goal is to enumerate all tiling formats supported by all
> >>> >     vendors. Some tiling formats used internally by vendors are
> >>> >     inappropriate for sharing; no modifiers should be assigned to such
> >>> >     tiling formats.
> >>
> >>> Where it gets tricky is how to select that subset.  Our tiling modes
> >>> are defined more by the asic specific constraints than the tiling mode
> >>> itself.  At a high level we have basically 3 tiling modes (out of 16
> >>> possible) that would be the minimum we'd want to expose for gfx6-8.
> >>> gfx9 uses a completely new scheme.
> >>> 1. Linear (per asic stride requirements, not usable by many hw blocks)
> >>> 2. 1D Thin (5 layouts, displayable, depth, thin, rotated, thick)
> >>> 3. 2D Thin (1D tiling constraints, plus pipe config (18 possible),
> >>> tile split (7 possible), sample split (4 possible), num banks (4
> >>> possible), bank width (4 possible), bank height (4 possible), macro
> >>> tile aspect (4 possible) all of which are asic config specific)
> >>
> >>> I guess we could do something like:
> >>> AMD_GFX6_LINEAR_ALIGNED_64B
> >>> AMD_GFX6_LINEAR_ALIGNED_256B
> >>> AMD_GFX6_LINEAR_ALIGNED_512B
> >>> AMD_GFX6_1D_THIN_DISPLAY
> >>> AMD_GFX6_1D_THIN_DEPTH
> >>> AMD_GFX6_1D_THIN_ROTATED
> >>> AMD_GFX6_1D_THIN_THIN
> >>> AMD_GFX6_1D_THIN_THICK
> >>
> >> AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
> >>
> >> AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
> >>
> >> AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
> >>
> >> AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
> >>
> >> AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P2_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
> >>
> >> AMD_GFX6_2D_1D_THIN_DISPLAY_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
> >>
> >> AMD_GFX6_2D_1D_THIN_DEPTH_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
> >>
> >> AMD_GFX6_2D_1D_THIN_ROTATED_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
> >>
> >> AMD_GFX6_2D_1D_THIN_THIN_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
> >>
> >> AMD_GFX6_2D_1D_THIN_THICK_PIPE_CONFIG_P4_8x16_TILE_SPLIT_64B_SAMPLE_SPLIT_1_NUM_BANKS_2_BANK_WIDTH_1_BANK_HEIGHT_1_MACRO_TILE_ASPECT_1
> >>> etc.
> >>
> >>> We probably only need 40 bits to encode all of the tiling parameters,
> >>> so we could do family plus tiling encoding, but that still seems
> >>> unwieldy to deal with from an application perspective.  All of the
> >>> parameters affect the alignment requirements.
> >>
> >> We discussed this earlier in the thread, here's what I said:
> >>
> >> Another point here is that the modifier doesn't need to encode all the
> >> things you have to communicate to the HW. For a given width, height, format,
> >> compression type and maybe a few other high-level parameters, I'm skeptical
> >> that the remaining tile parameters aren't just mechanically derivable using
> >> a fixed table or formula. So instead of thinking of the modifiers as
> >> something you can just memcpy into a state packet, it identifies a family
> >> of configurations - enough information to deterministically derive the full
> >> exact configuration. The formula may change, for example for different
> >> hardware or if it's determined to not be optimal, and in that case, we can
> >> use a new modifier to represent the new formula.
> >
> > I think this is not so much about being able to dump it in a state
> > packet, but about sharing between different GPUs of AMD. We have
> > basically only a few interesting tiling modes if you look at a single
> > GPU, but checking if those are equal depends on the other bits  which
> > may or may not be different per chip for the same conceptual tiling
> > mode. We could just put a chip identifier in, but that would preclude
> > any sharing while I think we can do some.
> 
> Right.  And the 2D ones, while they are the most complicated, are also
> the most interesting from a performance perspective so ideally you'd
> find a match on one of those.  If you don't expose the 2D modes,
> there's not much point in supporting modifiers at all.

1. Make sure you have a test farm covering all your use cases and hw.

2. Create a struct that encodes everything. Make it a few kb big if it has
to be, whatever it takes.

3. Do a little library that contains a huge table mapping modifiers to
these structs, and one function that returns you the unique modifier for
the given tiling layout description struct. We can have that in the kernel
sources, or just delegate the entire AMD modifier block to some userspace
library you're managing (with just the few modifiers the kernel needs in
the uapi/drm_fourcc.h header). If the lib doesn't find the modifier, make
it crash with a nice loud backtrace.

4. Add modifiers to that lib until you stop failing on the test farm.

5. Optional: Make the lib faster with hashing/compressing/whatever if it
turns out to be a bottleneck somewhere. Since you'll only ever need it on
import/export, add a small cache with the relevant few entries for the
device instance at hand and I don't expect this will be a problem, ever.

I'm pretty sure you'll finish step 4 before you run out of modifiers. If
you don't, then we suck it up, admit sheepishly that modifiers turned out
to be a stupid idea and rev the kernel's uapi. We know how to do that, but
I also don't want to rev uapi just for fun.
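
As a rough sanity check on that claim, the counts Alex quotes above for
gfx6-8 can be multiplied out. This is only a sketch: treating every
parameter as independent, and combining the five 1D layouts with every
2D parameter, are assumptions that overstate the real number of layouts.
The field names are illustrative. The 56 bits are what remains of a
64-bit modifier after the 8-bit vendor field.

```python
from math import prod

# Number of possible values per parameter, as quoted earlier in the thread.
FIELD_CHOICES = {
    "micro_tile_mode": 5,    # displayable, depth, thin, rotated, thick
    "pipe_config": 18,
    "tile_split": 7,
    "sample_split": 4,
    "num_banks": 4,
    "bank_width": 4,
    "bank_height": 4,
    "macro_tile_aspect": 4,
}

MODIFIER_VALUE_BITS = 56  # 64-bit modifier minus the 8-bit vendor field

# Upper bound on distinct layouts if every combination were valid.
combinations = prod(FIELD_CHOICES.values())

# Bits needed if each parameter gets its own bitfield (the memcpy-style encoding).
packed_bits = sum((n - 1).bit_length() for n in FIELD_CHOICES.values())

# Values available to one vendor when modifiers are handed out from a table.
available = 1 << MODIFIER_VALUE_BITS

print(f"upper bound on layouts:      {combinations}")
print(f"bits if packed as bitfields: {packed_bits}")
print(f"values available per vendor: {available}")
```

Under those assumptions the bound is 645,120 layouts, or 21 bits as
separate bitfields, against 2^56 values per vendor. The table approach
only spends a value on layouts that actually show up on the test farm.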

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

