[PATCH] drm: Generalized NV Block Linear DRM format mod

Wed Oct 16 17:24:19 UTC 2019

On 10/15/19 8:42 AM, Daniel Vetter wrote:
> On Tue, Oct 15, 2019 at 5:14 PM James Jones <jajones at nvidia.com> wrote:
>>
>> On 10/15/19 7:19 AM, Daniel Vetter wrote:
>>> On Mon, Oct 14, 2019 at 03:13:21PM -0700, James Jones wrote:
>>>> Builds upon the existing NVIDIA 16Bx2 block linear
>>>> format modifiers by adding more "fields" to the
>>>> existing parameterized
>>>> DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier
>>>> macro that allow fully defining a unique-across-
>>>> all-NVIDIA-hardware bit layout using a minimal
>>>> set of fields and values.  The new modifier macro
>>>> DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is
>>>> effectively backwards compatible with the existing
>>>> macro, introducing a superset of the previously
>>>> definable format modifiers.
>>>>
>>>> Backwards compatibility has two quirks.  First,
>>>> the zero value for the "kind" field, which is
>>>> implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK
>>>> macro, must be special cased in drivers and
>>>> assumed to map to the pre-Turing generic kind of
>>>> 0xfe, since a kind of "zero" is reserved for
>>>> linear buffer layouts on all GPUs.
>>>>
>>>> Second, it is assumed backwards compatibility
>>>> is only needed when running on Tegra GPUs, and
>>>> specifically Tegra GPUs prior to Xavier.  This
>>>> is based on two assertions:
>>>>
>>>> -Tegra GPUs prior to Xavier used a slightly
>>>>    different raw bit layout than desktop GPUs,
>>>>    making it impossible to directly share block
>>>>    linear buffers between the two.
>>>>
>>>> -Support for the existing block linear modifiers
>>>>    was incomplete, making them useful only for
>>>>    exporting buffers created by nouveau and
>>>>    importing them to Tegra DRM as framebuffers for
>>>>    scan out.  There was no support for adding
>>>>    framebuffers using format modifiers in nouveau,
>>>>    nor importing dma-buf/PRIME GEM objects into
>>>>    nouveau userspace drivers with modifiers in Mesa.
>>>>
>>>> Hence it is assumed the prior modifiers were not
>>>> intended for use on desktop GPUs, and as a
>>>> corollary, were not intended to support sharing
>>>> block linear buffers across two different NVIDIA
>>>> GPUs.
>>>>
>>>> Signed-off-by: James Jones <jajones at nvidia.com>
>>>> ---
>>>>    include/uapi/drm/drm_fourcc.h | 108 +++++++++++++++++++++++++++++++---
>>>>    1 file changed, 100 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
>>>> index 3feeaa3f987a..cc9853d42a24 100644
>>>> --- a/include/uapi/drm/drm_fourcc.h
>>>> +++ b/include/uapi/drm/drm_fourcc.h
>>>> @@ -497,7 +497,99 @@ extern "C" {
>>>>    #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1)
>>>>
>>>>    /*
>>>> - * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later
>>>> + * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80,
>>>> + * and Tegra GPUs starting with Tegra K1.
>>>> + *
>>>> + * Pixels are arranged in Groups of Bytes (GOBs).  GOB size and layout varies
>>>> + * based on the architecture generation.  GOBs themselves are then arranged in
>>>> + * 3D blocks, with the block dimensions (in terms of GOBs) always being a power
>>>> + * of two, and hence expressible as their log2 equivalent (E.g., "2" represents
>>>> + * a block depth or height of "4").
>>>> + *
>>>> + * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format
>>>> + * in full detail.
>>>> + *
>>>> + *       Macro
>>>> + * Bits  Param Description
>>>> + * ----  ----- -----------------------------------------------------------------
>>>> + *
>>>> + *  3:0  h     log2(height) of each block, in GOBs.  Placed here for
>>>> + *             compatibility with the existing
>>>> + *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
>>>> + *
>>>> + *  4:4  -     Must be 1, to indicate block-linear layout.  Necessary for
>>>> + *             compatibility with the existing
>>>> + *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
>>>> + *
>>>> + *  8:5  -     Reserved (To support 3D-surfaces with variable log2(depth) block
>>>> + *             size).  Must be zero.
>>>> + *
>>>> + *             Note there is no log2(width) parameter.  Some portions of the
>>>> + *             hardware support a block width of two gobs, but it is impractical
>>>> + *             to use due to lack of support elsewhere, and has no known
>>>> + *             benefits.
>>>> + *
>>>> + * 11:9  -     Reserved (To support 2D-array textures with variable array stride
>>>> + *             in blocks, specified via log2(tile width in blocks)).  Must be
>>>> + *             zero.
>>>> + *
>>>> + * 19:12 k     Page Kind.  This value directly maps to a field in the page
>>>> + *             tables of all GPUs >= NV50.  It affects the exact layout of bits
>>>> + *             in memory and can be derived from the tuple
>>>> + *
>>>> + *               (format, GPU model, compression type, samples per pixel)
>>>> + *
>>>> + *             Where compression type is defined below.  If GPU model were
>>>> + *             implied by the format modifier, format, or memory buffer, page
>>>> + *             kind would not need to be included in the modifier itself, but
>>>> + *             since the modifier should define the layout of the associated
>>>> + *             memory buffer independent from any device or other context, it
>>>> + *             must be included here.
>>>> + *
>>>> + *             To grandfather in prior block linear format modifiers to this
>>>> + *             layout, the page kind "0", which corresponds to "pitch/linear"
>>>> + *             and hence is unusable with block-linear layouts, is remapped
>>>> + *             within drivers to the value 0xfe, which corresponds to the
>>>> + *             "generic" kind used for simple single-sample color formats on
>>>> + *             pre-Turing GPUs.
>>>
>>> Hm, maybe a tiny static inline function which canonicalizes modifiers?
>>> Something like
>>>
>>> static inline u64
>>> drm_fourcc_canonicalize_nvidia_block_linear_2d(u64 modifier, bool
>>> is_pre_turing)
>>> {
>>> }
>>>
>>> Would then give you a nice place to stick this backward compat note and
>>> make it really clear what should be done. I think establishing this as a
>>> pattern would also be nice, since I'm sure we'll have a pile more of these
>>> cases where modifiers turn out to assume a few too many things about the
>>> platform they're used on (we have a similar case on the intel side too).
>>
>> To make sure I'm clear, it would behave like this?
>>
>>     fixed_mod = canonicalize(old_style_valid_mod, true);
>>     assert(fixed_mod == (old_style_valid_mod | (0xfe << 12)));
>>     fixed_mod = canonicalize(new_style_valid_mod, [false,true]);
>>     assert(fixed_mod == new_style_valid_mod);
>>
>> I'm unclear what it should do in this case though:
>>
>>     fixed_mod = canonicalize(old_style_valid_mod, false);
>>
>> Since there's no code out there using the old style modifiers with
>> Turing+ yet, and I don't want to try to support such usage.  Maybe just
>> drop the "is_pre_turing" parameter and always canonicalize by mapping 0
>> -> 0xfe as the comment above states, and not touching other values?  Or
>> should any invalid modifier, including this case, return
>> DRM_FORMAT_MOD_INVALID?  This latter idea seems risky because it would
>> cause software compiled against old drm_fourcc.h to potentially reject
>> format modifiers from newer kernels or libraries with an expanded
>> representation, but maybe that's what we want if only driver components
>> are supposed to call this function.
> 
> That's where I'm showing that I have no clue about nvidia buffer
> formats I guess :-)
> 
> If 0 never makes sense, then yeah I guess you could just
> unconditionally canonicalize. Maybe you want to reject the old/legacy
> style on Turing+, but that's a tradeoff up to you guys - as long
> as it's consistent across all involved drivers. Not rejecting it would
> essentially make 0 an alias for 0xfe everywhere, post and pre-turing.
> More reasons for having a shared canonicalize function which enforces
> the same behaviour (whichever you pick) everywhere I think.
> 
> And yeah the canonicalize function would only be called by drivers,
> not by clients.
> -Daniel

I wrote up a variant of this that canonicalizes any NV format mod to 
make it easier to use, and rebased my patches on top of it.  I only 
ended up with one caller, but I do like the design.  Sent out as v2.
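
For anyone following the thread without v2 at hand, here is a minimal
sketch of the unconditional "0 -> 0xfe" canonicalization discussed above.
It is not the code from v2.  All sketch_/SKETCH_ names are hypothetical
stand-ins so the snippet builds on its own; the vendor ID and bit positions
are assumed from drm_fourcc.h and the layout table in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins mirroring fourcc_mod_code(NVIDIA, val); assumed, not kernel API. */
#define SKETCH_VENDOR_NVIDIA 0x03ULL
#define SKETCH_NV_MOD(val) \
	((SKETCH_VENDOR_NVIDIA << 56) | ((uint64_t)(val) & 0x00ffffffffffffffULL))

/* Field positions taken from the layout table in the patch. */
#define SKETCH_BLOCK_LINEAR_BIT   (1ULL << 4)
#define SKETCH_KIND_SHIFT         12
#define SKETCH_KIND_MASK          (0xffULL << SKETCH_KIND_SHIFT)
#define SKETCH_KIND_GENERIC_16BX2 0xfeULL

/*
 * Map legacy block linear modifiers (page kind 0) to the generic
 * pre-Turing kind 0xfe.  Everything else is returned unchanged.
 */
static inline uint64_t sketch_canonicalize_nvidia_mod(uint64_t modifier)
{
	/* Not an NVIDIA modifier: nothing to do. */
	if ((modifier >> 56) != SKETCH_VENDOR_NVIDIA)
		return modifier;

	/* Bit 4 clear: not a block linear modifier (e.g. TEGRA_TILED). */
	if (!(modifier & SKETCH_BLOCK_LINEAR_BIT))
		return modifier;

	/* Non-zero page kind: already a new-style modifier. */
	if (modifier & SKETCH_KIND_MASK)
		return modifier;

	return modifier | (SKETCH_KIND_GENERIC_16BX2 << SKETCH_KIND_SHIFT);
}

/* Mirrors the asserts quoted above, for the is_pre_turing-free variant. */
static inline void sketch_selfcheck(void)
{
	uint64_t old_style = SKETCH_NV_MOD(0x13); /* legacy EIGHT_GOB */
	uint64_t new_style = SKETCH_NV_MOD(0x13 | (0xfeULL << 12));

	assert(sketch_canonicalize_nvidia_mod(old_style) == new_style);
	assert(sketch_canonicalize_nvidia_mod(new_style) == new_style);
}
```

Dropping the is_pre_turing parameter makes kind 0 an alias for 0xfe
everywhere, which is the tradeoff Daniel describes; a variant that rejects
legacy modifiers on Turing+ would need that extra input.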

Thanks,
-James

> 
>> Thanks,
>> -James
>>
>>> Just a drive-by idea, feel free to ignore.
>>>
>>> Cheers, Daniel
>>>
>>>> + *
>>>> + * 21:20 g     GOB Height and Page Kind Generation.  The height of a GOB changed
>>>> + *             starting with Fermi GPUs.  Additionally, the mapping between page
>>>> + *             kind and bit layout has changed at various points.
>>>> + *
>>>> + *               0 = Gob Height 8, Fermi - Volta, Tegra K1+ Page Kind mapping
>>>> + *               1 = Gob Height 4, G80 - GT2XX Page Kind mapping
>>>> + *               2 = Gob Height 8, Turing+ Page Kind mapping
>>>> + *               3 = Reserved for future use.
>>>> + *
>>>> + * 22:22 s     Sector layout.  On Tegra GPUs prior to Xavier, there is a further
>>>> + *             bit remapping step that occurs at an even lower level than the
>>>> + *             page kind and block linear swizzles.  This causes the layout of
>>>> + *             surfaces mapped in those SOC's GPUs to be incompatible with the
>>>> + *             equivalent mapping on other GPUs in the same system.
>>>> + *
>>>> + *               0 = Tegra K1 - Tegra Parker/TX2 Layout.
>>>> + *               1 = Desktop GPU and Tegra Xavier+ Layout
>>>> + *
>>>> + * 24:23 c     Lossless Framebuffer Compression type.
>>>> + *
>>>> + *               0 = none
>>>> + *               1 = ROP/3D, actual compression implied by the Page Kind field
>>>> + *               2 = CDE horizontal
>>>> + *               3 = CDE vertical
>>>> + *
>>>> + * 55:25 -     Reserved for future use.  Must be zero.
>>>> + */
>>>> +#define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \
>>>> +    fourcc_mod_code(NVIDIA, (0x10 | \
>>>> +                             ((h) & 0xf) | \
>>>> +                             (((k) & 0xff) << 12) | \
>>>> +                             (((g) & 0x3) << 20) | \
>>>> +                             (((s) & 0x1) << 22) | \
>>>> +                             (((c) & 0x3) << 23)))
>>>> +
>>>> +/*
>>>> + * 16Bx2 Block Linear layout, used by Tegra K1 and later
>>>>     *
>>>>     * Pixels are arranged in 64x8 Groups Of Bytes (GOBs). GOBs are then stacked
>>>>     * vertically by a power of 2 (1 to 32 GOBs) to form a block.
>>>> @@ -518,20 +610,20 @@ extern "C" {
>>>>     * in full detail.
>>>>     */
>>>>    #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(v) \
>>>> -    fourcc_mod_code(NVIDIA, 0x10 | ((v) & 0xf))
>>>> +    DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0, (v))
>>>>
>>>>    #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_ONE_GOB \
>>>> -    fourcc_mod_code(NVIDIA, 0x10)
>>>> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0)
>>>>    #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_TWO_GOB \
>>>> -    fourcc_mod_code(NVIDIA, 0x11)
>>>> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1)
>>>>    #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_FOUR_GOB \
>>>> -    fourcc_mod_code(NVIDIA, 0x12)
>>>> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2)
>>>>    #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_EIGHT_GOB \
>>>> -    fourcc_mod_code(NVIDIA, 0x13)
>>>> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3)
>>>>    #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_SIXTEEN_GOB \
>>>> -    fourcc_mod_code(NVIDIA, 0x14)
>>>> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4)
>>>>    #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_THIRTYTWO_GOB \
>>>> -    fourcc_mod_code(NVIDIA, 0x15)
>>>> +    DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5)
>>>>
>>>>    /*
>>>>     * Some Broadcom modifiers take parameters, for example the number of
>>>> --
>>>> 2.17.1
>>>>
>>>
> 
> 
>