[PATCH] drm: Generalized NV Block Linear DRM format mod

Tue Oct 15 14:19:13 UTC 2019

On Mon, Oct 14, 2019 at 03:13:21PM -0700, James Jones wrote:
> Builds upon the existing NVIDIA 16Bx2 block linear
> format modifiers by adding more "fields" to the
> existing parameterized
> DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier
> macro that allow fully defining a unique-across-
> all-NVIDIA-hardware bit layout using a minimal
> set of fields and values.  The new modifier macro
> DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is
> effectively backwards compatible with the existing
> macro, introducing a superset of the previously
> definable format modifiers.
> 
> Backwards compatibility has two quirks.  First,
> the zero value for the "kind" field, which is
> implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK
> macro, must be special cased in drivers and
> assumed to map to the pre-Turing generic kind of
> 0xfe, since a kind of "zero" is reserved for
> linear buffer layouts on all GPUs.
> 
> Second, it is assumed backwards compatibility
> is only needed when running on Tegra GPUs, and
> specifically Tegra GPUs prior to Xavier.  This
> is based on two assertions:
> 
> -Tegra GPUs prior to Xavier used a slightly
>  different raw bit layout than desktop GPUs,
>  making it impossible to directly share block
>  linear buffers between the two.
> 
> -Support for the existing block linear modifiers
>  was incomplete, making them useful only for
>  exporting buffers created by nouveau and
>  importing them to Tegra DRM as framebuffers for
>  scan out.  There was no support for adding
>  framebuffers using format modifiers in nouveau,
>  nor importing dma-buf/PRIME GEM objects into
>  nouveau userspace drivers with modifiers in Mesa.
> 
> Hence it is assumed the prior modifiers were not
> intended for use on desktop GPUs, and as a
> corrolary, were not intended to support sharing
> block linear buffers across two different NVIDIA
> GPUs.
> 
> Signed-off-by: James Jones <jajones at nvidia.com>
> ---
>  include/uapi/drm/drm_fourcc.h | 108 +++++++++++++++++++++++++++++++---
>  1 file changed, 100 insertions(+), 8 deletions(-)
> 
> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> index 3feeaa3f987a..cc9853d42a24 100644
> --- a/include/uapi/drm/drm_fourcc.h
> +++ b/include/uapi/drm/drm_fourcc.h
> @@ -497,7 +497,99 @@ extern "C" {
>  #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1)
>  
>  /*
> - * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later
> + * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80,
> + * and Tegra GPUs starting with Tegra K1.
> + *
> + * Pixels are arranged in Groups of Bytes (GOBs).  GOB size and layout varies
> + * based on the architecture generation.  GOBs themselves are then arranged in
> + * 3D blocks, with the block dimensions (in terms of GOBs) always being a power
> + * of two, and hence expressible as their log2 equivalent (E.g., "2" represents
> + * a block depth or height of "4").
> + *
> + * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format
> + * in full detail.
> + *
> + *       Macro
> + * Bits  Param Description
> + * ----  ----- -----------------------------------------------------------------
> + *
> + *  3:0  h     log2(height) of each block, in GOBs.  Placed here for
> + *             compatibility with the existing
> + *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
> + *
> + *  4:4  -     Must be 1, to indicate block-linear layout.  Necessary for
> + *             compatibility with the existing
> + *             DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
> + *
> + *  8:5  -     Reserved (To support 3D-surfaces with variable log2(depth) block
> + *             size).  Must be zero.
> + *
> + *             Note there is no log2(width) parameter.  Some portions of the
> + *             hardware support a block width of two gobs, but it is impractical
> + *             to use due to lack of support elsewhere, and has no known
> + *             benefits.
> + *
> + * 11:9  -     Reserved (To support 2D-array textures with variable array stride
> + *             in blocks, specified via log2(tile width in blocks)).  Must be
> + *             zero.
> + *
> + * 19:12 k     Page Kind.  This value directly maps to a field in the page
> + *             tables of all GPUs >= NV50.  It affects the exact layout of bits
> + *             in memory and can be derived from the tuple
> + *
> + *               (format, GPU model, compression type, samples per pixel)
> + *
> + *             Where compression type is defined below.  If GPU model were
> + *             implied by the format modifier, format, or memory buffer, page
> + *             kind would not need to be included in the modifier itself, but
> + *             since the modifier should define the layout of the associated
> + *             memory buffer independent from any device or other context, it
> + *             must be included here.
> + *
> + *             To grandfather in prior block linear format modifiers to this
> + *             layout, the page kind "0", which corresponds to "pitch/linear"
> + *             and hence is unusable with block-linear layouts, is remapped
> + *             within drivers to the value 0xfe, which corresponds to the
> + *             "generic" kind used for simple single-sample color formats on
> + *             pre-Turing GPUs.

Hm, maybe a tiny static inline function which canonizalizes modifiers?
Something like

static inline u64
drm_fourcc_canonicalize_nvidia_block_linear_2d(u64 modifer, bool
is_pre_turing)
{
}

Would then give you a nice place to stick this backward compat note and
make it really clear what should be done. I think establishing this as a
pattern would also be nice, since I'm sure we'll have a pile more of these
cases where modifiers turn out to assume a few too many things about the
platform they're used on (we have a similar case on the intel side too).

Just a drive-by idea, feel free to ignore.

Cheers, Daniel

> + *
> + * 21:20 g     GOB Height and Page Kind Generation.  The height of a GOB changed
> + *             starting with Fermi GPUs.  Additionally, the mapping between page
> + *             kind and bit layout has changed at various points.
> + *
> + *               0 = Gob Height 8, Fermi - Volta, Tegra K1+ Page Kind mapping
> + *               1 = Gob Height 4, G80 - GT2XX Page Kind mapping
> + *               2 = Gob Height 8, Turing+ Page Kind mapping
> + *               3 = Reserved for future use.
> + *
> + * 22:22 s     Sector layout.  On Tegra GPUs prior to Xavier, there is a further
> + *             bit remapping step that occurs at an even lower level than the
> + *             page kind and block linear swizzles.  This causes the layout of
> + *             surfaces mapped in those SOC's GPUs to be incompatible with the
> + *             equivalent mapping on other GPUs in the same system.
> + *
> + *               0 = Tegra K1 - Tegra Parker/TX2 Layout.
> + *               1 = Desktop GPU and Tegra Xavier+ Layout
> + *
> + * 24:23 c     Lossless Framebuffer Compression type.
> + *
> + *               0 = none
> + *               1 = ROP/3D, actual compression implied by the Page Kind field
> + *               2 = CDE horizontal
> + *               3 = CDE vertical
> + *
> + * 55:25 -     Reserved for future use.  Must be zero.
> + */
> +#define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \
> +	fourcc_mod_code(NVIDIA, (0x10 | \
> +				 ((h) & 0xf) | \
> +				 (((k) & 0xff) << 12) | \
> +				 (((g) & 0x3) << 20) | \
> +				 (((s) & 0x1) << 22) | \
> +				 (((c) & 0x3) << 23)))
> +
> +/*
> + * 16Bx2 Block Linear layout, used by Tegra K1 and later
>   *
>   * Pixels are arranged in 64x8 Groups Of Bytes (GOBs). GOBs are then stacked
>   * vertically by a power of 2 (1 to 32 GOBs) to form a block.
> @@ -518,20 +610,20 @@ extern "C" {
>   * in full detail.
>   */
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(v) \
> -	fourcc_mod_code(NVIDIA, 0x10 | ((v) & 0xf))
> +	DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0, (v))
>  
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_ONE_GOB \
> -	fourcc_mod_code(NVIDIA, 0x10)
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0)
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_TWO_GOB \
> -	fourcc_mod_code(NVIDIA, 0x11)
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1)
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_FOUR_GOB \
> -	fourcc_mod_code(NVIDIA, 0x12)
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2)
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_EIGHT_GOB \
> -	fourcc_mod_code(NVIDIA, 0x13)
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3)
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_SIXTEEN_GOB \
> -	fourcc_mod_code(NVIDIA, 0x14)
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4)
>  #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_THIRTYTWO_GOB \
> -	fourcc_mod_code(NVIDIA, 0x15)
> +	DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5)
>  
>  /*
>   * Some Broadcom modifiers take parameters, for example the number of
> -- 
> 2.17.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch