[PATCH] drm: Generalized NV Block Linear DRM format mod
Daniel Vetter
daniel at ffwll.ch
Tue Oct 15 14:19:13 UTC 2019
On Mon, Oct 14, 2019 at 03:13:21PM -0700, James Jones wrote:
> Builds upon the existing NVIDIA 16Bx2 block linear
> format modifiers by adding more "fields" to the
> existing parameterized
> DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier
> macro that allow fully defining a unique-across-
> all-NVIDIA-hardware bit layout using a minimal
> set of fields and values. The new modifier macro
> DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is
> effectively backwards compatible with the existing
> macro, introducing a superset of the previously
> definable format modifiers.
>
> Backwards compatibility has two quirks. First,
> the zero value for the "kind" field, which is
> implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK
> macro, must be special cased in drivers and
> assumed to map to the pre-Turing generic kind of
> 0xfe, since a kind of "zero" is reserved for
> linear buffer layouts on all GPUs.
>
> Second, it is assumed backwards compatibility
> is only needed when running on Tegra GPUs, and
> specifically Tegra GPUs prior to Xavier. This
> is based on two assertions:
>
> -Tegra GPUs prior to Xavier used a slightly
> different raw bit layout than desktop GPUs,
> making it impossible to directly share block
> linear buffers between the two.
>
> -Support for the existing block linear modifiers
> was incomplete, making them useful only for
> exporting buffers created by nouveau and
> importing them to Tegra DRM as framebuffers for
> scan out. There was no support for adding
> framebuffers using format modifiers in nouveau,
> nor importing dma-buf/PRIME GEM objects into
> nouveau userspace drivers with modifiers in Mesa.
>
> Hence it is assumed the prior modifiers were not
> intended for use on desktop GPUs, and as a
> corollary, were not intended to support sharing
> block linear buffers across two different NVIDIA
> GPUs.
>
> Signed-off-by: James Jones <jajones at nvidia.com>
> ---
> include/uapi/drm/drm_fourcc.h | 108 +++++++++++++++++++++++++++++++---
> 1 file changed, 100 insertions(+), 8 deletions(-)
>
> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> index 3feeaa3f987a..cc9853d42a24 100644
> --- a/include/uapi/drm/drm_fourcc.h
> +++ b/include/uapi/drm/drm_fourcc.h
> @@ -497,7 +497,99 @@ extern "C" {
> #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1)
>
> /*
> - * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later
> + * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80,
> + * and Tegra GPUs starting with Tegra K1.
> + *
> + * Pixels are arranged in Groups of Bytes (GOBs). GOB size and layout varies
> + * based on the architecture generation. GOBs themselves are then arranged in
> + * 3D blocks, with the block dimensions (in terms of GOBs) always being a power
> + * of two, and hence expressible as their log2 equivalent (E.g., "2" represents
> + * a block depth or height of "4").
> + *
> + * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format
> + * in full detail.
> + *
> + * Macro
> + * Bits Param Description
> + * ---- ----- -----------------------------------------------------------------
> + *
> + * 3:0 h log2(height) of each block, in GOBs. Placed here for
> + * compatibility with the existing
> + * DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
> + *
> + * 4:4 - Must be 1, to indicate block-linear layout. Necessary for
> + * compatibility with the existing
> + * DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
> + *
> + * 8:5 - Reserved (To support 3D-surfaces with variable log2(depth) block
> + * size). Must be zero.
> + *
> + * Note there is no log2(width) parameter. Some portions of the
> + * hardware support a block width of two gobs, but it is impractical
> + * to use due to lack of support elsewhere, and has no known
> + * benefits.
> + *
> + * 11:9 - Reserved (To support 2D-array textures with variable array stride
> + * in blocks, specified via log2(tile width in blocks)). Must be
> + * zero.
> + *
> + * 19:12 k Page Kind. This value directly maps to a field in the page
> + * tables of all GPUs >= NV50. It affects the exact layout of bits
> + * in memory and can be derived from the tuple
> + *
> + * (format, GPU model, compression type, samples per pixel)
> + *
> + * Where compression type is defined below. If GPU model were
> + * implied by the format modifier, format, or memory buffer, page
> + * kind would not need to be included in the modifier itself, but
> + * since the modifier should define the layout of the associated
> + * memory buffer independent from any device or other context, it
> + * must be included here.
> + *
> + * To grandfather in prior block linear format modifiers to this
> + * layout, the page kind "0", which corresponds to "pitch/linear"
> + * and hence is unusable with block-linear layouts, is remapped
> + * within drivers to the value 0xfe, which corresponds to the
> + * "generic" kind used for simple single-sample color formats on
> + * pre-Turing GPUs.
Hm, maybe a tiny static inline function which canonicalizes modifiers?
Something like
static inline u64
drm_fourcc_canonicalize_nvidia_block_linear_2d(u64 modifier,
					       bool is_pre_turing)
{
}
Would then give you a nice place to stick this backward compat note and
make it really clear what should be done. I think establishing this as a
pattern would also be nice, since I'm sure we'll have a pile more of these
cases where modifiers turn out to assume a few too many things about the
platform they're used on (we have a similar case on the intel side too).
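To make the idea a bit more concrete, here is a rough, standalone sketch of how the empty stub above might be filled in. It is only an illustration under a few assumptions: the fourcc_mod_code() and DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() definitions are copied locally so it builds outside the kernel tree, uint64_t stands in for u64, and the meaning of is_pre_turing (remap only when true, leave everything else untouched) is a guess rather than something the patch defines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the drm_fourcc.h pieces, so the sketch builds standalone. */
#define DRM_FORMAT_MOD_VENDOR_NVIDIA 0x03
#define fourcc_mod_code(vendor, val) \
	((((uint64_t)DRM_FORMAT_MOD_VENDOR_##vendor) << 56) | \
	 ((val) & 0x00ffffffffffffffULL))

/* Macro as proposed in the quoted patch. */
#define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \
	fourcc_mod_code(NVIDIA, (0x10 | \
				 ((h) & 0xf) | \
				 (((k) & 0xff) << 12) | \
				 (((g) & 0x3) << 20) | \
				 (((s) & 0x1) << 22) | \
				 (((c) & 0x3) << 23)))

/*
 * Hypothetical canonicalization helper. Assumption: a kind of 0 on a
 * block-linear modifier means "legacy 16BX2 modifier" and is rewritten
 * to the pre-Turing generic kind 0xfe.
 */
static inline uint64_t
drm_fourcc_canonicalize_nvidia_block_linear_2d(uint64_t modifier,
					       bool is_pre_turing)
{
	const uint64_t vendor_mask = 0xffULL << 56;
	const uint64_t nvidia = (uint64_t)DRM_FORMAT_MOD_VENDOR_NVIDIA << 56;
	const uint64_t kind_mask = 0xffULL << 12;

	/* Not an NVIDIA modifier, or bit 4 (block linear) is clear: pass through. */
	if ((modifier & vendor_mask) != nvidia || !(modifier & 0x10))
		return modifier;

	/* Legacy modifiers carry kind 0; substitute the generic kind. */
	if (is_pre_turing && (modifier & kind_mask) == 0)
		modifier |= 0xfeULL << 12;

	return modifier;
}
```

A driver could then run every incoming modifier through the helper before comparing it against its supported list, which keeps the special case in one place instead of spreading it across import paths.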
Just a drive-by idea, feel free to ignore.
Cheers, Daniel
> + *
> + * 21:20 g GOB Height and Page Kind Generation. The height of a GOB changed
> + * starting with Fermi GPUs. Additionally, the mapping between page
> + * kind and bit layout has changed at various points.
> + *
> + * 0 = Gob Height 8, Fermi - Volta, Tegra K1+ Page Kind mapping
> + * 1 = Gob Height 4, G80 - GT2XX Page Kind mapping
> + * 2 = Gob Height 8, Turing+ Page Kind mapping
> + * 3 = Reserved for future use.
> + *
> + * 22:22 s Sector layout. On Tegra GPUs prior to Xavier, there is a further
> + * bit remapping step that occurs at an even lower level than the
> + * page kind and block linear swizzles. This causes the layout of
> + * surfaces mapped in those SOC's GPUs to be incompatible with the
> + * equivalent mapping on other GPUs in the same system.
> + *
> + * 0 = Tegra K1 - Tegra Parker/TX2 Layout.
> + * 1 = Desktop GPU and Tegra Xavier+ Layout
> + *
> + * 24:23 c Lossless Framebuffer Compression type.
> + *
> + * 0 = none
> + * 1 = ROP/3D, actual compression implied by the Page Kind field
> + * 2 = CDE horizontal
> + * 3 = CDE vertical
> + *
> + * 55:25 - Reserved for future use. Must be zero.
> + */
> +#define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \
> + fourcc_mod_code(NVIDIA, (0x10 | \
> + ((h) & 0xf) | \
> + (((k) & 0xff) << 12) | \
> + (((g) & 0x3) << 20) | \
> + (((s) & 0x1) << 22) | \
> + (((c) & 0x3) << 23)))
> +
> +/*
> + * 16Bx2 Block Linear layout, used by Tegra K1 and later
> *
> * Pixels are arranged in 64x8 Groups Of Bytes (GOBs). GOBs are then stacked
> * vertically by a power of 2 (1 to 32 GOBs) to form a block.
> @@ -518,20 +610,20 @@ extern "C" {
> * in full detail.
> */
> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(v) \
> - fourcc_mod_code(NVIDIA, 0x10 | ((v) & 0xf))
> + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0, (v))
>
> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_ONE_GOB \
> - fourcc_mod_code(NVIDIA, 0x10)
> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0)
> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_TWO_GOB \
> - fourcc_mod_code(NVIDIA, 0x11)
> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1)
> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_FOUR_GOB \
> - fourcc_mod_code(NVIDIA, 0x12)
> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2)
> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_EIGHT_GOB \
> - fourcc_mod_code(NVIDIA, 0x13)
> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3)
> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_SIXTEEN_GOB \
> - fourcc_mod_code(NVIDIA, 0x14)
> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4)
> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_THIRTYTWO_GOB \
> - fourcc_mod_code(NVIDIA, 0x15)
> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5)
>
> /*
> * Some Broadcom modifiers take parameters, for example the number of
> --
> 2.17.1
>
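For readers following the bit table in the quoted comment, the layout can be unpacked as in the sketch below. The struct and function names are hypothetical and not part of the patch; only the bit positions come from the documented table.

```c
#include <stdint.h>

/* Hypothetical container for the fields documented in the patch comment. */
struct nv_block_linear_fields {
	unsigned int log2_block_height;	/* bits 3:0   (h) */
	unsigned int is_block_linear;	/* bit  4         */
	unsigned int kind;		/* bits 19:12 (k) */
	unsigned int gob_kind_gen;	/* bits 21:20 (g) */
	unsigned int sector_layout;	/* bit  22    (s) */
	unsigned int compression;	/* bits 24:23 (c) */
};

/* Split the low 56 bits of a modifier into the documented fields. */
static inline struct nv_block_linear_fields
nv_block_linear_decode(uint64_t modifier)
{
	struct nv_block_linear_fields f = {
		.log2_block_height = (unsigned int)(modifier & 0xf),
		.is_block_linear   = (unsigned int)((modifier >> 4) & 0x1),
		.kind              = (unsigned int)((modifier >> 12) & 0xff),
		.gob_kind_gen      = (unsigned int)((modifier >> 20) & 0x3),
		.sector_layout     = (unsigned int)((modifier >> 22) & 0x1),
		.compression       = (unsigned int)((modifier >> 23) & 0x3),
	};

	return f;
}
```

As a worked value, DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 1, 0, 0xfe, 4) packs to 0x03000000004fe014 (vendor 0x03 in the top byte, then 0x400000 | 0xfe000 | 0x14), and decoding that constant yields h = 4, k = 0xfe, g = 0, s = 1, c = 0.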
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch