[PATCH] drm: Generalized NV Block Linear DRM format mod
Daniel Vetter
daniel at ffwll.ch
Tue Oct 15 15:42:10 UTC 2019
On Tue, Oct 15, 2019 at 5:14 PM James Jones <jajones at nvidia.com> wrote:
>
> On 10/15/19 7:19 AM, Daniel Vetter wrote:
> > On Mon, Oct 14, 2019 at 03:13:21PM -0700, James Jones wrote:
> >> Builds upon the existing NVIDIA 16Bx2 block linear
> >> format modifiers by adding more "fields" to the
> >> existing parameterized
> >> DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK format modifier
> >> macro that allow fully defining a unique-across-
> >> all-NVIDIA-hardware bit layout using a minimal
> >> set of fields and values. The new modifier macro
> >> DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D is
> >> effectively backwards compatible with the existing
> >> macro, introducing a superset of the previously
> >> definable format modifiers.
> >>
> >> Backwards compatibility has two quirks. First,
> >> the zero value for the "kind" field, which is
> >> implied by the DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK
> >> macro, must be special cased in drivers and
> >> assumed to map to the pre-Turing generic kind of
> >> 0xfe, since a kind of "zero" is reserved for
> >> linear buffer layouts on all GPUs.
> >>
> >> Second, it is assumed backwards compatibility
> >> is only needed when running on Tegra GPUs, and
> >> specifically Tegra GPUs prior to Xavier. This
> >> is based on two assertions:
> >>
> >> -Tegra GPUs prior to Xavier used a slightly
> >> different raw bit layout than desktop GPUs,
> >> making it impossible to directly share block
> >> linear buffers between the two.
> >>
> >> -Support for the existing block linear modifiers
> >> was incomplete, making them useful only for
> >> exporting buffers created by nouveau and
> >> importing them to Tegra DRM as framebuffers for
> >> scan out. There was no support for adding
> >> framebuffers using format modifiers in nouveau,
> >> nor importing dma-buf/PRIME GEM objects into
> >> nouveau userspace drivers with modifiers in Mesa.
> >>
> >> Hence it is assumed the prior modifiers were not
> >> intended for use on desktop GPUs, and as a
> >> corrolary, were not intended to support sharing
> >> block linear buffers across two different NVIDIA
> >> GPUs.
> >>
> >> Signed-off-by: James Jones <jajones at nvidia.com>
> >> ---
> >> include/uapi/drm/drm_fourcc.h | 108 +++++++++++++++++++++++++++++++---
> >> 1 file changed, 100 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> >> index 3feeaa3f987a..cc9853d42a24 100644
> >> --- a/include/uapi/drm/drm_fourcc.h
> >> +++ b/include/uapi/drm/drm_fourcc.h
> >> @@ -497,7 +497,99 @@ extern "C" {
> >> #define DRM_FORMAT_MOD_NVIDIA_TEGRA_TILED fourcc_mod_code(NVIDIA, 1)
> >>
> >> /*
> >> - * 16Bx2 Block Linear layout, used by desktop GPUs, and Tegra K1 and later
> >> + * Generalized Block Linear layout, used by desktop GPUs starting with NV50/G80,
> >> + * and Tegra GPUs starting with Tegra K1.
> >> + *
> >> + * Pixels are arranged in Groups of Bytes (GOBs). GOB size and layout varies
> >> + * based on the architecture generation. GOBs themselves are then arranged in
> >> + * 3D blocks, with the block dimensions (in terms of GOBs) always being a power
> >> + * of two, and hence expressible as their log2 equivalent (E.g., "2" represents
> >> + * a block depth or height of "4").
> >> + *
> >> + * Chapter 20 "Pixel Memory Formats" of the Tegra X1 TRM describes this format
> >> + * in full detail.
> >> + *
> >> + * Macro
> >> + * Bits Param Description
> >> + * ---- ----- -----------------------------------------------------------------
> >> + *
> >> + * 3:0 h log2(height) of each block, in GOBs. Placed here for
> >> + * compatibility with the existing
> >> + * DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
> >> + *
> >> + * 4:4 - Must be 1, to indicate block-linear layout. Necessary for
> >> + * compatibility with the existing
> >> + * DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK()-based modifiers.
> >> + *
> >> + * 8:5 - Reserved (To support 3D-surfaces with variable log2(depth) block
> >> + * size). Must be zero.
> >> + *
> >> + * Note there is no log2(width) parameter. Some portions of the
> >> + * hardware support a block width of two gobs, but it is impractical
> >> + * to use due to lack of support elsewhere, and has no known
> >> + * benefits.
> >> + *
> >> + * 11:9 - Reserved (To support 2D-array textures with variable array stride
> >> + * in blocks, specified via log2(tile width in blocks)). Must be
> >> + * zero.
> >> + *
> >> + * 19:12 k Page Kind. This value directly maps to a field in the page
> >> + * tables of all GPUs >= NV50. It affects the exact layout of bits
> >> + * in memory and can be derived from the tuple
> >> + *
> >> + * (format, GPU model, compression type, samples per pixel)
> >> + *
> >> + * Where compression type is defined below. If GPU model were
> >> + * implied by the format modifier, format, or memory buffer, page
> >> + * kind would not need to be included in the modifier itself, but
> >> + * since the modifier should define the layout of the associated
> >> + * memory buffer independent from any device or other context, it
> >> + * must be included here.
> >> + *
> >> + * To grandfather in prior block linear format modifiers to this
> >> + * layout, the page kind "0", which corresponds to "pitch/linear"
> >> + * and hence is unusable with block-linear layouts, is remapped
> >> + * within drivers to the value 0xfe, which corresponds to the
> >> + * "generic" kind used for simple single-sample color formats on
> >> + * pre-Turing GPUs.
> >
> > Hm, maybe a tiny static inline function which canonizalizes modifiers?
> > Something like
> >
> > static inline u64
> > drm_fourcc_canonicalize_nvidia_block_linear_2d(u64 modifer, bool
> > is_pre_turing)
> > {
> > }
> >
> > Would then give you a nice place to stick this backward compat note and
> > make it really clear what should be done. I think establishing this as a
> > pattern would also be nice, since I'm sure we'll have a pile more of these
> > cases where modifiers turn out to assume a few too many things about the
> > platform they're used on (we have a similar case on the intel side too).
>
> To make sure I'm clear, it would behave like this?
>
> fixed_mod = canonicalize(old_style_valid_mod, true);
> assert(fixed_mod == old_style_valid_mod | (0xfe << 12));
> fixed_mod = canonicalize(new_style_valid_mod, [false,true]);
> assert(fixed_mod == new_style_valid_mod);
>
> I'm unclear what it should do in this case though:
>
> fixed_mod = canonicalize(old_style_valid_mod, false);
>
> Since there's no code out there using the old style modifiers with
> Turing+ yet, and I don't want to try to support such usage. Maybe just
> drop the "is_pre_turing" parameter and always canonicalize by mapping 0
> -> 0xfe as the comment above states, and not touching other values? Or
> should any invalid modifier, including this case, return
> DRM_FORMAT_MOD_INVALID? This latter idea seems risky because it would
> cause software compiled against old drm_fourcc.h to potentially reject
> format modifiers from newer kernels or libraries with an expanded
> representation, but maybe that's what we want if only driver components
> are supposed to call this function.
That's where I'm showing that I have no clue about nvidia buffer
formats I guess :-)
If 0 never makes sense, then yeah I guess you could just
unconditionally canonicalize. Maybe you want to reject the old/legacy
style on turing+ plus, but that's a tradeoff up to you guys - as long
as it's consistent across all involved drivers. Not rejecting it would
essentially make 0 an alias for 0xfe everywhere, post and pre-turing.
More reasons for having a shared canonicalize function which enforces
the same behaviour (whichever you pick) everywhere I think.
And yeah the canonicalize function would only be called by drivers,
not by clients.
-Daniel
> Thanks,
> -James
>
> > Just a drive-by idea, feel free to ignore.
> >
> > Cheers, Daniel
> >
> >> + *
> >> + * 21:20 g GOB Height and Page Kind Generation. The height of a GOB changed
> >> + * starting with Fermi GPUs. Additionally, the mapping between page
> >> + * kind and bit layout has changed at various points.
> >> + *
> >> + * 0 = Gob Height 8, Fermi - Volta, Tegra K1+ Page Kind mapping
> >> + * 1 = Gob Height 4, G80 - GT2XX Page Kind mapping
> >> + * 2 = Gob Height 8, Turing+ Page Kind mapping
> >> + * 3 = Reserved for future use.
> >> + *
> >> + * 22:22 s Sector layout. On Tegra GPUs prior to Xavier, there is a further
> >> + * bit remapping step that occurs at an even lower level than the
> >> + * page kind and block linear swizzles. This causes the layout of
> >> + * surfaces mapped in those SOC's GPUs to be incompatible with the
> >> + * equivalent mapping on other GPUs in the same system.
> >> + *
> >> + * 0 = Tegra K1 - Tegra Parker/TX2 Layout.
> >> + * 1 = Desktop GPU and Tegra Xavier+ Layout
> >> + *
> >> + * 24:23 c Lossless Framebuffer Compression type.
> >> + *
> >> + * 0 = none
> >> + * 1 = ROP/3D, actual compression implied by the Page Kind field
> >> + * 2 = CDE horizontal
> >> + * 3 = CDE vertical
> >> + *
> >> + * 55:25 - Reserved for future use. Must be zero.
> >> + */
> >> +#define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \
> >> + fourcc_mod_code(NVIDIA, (0x10 | \
> >> + ((h) & 0xf) | \
> >> + (((k) & 0xff) << 12) | \
> >> + (((g) & 0x3) << 20) | \
> >> + (((s) & 0x1) << 22) | \
> >> + (((c) & 0x3) << 23)))
> >> +
> >> +/*
> >> + * 16Bx2 Block Linear layout, used by Tegra K1 and later
> >> *
> >> * Pixels are arranged in 64x8 Groups Of Bytes (GOBs). GOBs are then stacked
> >> * vertically by a power of 2 (1 to 32 GOBs) to form a block.
> >> @@ -518,20 +610,20 @@ extern "C" {
> >> * in full detail.
> >> */
> >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(v) \
> >> - fourcc_mod_code(NVIDIA, 0x10 | ((v) & 0xf))
> >> + DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(0, 0, 0, 0, (v))
> >>
> >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_ONE_GOB \
> >> - fourcc_mod_code(NVIDIA, 0x10)
> >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(0)
> >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_TWO_GOB \
> >> - fourcc_mod_code(NVIDIA, 0x11)
> >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(1)
> >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_FOUR_GOB \
> >> - fourcc_mod_code(NVIDIA, 0x12)
> >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(2)
> >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_EIGHT_GOB \
> >> - fourcc_mod_code(NVIDIA, 0x13)
> >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(3)
> >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_SIXTEEN_GOB \
> >> - fourcc_mod_code(NVIDIA, 0x14)
> >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(4)
> >> #define DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK_THIRTYTWO_GOB \
> >> - fourcc_mod_code(NVIDIA, 0x15)
> >> + DRM_FORMAT_MOD_NVIDIA_16BX2_BLOCK(5)
> >>
> >> /*
> >> * Some Broadcom modifiers take parameters, for example the number of
> >> --
> >> 2.17.1
> >>
> >
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
More information about the dri-devel
mailing list