[RFC] drm: Add AMD GFX9+ format modifiers.

Bas Nieuwenhuizen bas at basnieuwenhuizen.nl
Wed Oct 16 13:46:56 UTC 2019


This adds initial format modifiers for AMD GFX9 and newer GPUs.

This is particularly useful to determine if we can use DCC, and whether
we need an extra display compatible DCC metadata plane.

Design decisions:
  - Always expose a single plane per image plane.
       DCC metadata lives in the same plane as the pixel data, so images
       with multiple planes keep working as expected.

  - Do not add an extra memory region in DCC for putting a bit on whether
    we are in compressed state.
       A decompress on import is cheap enough if already decompressed, and
       I do think in most cases we can avoid it in advance during modifier
       negotiation. The remainder is probably not common enough to worry
       about.

  - Explicitly define the sizes as part of the modifier description instead
    of using whatever the current version of radeonsi does.
       This way we can avoid dedicated buffers and keep compatibility
       across Mesa versions. I'd like to add tests for this to ac_surface.c
       so we learn early in the process if things need to change.
       Furthermore, the lack of configurable strides on GFX10 means things
       already go wrong if the two sides do not agree, making a custom
       stride somewhat less useful.

  - No usage of BO metadata at all for modifier use cases.
       This avoids requiring a dedicated dma-buf per image. For
       non-modifier based interop we still use the BO metadata, since we
       need to keep compatibility with old Mesa and this is used for
       depth/MSAA/3D/CL etc. API interop.

  - A single FD for all planes.
       Easier in Vulkan / bindless and radeonsi is already transitioning.

  - Make a single modifier for DCN1.
      It defines things uniquely given bpp, which we can assume, so adding
      more modifier values does not add clarity.

  - Not exposing the 4K and 256B tiling modes.
      These are largely only better for something like a cursor or very long
      and/or tall images. Are they worth the added complexity to save memory?
      For context, at 32bpp, tiles are 128x128 pixels.

  - For multiplane images, every plane uses the same tiling.
      On GFX9/GFX10 we can, so no need to make it complicated.

  - We use family_id + external_rev to distinguish between incompatible GPUs.
      PCI ID is not enough, as RAVEN and RAVEN2 have the same PCI device id,
      but different tiling. We might be able to find bigger equivalence
      groups for _X, but especially for DCC I would be uncomfortable making it
      shared between GPUs.

  - For DCN1 DCC, radeonsi currently uses another texelbuffer with indices
    to reorder. This is not shared.
      It is specific to the current implementation and does not need to be
      shared. To pave the way for a shader-based solution, let's keep this
      internal to each driver. This should reduce modifier churn if any of
      the driver implementations change (especially as you'd want to
      support the old implementation for a while to stay compatible with
      old kernels that do not support a new modifier yet).

  - No support for rotated swizzling.
      Can be added easily later and nothing in the stack would generate it
      currently.

  - Add extra enum values in the definitions.
      This way we can easily switch on modifier without having to pass around
      the current GPU everywhere, assuming the modifier has been validated.
---
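
 To illustrate the family_id + external_rev and enum-value points above,
 the sketch below packs and unpacks a modifier the way the proposed macros
 do (id in the low bits, family_id at bit 40, external_rev at bit 48,
 vendor in the top byte) and rejects values with stray bits. The sketch_*
 names, the 8-bit width assumed for family_id, and the rule that the
 non-_X modes must carry zero family_id/external_rev are assumptions made
 for this example, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins so the sketch compiles on its own; real code would use
 * fourcc_mod_code() and the DRM_FORMAT_MOD_AMD_* macros from the patch. */
#define SKETCH_VENDOR_AMD   0x02ULL               /* DRM_FORMAT_MOD_VENDOR_AMD */
#define SKETCH_VAL_MASK     0x00ffffffffffffffULL /* low 56 bits, as in fourcc_mod_code() */
#define SKETCH_FAMILY_SHIFT 40
#define SKETCH_REV_SHIFT    48
#define SKETCH_ID_MASK      ((1ULL << SKETCH_FAMILY_SHIFT) - 1)

/* Same numbering as the proposed DRM_FORMAT_MOD_AMD_*_id values. */
enum sketch_id {
	SKETCH_64K_STANDARD = 0,
	SKETCH_64K_DISPLAY = 1,
	SKETCH_64K_X_STANDARD = 2,
	SKETCH_64K_X_DISPLAY = 3,
	SKETCH_64K_X_RENDER = 4,
	SKETCH_64K_X_STANDARD_DCC = 5,
	SKETCH_64K_X_RENDER_DCC = 6,
	SKETCH_64K_X_DCN1_DCC = 7,
};

struct sketch_amd_modifier {
	enum sketch_id id;
	uint8_t family_id;    /* assumed to occupy bits 40..47 */
	uint8_t external_rev; /* bits 48..55, just below the vendor byte */
};

static uint64_t sketch_encode(enum sketch_id id, uint8_t family_id,
			      uint8_t external_rev)
{
	uint64_t val;

	assert(id <= SKETCH_64K_X_DCN1_DCC);
	val = (uint64_t)id |
	      ((uint64_t)family_id << SKETCH_FAMILY_SHIFT) |
	      ((uint64_t)external_rev << SKETCH_REV_SHIFT);
	return (SKETCH_VENDOR_AMD << 56) | (val & SKETCH_VAL_MASK);
}

/* Returns false for non-AMD modifiers and for any unused bit being set. */
static bool sketch_decode(uint64_t modifier, struct sketch_amd_modifier *out)
{
	uint64_t val = modifier & SKETCH_VAL_MASK;
	uint64_t id = val & SKETCH_ID_MASK;

	if ((modifier >> 56) != SKETCH_VENDOR_AMD)
		return false;
	if (id > SKETCH_64K_X_DCN1_DCC)
		return false; /* unknown id, or stray bits below bit 40 */

	out->id = (enum sketch_id)id;
	out->family_id = (uint8_t)(val >> SKETCH_FAMILY_SHIFT);
	out->external_rev = (uint8_t)(val >> SKETCH_REV_SHIFT);

	/* Assumption: the non-_X modes carry no GPU identification. */
	if (id <= SKETCH_64K_DISPLAY && (out->family_id || out->external_rev))
		return false;
	return true;
}
```

 With this layout a consumer can decode once and then switch on the id
 without passing the current GPU around. For example, the STANDARD_DCC id
 with the illustrative values family_id 142 and external_rev 0x81 encodes
 to 0x02818e0000000005.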

 Since my previous attempt at modifiers got bogged down in the details of
 the GFX6-GFX8 modifiers, this only attempts to define modifiers for
 GFX9+, which is significantly simpler.

 For a final version I'd like to wait until I have written most of the
 userspace + kernelspace so we can actually test it. However, I'd
 appreciate any early feedback people are willing to give.

 Initial Mesa amd/common support + tests are available at
 https://gitlab.freedesktop.org/bnieuwenhuizen/mesa/tree/modifiers

 I verified on Raven that the HW actually behaves as described in the
 modifier descriptions, and plan to test on a subset of the other GPUs.
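
 For reference, the alignment and tile-size formulas from the DCC comments
 in the patch can be written out as plain C. This is only a sketch under
 stated assumptions: struct sketch_gpu and all sketch_* helpers are
 hypothetical, "256 * num_backends KiB" is read as (256 * num_backends)
 KiB, and "prefer width if odd log2" is read as giving the width the extra
 factor of two.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX2(a, b) ((a) > (b) ? (a) : (b))
#define SKETCH_MIN2(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical container for the GPU parameters the formulas refer to. */
struct sketch_gpu {
	uint32_t pipes;
	uint32_t shader_engines;
	uint32_t num_backends;
	uint32_t interleave_bytes;
};

/* pipe+rb aligned DCC alignment, GFX9. */
static uint64_t sketch_gfx9_aligned_dcc_alignment(const struct sketch_gpu *g)
{
	uint64_t v = (uint64_t)SKETCH_MIN2(32u, g->pipes * g->shader_engines) *
		     g->num_backends * g->interleave_bytes;

	return SKETCH_MAX2((uint64_t)65536, v);
}

/* pipe aligned DCC alignment, GFX10 without rbplus. */
static uint64_t sketch_gfx10_aligned_dcc_alignment(const struct sketch_gpu *g)
{
	return SKETCH_MAX2((uint64_t)g->pipes * g->interleave_bytes,
			   (uint64_t)4096);
}

/* Bytes of pixel data covered by one aligned DCC tile, GFX9. */
static uint64_t sketch_gfx9_aligned_dcc_tile_bytes(const struct sketch_gpu *g)
{
	return SKETCH_MAX2(256ULL * 1024 * g->num_backends, 1024ULL * 1024);
}

/* Bytes of pixel data covered by one aligned DCC tile, GFX10 w/o rbplus. */
static uint64_t sketch_gfx10_aligned_dcc_tile_bytes(const struct sketch_gpu *g)
{
	return 256 * sketch_gfx10_aligned_dcc_alignment(g);
}

/* DCC metadata needed per tile at the 1/256 ratio. */
static uint64_t sketch_dcc_bytes_per_tile(uint64_t tile_bytes)
{
	return tile_bytes / 256;
}

/* Split a tile into width x height pixels; width gets the extra factor
 * of two when log2(pixel count) is odd (assumed reading of the comment). */
static void sketch_tile_dims(uint64_t tile_bytes, uint32_t bytes_per_pixel,
			     uint32_t *width, uint32_t *height)
{
	uint64_t pixels;
	unsigned int bits = 0;

	assert(bytes_per_pixel > 0 && tile_bytes >= bytes_per_pixel);
	pixels = tile_bytes / bytes_per_pixel;
	while ((1ULL << (bits + 1)) <= pixels)
		bits++; /* bits = floor(log2(pixels)) */

	*height = 1u << (bits / 2);
	*width = 1u << (bits - bits / 2);
}
```

 Under these assumptions a 1 MiB tile at 32bpp covers 512x512 pixels and
 needs 4096 bytes of DCC metadata, while a 2 MiB tile at 32bpp covers
 1024x512 pixels.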

 include/uapi/drm/drm_fourcc.h | 118 ++++++++++++++++++++++++++++++++++
 1 file changed, 118 insertions(+)

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 3feeaa3f987a..9bd286ab2bee 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -756,6 +756,124 @@ extern "C" {
  */
 #define DRM_FORMAT_MOD_ALLWINNER_TILED fourcc_mod_code(ALLWINNER, 1)
 
+/*
+ * AMD GFX9+ format modifiers
+ */
+
+/*
+ * enum-like values for easy switches.
+ *
+ * No fixed field-size but implementations are supposed to enforce all-zeros of
+ * unused bits during validation.
+ */
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD_id       0
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY_id        1
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_id     2
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY_id      3
+#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_id      4
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC_id 5
+#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC_id  6
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC_id     7
+
+/*
+ * tiling modes that are compatible between all GPUs that support the tiling
+ * mode.
+ *
+ * STANDARD/DISPLAY/ROTATED + bitdepth determine the indexing within a 256 byte
+ * micro-block.
+ *
+ * The macro-block is 64 KiB and the micro-block in macro-block addressing is
+ * y0-x0-y1-x1-... up to the dimensions of the macro-block.
+ *
+ * The image is then a plain row-major image of macro-blocks.
+ */
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD \
+	fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD_id)
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY  \
+	fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY_id)
+
+/*
+ * Same as above, but applies a transformation on the micro-block in macro-block
+ * indexing that depends on the GPU pipes, shader engines and banks.
+ *
+ * RENDER is a new micro-block tiling for GFX10+.
+ */
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD(family_id, external_rev)  \
+	fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_id | \
+	                     ((__u64)(family_id) << 40) |                \
+	                     ((__u64)(external_rev) << 48))
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY(family_id, external_rev)   \
+	fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY_id |  \
+	                     ((__u64)(family_id) << 40) |                \
+	                     ((__u64)(external_rev) << 48))
+#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER(family_id, external_rev)   \
+	fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_id |  \
+	                     ((__u64)(family_id) << 40) |                \
+	                     ((__u64)(external_rev) << 48))
+
+/*
+ * Same as above, but with DCC enabled.
+ *
+ * We add the family_id and external_rev of the device to make sure the
+ * transformation above is applied the same way, as well as make sure the
+ * implementation of DCC supports the same patterns.
+ *
+ * The DCC is pipe-aligned (and on GFX9 rb-aligned).
+ *
+ * This includes 2 memory regions per plane:
+ *  - main image
+ *  - DCC metadata
+ *
+ * These are tightly packed according to platform specific DCC alignment
+ * requirements.
+ *
+ * pipe+rb aligned DCC alignment:
+ * - GFX9: MAX2(65536,
+ *              MIN2(32, pipes * shader_engines) *
+ *                num_backends * interleave_bytes)
+ * - GFX10 (without rbplus): MAX2(pipes * interleave_bytes, 4096)
+ *
+ * aligned DCC size:
+ * - GFX9:
+ *    tiles of MAX2(256 * num_backends KiB, 1 MiB) of pixel data (prefer
+ *    width if odd log2) at ratio 1/256
+ * - GFX10 (without rbplus):
+ *    tiles of 256 * MAX2(pipes * interleave_bytes, 4096) of pixel data
+ *    (prefer width if odd log2) at ratio 1/256
+ */
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC(family_id, external_rev)  \
+	fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC_id | \
+	                     ((__u64)(family_id) << 40) |                    \
+	                     ((__u64)(external_rev) << 48))
+#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC(family_id, external_rev)   \
+	fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC_id |  \
+	                     ((__u64)(family_id) << 40) |                    \
+	                     ((__u64)(external_rev) << 48))
+
+/*
+ * DCC that is displayable with DCN1 hardware.
+ *
+ * for bpp <= 32 bits, the micro-tiling is STANDARD and for bpp == 64 bits, the
+ * micro-tiling is DISPLAY.
+ *
+ * This includes 3 memory regions per plane:
+ *   - main image
+ *   - DCC (non aligned)
+ *   - DCC (pipe-aligned & rb-aligned)
+ *
+ * non-aligned DCC alignment:
+ * - GFX9: MAX2(65536, interleave_bytes)
+ * - GFX10 (without rbplus): 4096
+ *
+ * non-aligned DCC size:
+ * - GFX9 & GFX10 (without rbplus):
+ *    tiles for 1 MiB of pixel data (prefer width if odd log2) at ratio 1/256
+ */
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC(family_id, external_rev)  \
+	fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC_id | \
+	                     ((__u64)(family_id) << 40) |                \
+	                     ((__u64)(external_rev) << 48))
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.23.0


