[RFC] drm: Add AMD GFX9+ format modifiers.

Marek Olšák maraeo at gmail.com
Thu Oct 17 19:49:40 UTC 2019


On Wed, Oct 16, 2019 at 9:48 AM Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
wrote:

> This adds initial format modifiers for AMD GFX9 and newer GPUs.
>
> This is particularly useful to determine if we can use DCC, and whether
> we need an extra display compatible DCC metadata plane.
>
> Design decisions:
>   - Always expose a single plane
>        This way everything works correctly with images with multiple
> planes.
>
>   - Do not add an extra memory region in DCC for putting a bit on whether
>     we are in compressed state.
>        A decompress on import is cheap enough if already decompressed, and
>        I do think in most cases we can avoid it in advance during modifier
>        negotiation. The remainder is probably not common enough to worry
>        about.
>
>   - Explicitly define the sizes as part of the modifier description instead
>     of using whatever the current version of radeonsi does.
>        This way we can avoid dedicated buffers and we can make sure we keep
>        compatibility across mesa versions. I'd like to put some tests on
>        this on ac_surface.c so we can learn early in the process if things
>        need to be changed. Furthermore, the lack of configurable strides on
>        GFX10 means things already go wrong if we do not agree, making a
>        custom stride somewhat less useful.
>

The custom stride will be back for 2D images (not for 3D/Array), so
Navi10-14 will be the only hw not supporting the custom stride for 2D. It
might not be worth adding the width and height into the modifier just
because of Navi10-14, though I don't feel strongly about it.

This patch doesn't add the sizes into the description anyway.

The rest looks good.

Marek


>
>   - No usage of BO metadata at all for modifier usecases.
>        To avoid the requirement of dedicated dma bufs per image. For
>        non-modifier based interop we still use the BO metadata, since we
>        need to keep compatibility with old mesa and this is used for
>        depth/msaa/3d/CL etc. API interop.
>
>   - A single FD for all planes.
>        Easier in Vulkan / bindless and radeonsi is already transitioning.
>
>   - Make a single modifier for DCN1
>       It defines things uniquely given bpp, which we can assume, so adding
>       more modifier values do not add clarity.
>
>   - Not exposing the 4K and 256B tiling modes.
>       These are largely only better for something like a cursor or very
> long
>       and/or tall images. Are they worth the added complexity to save
> memory?
>       For context, at 32bpp, tiles are 128x128 pixels.
>
>   - For multiplane images, every plane uses the same tiling.
>       On GFX9/GFX10 we can, so no need to make it complicated.
>
>   - We use family_id + external_rev to distinguish between incompatible
> GPUs.
>       PCI ID is not enough, as RAVEN and RAVEN2 have the same PCI device
> id,
>       but different tiling. We might be able to find bigger equivalence
>       groups for _X, but especially for DCC I would be uncomfortable
> making it
>       shared between GPUs.
>
>   - For DCN1 DCC, radeonsi currently uses another texelbuffer with indices
>     to reorder. This is not shared.
>       Specific to current implementation and does not need to be shared. To
>       pave the way to shader-based solution, lets keep this internal to
> each
>       driver. This should reduce the modifier churn if any of the driver
>       implementations change. (Especially as you'd want to support the old
>       implementation for a while to stay compatible with old kernels not
>       supporting a new modifier yet).
>
>   - No support for rotated swizzling.
>       Can be added easily later and nothing in the stack would generate it
>       currently.
>
>   - Add extra enum values in the definitions.
>       This way we can easily switch on modifier without having to pass
> around
>       the current GPU everywhere, assuming the modifier has been validated.
> ---
>
>  Since my previous attempt for modifiers got bogged down on details for
>  the GFX6-GFX8 modifiers in previous discussions, this only attempts to
>  define modifiers for GFX9+, which is significantly simpler.
>
>  For a final version I'd like to wait until I have written most of the
>  userspace + kernelspace so we can actually test it. However, I'd
>  appreciate any early feedback people are willing to give.
>
>  Initial Mesa amd/common support + tests are available at
>  https://gitlab.freedesktop.org/bnieuwenhuizen/mesa/tree/modifiers
>
>  I tested the HW to actually behave as described in the descriptions
>  on Raven and plan to test on a subset of the others.
>
>  include/uapi/drm/drm_fourcc.h | 118 ++++++++++++++++++++++++++++++++++
>  1 file changed, 118 insertions(+)
>
> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> index 3feeaa3f987a..9bd286ab2bee 100644
> --- a/include/uapi/drm/drm_fourcc.h
> +++ b/include/uapi/drm/drm_fourcc.h
> @@ -756,6 +756,124 @@ extern "C" {
>   */
>  #define DRM_FORMAT_MOD_ALLWINNER_TILED fourcc_mod_code(ALLWINNER, 1)
>
> +/*
> + * AMD GFX9+ format modifiers
> + */
> +
> +/*
> + * enum-like values for easy switches.
> + *
> + * No fixed field-size but implementations are supposed to enforce
> all-zeros of
> + * unused bits during validation.
> + */
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD_id       0
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY_id        1
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_id     2
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY_id      3
> +#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_id      4
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC_id 5
> +#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC_id  6
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC_id     7
> +
> +/*
> + * tiling modes that are compatible between all GPUs that support the
> tiling
> + * mode.
> + *
> + * STANDARD/DISPLAY/ROTATED + bitdepth determine the indexing within a
> 256 byte
> + * micro-block.
> + *
> + * The macro-block is 64 KiB and the micro-block in macro-block
> addressing is
> + * y0-x0-y1-x1-... up till the dimensions of the macro-block.
> + *
> + * The image is then a plain row-major image of macro-blocks.
> + */
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD \
> +       fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD_id)
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY  \
> +       fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY_id)
> +
> +/*
> + * Same as above, but applies a transformation on the micro-block in
> macro-block
> + * indexing that depends on the GPU pipes, shader engines and banks.
> + *
> + * RENDER is a new micro-block tiling for GFX10+.
> + */
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD(family_id, external_rev)  \
> +       fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_id | \
> +                            ((uint64_t)family_id << 40) |               \
> +                            ((uint64_t)external_rev << 48))
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY(family_id, external_rev)   \
> +       fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY_id |  \
> +                            ((uint64_t)family_id << 40) |               \
> +                            ((uint64_t)external_rev << 48))
> +#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER(family_id, external_rev)   \
> +       fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_id |  \
> +                            ((uint64_t)family_id << 40) |               \
> +                            ((uint64_t)external_rev << 48))
> +
> +/*
> + * Same as above, but with DCC enabled.
> + *
> + * We add the PCI ID of the device to make sure the transformation above
> is
> + * applied the same way, as well as make sure the implementation of DCC
> supports
> + * the same patterns.
> + *
> + * The DCC is pipe-aligned (and on GFX9 rb-aligned).
> + *
> + * This includes 2 memory regions per plane:
> + *  - main image
> + *  - DCC metadata
> + *
> + * These are tightly packed according to platform specific DCC alignment
> + * requirements.
> + *
> + * pipe+rb aligned DCC alignment:
> + * - GFX9: MAX(65536,
> + *             MIN2(32, pipes * shader_engines) *
> + *               num_backends * interleave_bytes)
> + * - GFX10 (without rbplus): MAX2(pipes * interleave_bytes, 4096)
> + *
> + * aligned DCC size:
> + * - GFX9:
> + *    tiles of MAX2(256 * num_backends KiB, 1 MiB) of pixel data (prefer
> + *    width if odd log2) at ratio 1/256
> + * - GFX10 (without rbplus):
> + *    tiles of 256 * MAX2(pipes * interleave_bytes, 4096) of pixel data
> + *    (prefer width if odd log2) at ratio 1/256
> + */
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC(family_id,
> external_rev)  \
> +       fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC_id
> | \
> +                            ((uint64_t)family_id << 40) |
>    \
> +                            ((uint64_t)external_rev << 48))
> +#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC(family_id,
> external_rev)   \
> +       fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC_id
> |  \
> +                            ((uint64_t)family_id << 40) |
>    \
> +                            ((uint64_t)external_rev << 48))
> +
> +/*
> + * DCC that is displayable with DCN1 hardware.
> + *
> + * for bpp <= 32 bits, the micro-tiling is STANDARD and for bpp == 64
> bits, the
> + * micro-tiling is DISPLAY.
> + *
> + * This includes 3 memory regions per plane:
> + *   - main image
> + *   - DCC (non aligned)
> + *   - DCC (pipe-aligned & rb-aligned)
> + *
> + * non-aligned DCC alignment:
> + * - GFX9: MAX(65536, interleave_bytes)
> + * - GFX10 (without rbplus): 4096
> + *
> + * non-aligned DCC size:
> + * - GFX9 & GFX10 (without rbplus):
> + *    tiles for 1 MiB of pixel data (prefer width if odd log2) at ratio
> 1/256
> + */
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC(family_id, external_rev)  \
> +       fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC_id | \
> +                            ((uint64_t)family_id << 40) |               \
> +                            ((uint64_t)external_rev << 48))
> +
>  #if defined(__cplusplus)
>  }
>  #endif
> --
> 2.23.0
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/dri-devel/attachments/20191017/321a0c24/attachment.html>


More information about the dri-devel mailing list