[RFC] drm: Add AMD GFX9+ format modifiers.
Marek Olšák
maraeo at gmail.com
Thu Oct 17 19:49:40 UTC 2019
On Wed, Oct 16, 2019 at 9:48 AM Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl>
wrote:
> This adds initial format modifiers for AMD GFX9 and newer GPUs.
>
> This is particularly useful to determine if we can use DCC, and whether
> we need an extra display compatible DCC metadata plane.
>
> Design decisions:
> - Always expose a single plane
> This way everything works correctly with images with multiple
> planes.
>
> - Do not add an extra memory region in DCC for putting a bit on whether
> we are in compressed state.
> A decompress on import is cheap enough if already decompressed, and
> I do think in most cases we can avoid it in advance during modifier
> negotiation. The remainder is probably not common enough to worry
> about.
>
> - Explicitly define the sizes as part of the modifier description instead
> of using whatever the current version of radeonsi does.
> This way we can avoid dedicated buffers and we can make sure we keep
> compatibility across mesa versions. I'd like to put some tests on
> this on ac_surface.c so we can learn early in the process if things
> need to be changed. Furthermore, the lack of configurable strides on
> GFX10 means things already go wrong if we do not agree, making a
> custom stride somewhat less useful.
>
The custom stride will be back for 2D images (not for 3D/Array), so
Navi10-14 will be the only hw not supporting the custom stride for 2D. It
might not be worth adding the width and height into the modifier just
because of Navi10-14, though I don't feel strongly about it.
This patch doesn't add the sizes into the description anyway.
The rest looks good.
Marek
>
> - No usage of BO metadata at all for modifier usecases.
> To avoid the requirement of dedicated dma bufs per image. For
> non-modifier based interop we still use the BO metadata, since we
> need to keep compatibility with old mesa and this is used for
> depth/msaa/3d/CL etc. API interop.
>
> - A single FD for all planes.
> Easier in Vulkan / bindless and radeonsi is already transitioning.
>
> - Make a single modifier for DCN1
> It defines things uniquely given bpp, which we can assume, so adding
> more modifier values do not add clarity.
>
> - Not exposing the 4K and 256B tiling modes.
> These are largely only better for something like a cursor or very
> long
> and/or tall images. Are they worth the added complexity to save
> memory?
> For context, at 32bpp, tiles are 128x128 pixels.
>
> - For multiplane images, every plane uses the same tiling.
> On GFX9/GFX10 we can, so no need to make it complicated.
>
> - We use family_id + external_rev to distinguish between incompatible
> GPUs.
> PCI ID is not enough, as RAVEN and RAVEN2 have the same PCI device
> id,
> but different tiling. We might be able to find bigger equivalence
> groups for _X, but especially for DCC I would be uncomfortable
> making it
> shared between GPUs.
>
> - For DCN1 DCC, radeonsi currently uses another texelbuffer with indices
> to reorder. This is not shared.
> Specific to current implementation and does not need to be shared. To
> pave the way to shader-based solution, lets keep this internal to
> each
> driver. This should reduce the modifier churn if any of the driver
> implementations change. (Especially as you'd want to support the old
> implementation for a while to stay compatible with old kernels not
> supporting a new modifier yet).
>
> - No support for rotated swizzling.
> Can be added easily later and nothing in the stack would generate it
> currently.
>
> - Add extra enum values in the definitions.
> This way we can easily switch on modifier without having to pass
> around
> the current GPU everywhere, assuming the modifier has been validated.
> ---
>
> Since my previous attempt for modifiers got bogged down on details for
> the GFX6-GFX8 modifiers in previous discussions, this only attempts to
> define modifiers for GFX9+, which is significantly simpler.
>
> For a final version I'd like to wait until I have written most of the
> userspace + kernelspace so we can actually test it. However, I'd
> appreciate any early feedback people are willing to give.
>
> Initial Mesa amd/common support + tests are available at
> https://gitlab.freedesktop.org/bnieuwenhuizen/mesa/tree/modifiers
>
> I tested the HW to actually behave as described in the descriptions
> on Raven and plan to test on a subset of the others.
>
> include/uapi/drm/drm_fourcc.h | 118 ++++++++++++++++++++++++++++++++++
> 1 file changed, 118 insertions(+)
>
> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> index 3feeaa3f987a..9bd286ab2bee 100644
> --- a/include/uapi/drm/drm_fourcc.h
> +++ b/include/uapi/drm/drm_fourcc.h
> @@ -756,6 +756,124 @@ extern "C" {
> */
> #define DRM_FORMAT_MOD_ALLWINNER_TILED fourcc_mod_code(ALLWINNER, 1)
>
> +/*
> + * AMD GFX9+ format modifiers
> + */
> +
> +/*
> + * enum-like values for easy switches.
> + *
> + * No fixed field-size but implementations are supposed to enforce
> all-zeros of
> + * unused bits during validation.
> + */
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD_id 0
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY_id 1
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_id 2
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY_id 3
> +#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_id 4
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC_id 5
> +#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC_id 6
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC_id 7
> +
> +/*
> + * tiling modes that are compatible between all GPUs that support the
> tiling
> + * mode.
> + *
> + * STANDARD/DISPLAY/ROTATED + bitdepth determine the indexing within a
> 256 byte
> + * micro-block.
> + *
> + * The macro-block is 64 KiB and the micro-block in macro-block
> addressing is
> + * y0-x0-y1-x1-... up till the dimensions of the macro-block.
> + *
> + * The image is then a plain row-major image of macro-blocks.
> + */
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD \
> + fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD_id)
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY \
> + fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY_id)
> +
> +/*
> + * Same as above, but applies a transformation on the micro-block in
> macro-block
> + * indexing that depends on the GPU pipes, shader engines and banks.
> + *
> + * RENDER is a new micro-block tiling for GFX10+.
> + */
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD(family_id, external_rev) \
> + fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_id | \
> + ((uint64_t)family_id << 40) | \
> + ((uint64_t)external_rev << 48))
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY(family_id, external_rev) \
> + fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY_id | \
> + ((uint64_t)family_id << 40) | \
> + ((uint64_t)external_rev << 48))
> +#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER(family_id, external_rev) \
> + fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_id | \
> + ((uint64_t)family_id << 40) | \
> + ((uint64_t)external_rev << 48))
> +
> +/*
> + * Same as above, but with DCC enabled.
> + *
> + * We add the PCI ID of the device to make sure the transformation above
> is
> + * applied the same way, as well as make sure the implementation of DCC
> supports
> + * the same patterns.
> + *
> + * The DCC is pipe-aligned (and on GFX9 rb-aligned).
> + *
> + * This includes 2 memory regions per plane:
> + * - main image
> + * - DCC metadata
> + *
> + * These are tightly packed according to platform specific DCC alignment
> + * requirements.
> + *
> + * pipe+rb aligned DCC alignment:
> + * - GFX9: MAX(65536,
> + * MIN2(32, pipes * shader_engines) *
> + * num_backends * interleave_bytes)
> + * - GFX10 (without rbplus): MAX2(pipes * interleave_bytes, 4096)
> + *
> + * aligned DCC size:
> + * - GFX9:
> + * tiles of MAX2(256 * num_backends KiB, 1 MiB) of pixel data (prefer
> + * width if odd log2) at ratio 1/256
> + * - GFX10 (without rbplus):
> + * tiles of 256 * MAX2(pipes * interleave_bytes, 4096) of pixel data
> + * (prefer width if odd log2) at ratio 1/256
> + */
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC(family_id,
> external_rev) \
> + fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC_id
> | \
> + ((uint64_t)family_id << 40) |
> \
> + ((uint64_t)external_rev << 48))
> +#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC(family_id,
> external_rev) \
> + fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC_id
> | \
> + ((uint64_t)family_id << 40) |
> \
> + ((uint64_t)external_rev << 48))
> +
> +/*
> + * DCC that is displayable with DCN1 hardware.
> + *
> + * for bpp <= 32 bits, the micro-tiling is STANDARD and for bpp == 64
> bits, the
> + * micro-tiling is DISPLAY.
> + *
> + * This includes 3 memory regions per plane:
> + * - main image
> + * - DCC (non aligned)
> + * - DCC (pipe-aligned & rb-aligned)
> + *
> + * non-aligned DCC alignment:
> + * - GFX9: MAX(65536, interleave_bytes)
> + * - GFX10 (without rbplus): 4096
> + *
> + * non-aligned DCC size:
> + * - GFX9 & GFX10 (without rbplus):
> + * tiles for 1 MiB of pixel data (prefer width if odd log2) at ratio
> 1/256
> + */
> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC(family_id, external_rev) \
> + fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC_id | \
> + ((uint64_t)family_id << 40) | \
> + ((uint64_t)external_rev << 48))
> +
> #if defined(__cplusplus)
> }
> #endif
> --
> 2.23.0
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20191017/321a0c24/attachment-0001.html>
More information about the amd-gfx
mailing list