[RFC] drm: Add AMD GFX9+ format modifiers.

Bas Nieuwenhuizen bas at basnieuwenhuizen.nl
Fri Oct 18 15:29:16 UTC 2019


On Thu, Oct 17, 2019 at 9:50 PM Marek Olšák <maraeo at gmail.com> wrote:
>
> On Wed, Oct 16, 2019 at 9:48 AM Bas Nieuwenhuizen <bas at basnieuwenhuizen.nl> wrote:
>>
>> This adds initial format modifiers for AMD GFX9 and newer GPUs.
>>
>> This is particularly useful to determine if we can use DCC, and whether
>> we need an extra display compatible DCC metadata plane.
>>
>> Design decisions:
>>   - Always expose a single plane
>>        This way everything works correctly with images with multiple planes.
>>
>>   - Do not add an extra memory region in DCC for putting a bit on whether
>>     we are in compressed state.
>>        A decompress on import is cheap enough if already decompressed, and
>>        I do think in most cases we can avoid it in advance during modifier
>>        negotiation. The remainder is probably not common enough to worry
>>        about.
>>
>>   - Explicitly define the sizes as part of the modifier description instead
>>     of using whatever the current version of radeonsi does.
>>        This way we can avoid dedicated buffers and we can make sure we keep
>>        compatibility across mesa versions. I'd like to put some tests on
>>        this on ac_surface.c so we can learn early in the process if things
>>        need to be changed. Furthermore, the lack of configurable strides on
>>        GFX10 means things already go wrong if we do not agree, making a
>>        custom stride somewhat less useful.
>
>
> The custom stride will be back for 2D images (not for 3D/Array), so Navi10-14 will be the only hw not supporting the custom stride for 2D. It might not be worth adding the width and height into the modifier just because of Navi10-14, though I don't feel strongly about it.

Right, I'll clarify the text.

I meant standardizing how we get the surface_size/dcc_size/total_size
(+ alignment of DCC metadata if bigger than surface alignment), so we
get to agree about offsets.

I believe we should not put in width/height in the modifier as (1) we
are allowed to assume every party in negotiation puts in the same
width (even though minigbm violates that currently ...) (2) this would
be not workable with most enumeration APIs.

>
> This patch doesn't add the sizes into the description anyway.
>
> The rest looks good.
>
> Marek
>
>>
>>
>>   - No usage of BO metadata at all for modifier usecases.
>>        To avoid the requirement of dedicated dma bufs per image. For
>>        non-modifier based interop we still use the BO metadata, since we
>>        need to keep compatibility with old mesa and this is used for
>>        depth/msaa/3d/CL etc. API interop.
>>
>>   - A single FD for all planes.
>>        Easier in Vulkan / bindless and radeonsi is already transitioning.
>>
>>   - Make a single modifier for DCN1
>>       It defines things uniquely given bpp, which we can assume, so adding
>>       more modifier values do not add clarity.
>>
>>   - Not exposing the 4K and 256B tiling modes.
>>       These are largely only better for something like a cursor or very long
>>       and/or tall images. Are they worth the added complexity to save memory?
>>       For context, at 32bpp, tiles are 128x128 pixels.
>>
>>   - For multiplane images, every plane uses the same tiling.
>>       On GFX9/GFX10 we can, so no need to make it complicated.
>>
>>   - We use family_id + external_rev to distinguish between incompatible GPUs.
>>       PCI ID is not enough, as RAVEN and RAVEN2 have the same PCI device id,
>>       but different tiling. We might be able to find bigger equivalence
>>       groups for _X, but especially for DCC I would be uncomfortable making it
>>       shared between GPUs.
>>
>>   - For DCN1 DCC, radeonsi currently uses another texelbuffer with indices
>>     to reorder. This is not shared.
>>       Specific to current implementation and does not need to be shared. To
>>       pave the way to shader-based solution, lets keep this internal to each
>>       driver. This should reduce the modifier churn if any of the driver
>>       implementations change. (Especially as you'd want to support the old
>>       implementation for a while to stay compatible with old kernels not
>>       supporting a new modifier yet).
>>
>>   - No support for rotated swizzling.
>>       Can be added easily later and nothing in the stack would generate it
>>       currently.
>>
>>   - Add extra enum values in the definitions.
>>       This way we can easily switch on modifier without having to pass around
>>       the current GPU everywhere, assuming the modifier has been validated.
>> ---
>>
>>  Since my previous attempt for modifiers got bogged down on details for
>>  the GFX6-GFX8 modifiers in previous discussions, this only attempts to
>>  define modifiers for GFX9+, which is significantly simpler.
>>
>>  For a final version I'd like to wait until I have written most of the
>>  userspace + kernelspace so we can actually test it. However, I'd
>>  appreciate any early feedback people are willing to give.
>>
>>  Initial Mesa amd/common support + tests are available at
>>  https://gitlab.freedesktop.org/bnieuwenhuizen/mesa/tree/modifiers
>>
>>  I tested the HW to actually behave as described in the descriptions
>>  on Raven and plan to test on a subset of the others.
>>
>>  include/uapi/drm/drm_fourcc.h | 118 ++++++++++++++++++++++++++++++++++
>>  1 file changed, 118 insertions(+)
>>
>> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
>> index 3feeaa3f987a..9bd286ab2bee 100644
>> --- a/include/uapi/drm/drm_fourcc.h
>> +++ b/include/uapi/drm/drm_fourcc.h
>> @@ -756,6 +756,124 @@ extern "C" {
>>   */
>>  #define DRM_FORMAT_MOD_ALLWINNER_TILED fourcc_mod_code(ALLWINNER, 1)
>>
>> +/*
>> + * AMD GFX9+ format modifiers
>> + */
>> +
>> +/*
>> + * enum-like values for easy switches.
>> + *
>> + * No fixed field-size but implementations are supposed to enforce all-zeros of
>> + * unused bits during validation.
>> + */
>> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD_id       0
>> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY_id        1
>> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_id     2
>> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY_id      3
>> +#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_id      4
>> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC_id 5
>> +#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC_id  6
>> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC_id     7
>> +
>> +/*
>> + * tiling modes that are compatible between all GPUs that support the tiling
>> + * mode.
>> + *
>> + * STANDARD/DISPLAY/ROTATED + bitdepth determine the indexing within a 256 byte
>> + * micro-block.
>> + *
>> + * The macro-block is 64 KiB and the micro-block in macro-block addressing is
>> + * y0-x0-y1-x1-... up till the dimensions of the macro-block.
>> + *
>> + * The image is then a plain row-major image of macro-blocks.
>> + */
>> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD \
>> +       fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD_id)
>> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY  \
>> +       fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY_id)
>> +
>> +/*
>> + * Same as above, but applies a transformation on the micro-block in macro-block
>> + * indexing that depends on the GPU pipes, shader engines and banks.
>> + *
>> + * RENDER is a new micro-block tiling for GFX10+.
>> + */
>> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD(family_id, external_rev)  \
>> +       fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_id | \
>> +                            ((uint64_t)family_id << 40) |               \
>> +                            ((uint64_t)external_rev << 48))
>> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY(family_id, external_rev)   \
>> +       fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY_id |  \
>> +                            ((uint64_t)family_id << 40) |               \
>> +                            ((uint64_t)external_rev << 48))
>> +#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER(family_id, external_rev)   \
>> +       fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_id |  \
>> +                            ((uint64_t)family_id << 40) |               \
>> +                            ((uint64_t)external_rev << 48))
>> +
>> +/*
>> + * Same as above, but with DCC enabled.
>> + *
>> + * We add the PCI ID of the device to make sure the transformation above is
>> + * applied the same way, as well as make sure the implementation of DCC supports
>> + * the same patterns.
>> + *
>> + * The DCC is pipe-aligned (and on GFX9 rb-aligned).
>> + *
>> + * This includes 2 memory regions per plane:
>> + *  - main image
>> + *  - DCC metadata
>> + *
>> + * These are tightly packed according to platform specific DCC alignment
>> + * requirements.
>> + *
>> + * pipe+rb aligned DCC alignment:
>> + * - GFX9: MAX(65536,
>> + *             MIN2(32, pipes * shader_engines) *
>> + *               num_backends * interleave_bytes)
>> + * - GFX10 (without rbplus): MAX2(pipes * interleave_bytes, 4096)
>> + *
>> + * aligned DCC size:
>> + * - GFX9:
>> + *    tiles of MAX2(256 * num_backends KiB, 1 MiB) of pixel data (prefer
>> + *    width if odd log2) at ratio 1/256
>> + * - GFX10 (without rbplus):
>> + *    tiles of 256 * MAX2(pipes * interleave_bytes, 4096) of pixel data
>> + *    (prefer width if odd log2) at ratio 1/256
>> + */
>> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC(family_id, external_rev)  \
>> +       fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC_id | \
>> +                            ((uint64_t)family_id << 40) |                   \
>> +                            ((uint64_t)external_rev << 48))
>> +#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC(family_id, external_rev)   \
>> +       fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC_id |  \
>> +                            ((uint64_t)family_id << 40) |                   \
>> +                            ((uint64_t)external_rev << 48))
>> +
>> +/*
>> + * DCC that is displayable with DCN1 hardware.
>> + *
>> + * for bpp <= 32 bits, the micro-tiling is STANDARD and for bpp == 64 bits, the
>> + * micro-tiling is DISPLAY.
>> + *
>> + * This includes 3 memory regions per plane:
>> + *   - main image
>> + *   - DCC (non aligned)
>> + *   - DCC (pipe-aligned & rb-aligned)
>> + *
>> + * non-aligned DCC alignment:
>> + * - GFX9: MAX(65536, interleave_bytes)
>> + * - GFX10 (without rbplus): 4096
>> + *
>> + * non-aligned DCC size:
>> + * - GFX9 & GFX10 (without rbplus):
>> + *    tiles for 1 MiB of pixel data (prefer width if odd log2) at ratio 1/256
>> + */
>> +#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC(family_id, external_rev)  \
>> +       fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC_id | \
>> +                            ((uint64_t)family_id << 40) |               \
>> +                            ((uint64_t)external_rev << 48))
>> +
>>  #if defined(__cplusplus)
>>  }
>>  #endif
>> --
>> 2.23.0
>>


More information about the amd-gfx mailing list