<div dir="ltr"><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Oct 16, 2019 at 9:48 AM Bas Nieuwenhuizen <<a href="mailto:bas@basnieuwenhuizen.nl">bas@basnieuwenhuizen.nl</a>> wrote: </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">This adds initial format modifiers for AMD GFX9 and newer GPUs.<br>
<br>
This is particularly useful to determine if we can use DCC, and whether we need an extra display compatible DCC metadata plane.<br>
<br>
Design decisions:<br>
- Always expose a single plane<br>
This way everything works correctly with images with multiple planes.<br>
- Do not add an extra memory region in DCC for putting a bit on whether we are in compressed state.<br>
A decompress on import is cheap enough if already decompressed, and I do think in most cases we can avoid it in advance during modifier negotiation. The remainder is probably not common enough to worry about.<br>
- Explicitly define the sizes as part of the modifier description instead of using whatever the current version of radeonsi does.<br>
This way we can avoid dedicated buffers and we can make sure we keep compatibility across mesa versions. I'd like to put some tests on this on ac_surface.c so we can learn early in the process if things need to be changed. Furthermore, the lack of configurable strides on GFX10 means things already go wrong if we do not agree, making a custom stride somewhat less useful.<br>
</blockquote><div> </div><div>The custom stride will be back for 2D images (not for 3D/Array), so Navi10-14 will be the only hw not supporting the custom stride for 2D. 
It might not be worth adding the width and height into the modifier just because of Navi10-14, though I don't feel strongly about it.</div><div> </div><div>This patch doesn't add the sizes into the description anyway.</div><div> </div><div>The rest looks good.</div><div> </div><div>Marek </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> - No usage of BO metadata at all for modifier use cases.<br>
To avoid the requirement of dedicated dma bufs per image. For non-modifier based interop we still use the BO metadata, since we need to keep compatibility with old mesa and this is used for depth/msaa/3d/CL etc. API interop.<br>
- A single FD for all planes.<br>
Easier in Vulkan / bindless and radeonsi is already transitioning.<br>
- Make a single modifier for DCN1<br>
It defines things uniquely given bpp, which we can assume, so adding more modifier values does not add clarity.<br>
- Not exposing the 4K and 256B tiling modes.<br>
These are largely only better for something like a cursor or very long and/or tall images. Are they worth the added complexity to save memory? For context, at 32bpp, tiles are 128x128 pixels.<br>
- For multiplane images, every plane uses the same tiling.<br>
On GFX9/GFX10 we can, so no need to make it complicated.<br>
- We use family_id + external_rev to distinguish between incompatible GPUs.<br>
PCI ID is not enough, as RAVEN and RAVEN2 have the same PCI device id, but different tiling. We might be able to find bigger equivalence groups for _X, but especially for DCC I would be uncomfortable making it shared between GPUs.<br>
- For DCN1 DCC, radeonsi currently uses another texelbuffer with indices to reorder. This is not shared.<br>
Specific to current implementation and does not need to be shared. To pave the way to a shader-based solution, let's keep this internal to each driver. This should reduce the modifier churn if any of the driver implementations change. 
(Especially as you'd want to support the old implementation for a while to stay compatible with old kernels not supporting a new modifier yet).<br>
- No support for rotated swizzling.<br>
Can be added easily later and nothing in the stack would generate it currently.<br>
- Add extra enum values in the definitions.<br>
This way we can easily switch on modifier without having to pass around the current GPU everywhere, assuming the modifier has been validated.<br>
---<br>
<br>
Since my previous attempt for modifiers got bogged down on details for the GFX6-GFX8 modifiers in previous discussions, this only attempts to define modifiers for GFX9+, which is significantly simpler.<br>
<br>
For a final version I'd like to wait until I have written most of the userspace + kernelspace so we can actually test it. However, I'd appreciate any early feedback people are willing to give.<br>
<br>
Initial Mesa amd/common support + tests are available at <a href="https://gitlab.freedesktop.org/bnieuwenhuizen/mesa/tree/modifiers" rel="noreferrer" target="_blank">https://gitlab.freedesktop.org/bnieuwenhuizen/mesa/tree/modifiers</a><br>
<br>
I tested the HW to actually behave as described in the descriptions on Raven and plan to test on a subset of the others.<br>
<br>
include/uapi/drm/drm_fourcc.h | 118 ++++++++++++++++++++++++++++++++++<br>
1 file changed, 118 insertions(+)<br>
<br>
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h<br>
index 3feeaa3f987a..9bd286ab2bee 100644<br>
--- a/include/uapi/drm/drm_fourcc.h<br>
+++ b/include/uapi/drm/drm_fourcc.h<br>
@@ -756,6 +756,124 @@ extern "C" {<br>
*/<br>
#define DRM_FORMAT_MOD_ALLWINNER_TILED fourcc_mod_code(ALLWINNER, 1)<br>
+/*<br>
+ * AMD GFX9+ format modifiers<br>
+ */<br>
+<br>
+/*<br>
+ * enum-like values for easy switches.<br>
+ *<br>
+ * No fixed field-size but implementations are supposed to enforce all-zeros of<br>
+ * unused bits during validation.<br>
+ */<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD_id 0<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY_id 1<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_id 2<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY_id 3<br>
+#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_id 4<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC_id 5<br>
+#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC_id 6<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC_id 7<br>
+<br>
+/*<br>
+ * tiling modes that are compatible between all GPUs that support the tiling<br>
+ * mode.<br>
+ *<br>
+ * STANDARD/DISPLAY/ROTATED + bitdepth determine the indexing within a 256 byte<br>
+ * micro-block.<br>
+ *<br>
+ * The macro-block is 64 KiB and the micro-block in macro-block addressing is<br>
+ * y0-x0-y1-x1-... up till the dimensions of the macro-block.<br>
+ *<br>
+ * The image is then a plain row-major image of macro-blocks.<br>
+ */<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD \<br>
+ fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD_id)<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY \<br>
+ fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY_id)<br>
+<br>
+/*<br>
+ * Same as above, but applies a transformation on the micro-block in macro-block<br>
+ * indexing that depends on the GPU pipes, shader engines and banks.<br>
+ *<br>
+ * RENDER is a new micro-block tiling for GFX10+.<br>
+ */<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD(family_id, external_rev) \<br>
+ fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_id | \<br>
+ ((uint64_t)family_id << 40) | \<br>
+ ((uint64_t)external_rev << 48))<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY(family_id, external_rev) \<br>
+ fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY_id | \<br>
+ ((uint64_t)family_id << 40) | \<br>
+ ((uint64_t)external_rev << 48))<br>
+#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER(family_id, external_rev) \<br>
+ fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_id | \<br>
+ ((uint64_t)family_id << 40) | \<br>
+ ((uint64_t)external_rev << 48))<br>
+<br>
+/*<br>
+ * Same as above, but with DCC enabled.<br>
+ *<br>
+ * We add the PCI ID of the device to make sure the transformation above is<br>
+ * applied the same way, as well as make sure the implementation of DCC supports<br>
+ * the same patterns.<br>
+ *<br>
+ * The DCC is pipe-aligned (and on GFX9 rb-aligned).<br>
+ *<br>
+ * This includes 2 memory regions per plane:<br>
+ * - main image<br>
+ * - DCC metadata<br>
+ *<br>
+ * These are tightly packed according to platform specific DCC alignment<br>
+ * requirements.<br>
+ *<br>
+ * pipe+rb aligned DCC alignment:<br>
+ * - GFX9: MAX(65536,<br>
+ * MIN2(32, pipes * shader_engines) *<br>
+ * num_backends * interleave_bytes)<br>
+ * - GFX10 (without rbplus): MAX2(pipes * interleave_bytes, 4096)<br>
+ *<br>
+ * aligned DCC size:<br>
+ * - GFX9:<br>
+ * tiles of MAX2(256 * num_backends KiB, 1 MiB) of pixel data (prefer<br>
+ * width if odd log2) at ratio 1/256<br>
+ * - GFX10 (without rbplus):<br>
+ * tiles of 256 * MAX2(pipes * interleave_bytes, 4096) of pixel data<br>
+ * (prefer width if odd log2) at ratio 1/256<br>
+ */<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC(family_id, external_rev) \<br>
+ fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC_id | \<br>
+ ((uint64_t)family_id << 40) | \<br>
+ ((uint64_t)external_rev << 48))<br>
+#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC(family_id, external_rev) \<br>
+ fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC_id | \<br>
+ ((uint64_t)family_id << 40) | \<br>
+ ((uint64_t)external_rev << 48))<br>
+<br>
+/*<br>
+ * DCC that is displayable with DCN1 hardware.<br>
+ *<br>
+ * for bpp <= 32 bits, the micro-tiling is STANDARD and for bpp == 64 bits, the<br>
+ * micro-tiling is DISPLAY.<br>
+ *<br>
+ * This includes 3 memory regions per plane:<br>
+ * - main image<br>
+ * - DCC (non aligned)<br>
+ * - DCC (pipe-aligned & rb-aligned)<br>
+ *<br>
+ * non-aligned DCC alignment:<br>
+ * - GFX9: MAX(65536, interleave_bytes)<br>
+ * - GFX10 (without rbplus): 4096<br>
+ *<br>
+ * non-aligned DCC size:<br>
+ * - GFX9 & GFX10 (without rbplus):<br>
+ * tiles for 1 MiB of pixel data (prefer width if odd log2) at ratio 1/256<br>
+ */<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC(family_id, external_rev) \<br>
+ fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC_id | \<br>
+ ((uint64_t)family_id << 40) | \<br>
+ ((uint64_t)external_rev << 48))<br>
+<br>
#if defined(__cplusplus)<br>
}<br>
#endif<br>
--<br>
2.23.0<br>
</blockquote></div></div>
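<div>For readers following the bit layout in the quoted patch, here is a minimal sketch of how a modifier built by those macros could be packed, unpacked and validated. It is not part of the patch. All SKETCH_ and sketch_ names are hypothetical. The sketch assumes the enum-like id occupies the low 40 bits, family_id bits 47:40 and external_rev bits 55:48 (the patch itself says there is no fixed field size), and that the two non-_X modifiers must leave the family and revision bits zero. The vendor value 0x02 and the 56-bit mask mirror fourcc_mod_code() in drm_fourcc.h.</div>

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit layout as fourcc_mod_code() in drm_fourcc.h:
 * vendor in bits 63:56, vendor-specific value in bits 55:0. */
#define SKETCH_MOD_VENDOR_AMD 0x02ULL
#define SKETCH_MOD_CODE(vendor, val) \
	(((uint64_t)(vendor) << 56) | ((uint64_t)(val) & 0x00ffffffffffffffULL))

/* Id bounds taken from the patch's *_id defines. */
#define SKETCH_ID_GFX9_64K_DISPLAY    1 /* last id without family/rev bits */
#define SKETCH_ID_GFX9_64K_X_DCN1_DCC 7 /* highest id the patch defines */

/* Hypothetical helper: pack id, family_id and external_rev the way the
 * patch's macros do. The 0xff masks are an assumption of this sketch. */
static uint64_t sketch_mod_pack(unsigned id, unsigned family_id,
				unsigned external_rev)
{
	uint64_t val = (uint64_t)id |
		       ((uint64_t)(family_id & 0xff) << 40) |
		       ((uint64_t)(external_rev & 0xff) << 48);

	return SKETCH_MOD_CODE(SKETCH_MOD_VENDOR_AMD, val);
}

/* Field accessors for the assumed layout. */
static uint64_t sketch_mod_vendor(uint64_t mod) { return mod >> 56; }
static uint64_t sketch_mod_id(uint64_t mod) { return mod & ((1ULL << 40) - 1); }
static unsigned sketch_mod_family(uint64_t mod) { return (mod >> 40) & 0xff; }
static unsigned sketch_mod_external_rev(uint64_t mod) { return (mod >> 48) & 0xff; }

/* Validation in the spirit of "enforce all-zeros of unused bits":
 * reject unknown vendors and ids, and reject family/rev bits on the
 * non-_X modifiers, which the patch defines without those fields. */
static bool sketch_mod_is_valid(uint64_t mod)
{
	uint64_t id = sketch_mod_id(mod);

	if (sketch_mod_vendor(mod) != SKETCH_MOD_VENDOR_AMD)
		return false;
	if (id > SKETCH_ID_GFX9_64K_X_DCN1_DCC)
		return false;
	if (id <= SKETCH_ID_GFX9_64K_DISPLAY)
		return sketch_mod_family(mod) == 0 &&
		       sketch_mod_external_rev(mod) == 0;
	return true;
}
```

<div>Once a modifier passes a check like this, a driver could switch on sketch_mod_id() alone, which is the use the commit message describes for the enum-like values. The family and revision numbers passed in are whatever the driver reports for the GPU; none are hard-coded here.</div>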