<div dir="ltr"><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Oct 16, 2019 at 9:48 AM Bas Nieuwenhuizen <<a href="mailto:bas@basnieuwenhuizen.nl">bas@basnieuwenhuizen.nl</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">This adds initial format modifiers for AMD GFX9 and newer GPUs.<br>
<br>
This is particularly useful to determine if we can use DCC, and whether<br>
we need an extra display compatible DCC metadata plane.<br>
<br>
Design decisions:<br>
- Always expose a single plane<br>
This way everything works correctly with images with multiple planes.<br>
<br>
- Do not add an extra memory region in DCC for putting a bit on whether<br>
we are in compressed state.<br>
A decompress on import is cheap enough if already decompressed, and<br>
I do think in most cases we can avoid it in advance during modifier<br>
negotiation. The remainder is probably not common enough to worry<br>
about.<br>
<br>
- Explicitly define the sizes as part of the modifier description instead<br>
of using whatever the current version of radeonsi does.<br>
This way we can avoid dedicated buffers and we can make sure we keep<br>
compatibility across mesa versions. I'd like to put some tests on<br>
this on ac_surface.c so we can learn early in the process if things<br>
need to be changed. Furthermore, the lack of configurable strides on<br>
GFX10 means things already go wrong if we do not agree, making a<br>
custom stride somewhat less useful.<br></blockquote><div><br></div><div>The custom stride will be back for 2D images (not for 3D/Array), so Navi10-14 will be the only hw not supporting the custom stride for 2D. It might not be worth adding the width and height into the modifier just because of Navi10-14, though I don't feel strongly about it.</div><div><br></div><div>This patch doesn't add the sizes into the description anyway.</div><div><br></div><div>The rest looks good.</div><div><br></div><div>Marek<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
- No usage of BO metadata at all for modifier usecases.<br>
To avoid the requirement of dedicated dma bufs per image. For<br>
non-modifier based interop we still use the BO metadata, since we<br>
need to keep compatibility with old mesa and this is used for<br>
depth/msaa/3d/CL etc. API interop.<br>
<br>
- A single FD for all planes.<br>
Easier in Vulkan / bindless and radeonsi is already transitioning.<br>
<br>
- Make a single modifier for DCN1<br>
It defines things uniquely given bpp, which we can assume, so adding<br>
more modifier values do not add clarity.<br>
<br>
- Not exposing the 4K and 256B tiling modes.<br>
These are largely only better for something like a cursor or very long<br>
and/or tall images. Are they worth the added complexity to save memory?<br>
For context, at 32bpp, tiles are 128x128 pixels.<br>
<br>
- For multiplane images, every plane uses the same tiling.<br>
On GFX9/GFX10 we can, so no need to make it complicated.<br>
<br>
- We use family_id + external_rev to distinguish between incompatible GPUs.<br>
PCI ID is not enough, as RAVEN and RAVEN2 have the same PCI device id,<br>
but different tiling. We might be able to find bigger equivalence<br>
groups for _X, but especially for DCC I would be uncomfortable making it<br>
shared between GPUs.<br>
<br>
- For DCN1 DCC, radeonsi currently uses another texelbuffer with indices<br>
to reorder. This is not shared.<br>
Specific to current implementation and does not need to be shared. To<br>
pave the way to shader-based solution, lets keep this internal to each<br>
driver. This should reduce the modifier churn if any of the driver<br>
implementations change. (Especially as you'd want to support the old<br>
implementation for a while to stay compatible with old kernels not<br>
supporting a new modifier yet).<br>
<br>
- No support for rotated swizzling.<br>
Can be added easily later and nothing in the stack would generate it<br>
currently.<br>
<br>
- Add extra enum values in the definitions.<br>
This way we can easily switch on modifier without having to pass around<br>
the current GPU everywhere, assuming the modifier has been validated.<br>
---<br>
<br>
Since my previous attempt for modifiers got bogged down on details for<br>
the GFX6-GFX8 modifiers in previous discussions, this only attempts to<br>
define modifiers for GFX9+, which is significantly simpler.<br>
<br>
For a final version I'd like to wait until I have written most of the<br>
userspace + kernelspace so we can actually test it. However, I'd<br>
appreciate any early feedback people are willing to give.<br>
<br>
Initial Mesa amd/common support + tests are available at<br>
<a href="https://gitlab.freedesktop.org/bnieuwenhuizen/mesa/tree/modifiers" rel="noreferrer" target="_blank">https://gitlab.freedesktop.org/bnieuwenhuizen/mesa/tree/modifiers</a><br>
<br>
I tested the HW to actually behave as described in the descriptions<br>
on Raven and plan to test on a subset of the others.<br>
<br>
include/uapi/drm/drm_fourcc.h | 118 ++++++++++++++++++++++++++++++++++<br>
1 file changed, 118 insertions(+)<br>
<br>
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h<br>
index 3feeaa3f987a..9bd286ab2bee 100644<br>
--- a/include/uapi/drm/drm_fourcc.h<br>
+++ b/include/uapi/drm/drm_fourcc.h<br>
@@ -756,6 +756,124 @@ extern "C" {<br>
*/<br>
#define DRM_FORMAT_MOD_ALLWINNER_TILED fourcc_mod_code(ALLWINNER, 1)<br>
<br>
+/*<br>
+ * AMD GFX9+ format modifiers<br>
+ */<br>
+<br>
+/*<br>
+ * enum-like values for easy switches.<br>
+ *<br>
+ * No fixed field-size but implementations are supposed to enforce all-zeros of<br>
+ * unused bits during validation.<br>
+ */<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD_id 0<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY_id 1<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_id 2<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY_id 3<br>
+#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_id 4<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC_id 5<br>
+#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC_id 6<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC_id 7<br>
+<br>
+/*<br>
+ * tiling modes that are compatible between all GPUs that support the tiling<br>
+ * mode.<br>
+ *<br>
+ * STANDARD/DISPLAY/ROTATED + bitdepth determine the indexing within a 256 byte<br>
+ * micro-block.<br>
+ *<br>
+ * The macro-block is 64 KiB and the micro-block in macro-block addressing is<br>
+ * y0-x0-y1-x1-... up till the dimensions of the macro-block.<br>
+ *<br>
+ * The image is then a plain row-major image of macro-blocks.<br>
+ */<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD \<br>
+ fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_STANDARD_id)<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY \<br>
+ fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_DISPLAY_id)<br>
+<br>
+/*<br>
+ * Same as above, but applies a transformation on the micro-block in macro-block<br>
+ * indexing that depends on the GPU pipes, shader engines and banks.<br>
+ *<br>
+ * RENDER is a new micro-block tiling for GFX10+.<br>
+ */<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD(family_id, external_rev) \<br>
+ fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_id | \<br>
+ ((uint64_t)family_id << 40) | \<br>
+ ((uint64_t)external_rev << 48))<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY(family_id, external_rev) \<br>
+ fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_DISPLAY_id | \<br>
+ ((uint64_t)family_id << 40) | \<br>
+ ((uint64_t)external_rev << 48))<br>
+#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER(family_id, external_rev) \<br>
+ fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_id | \<br>
+ ((uint64_t)family_id << 40) | \<br>
+ ((uint64_t)external_rev << 48))<br>
+<br>
+/*<br>
+ * Same as above, but with DCC enabled.<br>
+ *<br>
+ * We add the PCI ID of the device to make sure the transformation above is<br>
+ * applied the same way, as well as make sure the implementation of DCC supports<br>
+ * the same patterns.<br>
+ *<br>
+ * The DCC is pipe-aligned (and on GFX9 rb-aligned).<br>
+ *<br>
+ * This includes 2 memory regions per plane:<br>
+ * - main image<br>
+ * - DCC metadata<br>
+ *<br>
+ * These are tightly packed according to platform specific DCC alignment<br>
+ * requirements.<br>
+ *<br>
+ * pipe+rb aligned DCC alignment:<br>
+ * - GFX9: MAX(65536,<br>
+ * MIN2(32, pipes * shader_engines) *<br>
+ * num_backends * interleave_bytes)<br>
+ * - GFX10 (without rbplus): MAX2(pipes * interleave_bytes, 4096)<br>
+ *<br>
+ * aligned DCC size:<br>
+ * - GFX9:<br>
+ * tiles of MAX2(256 * num_backends KiB, 1 MiB) of pixel data (prefer<br>
+ * width if odd log2) at ratio 1/256<br>
+ * - GFX10 (without rbplus):<br>
+ * tiles of 256 * MAX2(pipes * interleave_bytes, 4096) of pixel data<br>
+ * (prefer width if odd log2) at ratio 1/256<br>
+ */<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC(family_id, external_rev) \<br>
+ fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_STANDARD_DCC_id | \<br>
+ ((uint64_t)family_id << 40) | \<br>
+ ((uint64_t)external_rev << 48))<br>
+#define DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC(family_id, external_rev) \<br>
+ fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX10_64K_X_RENDER_DCC_id | \<br>
+ ((uint64_t)family_id << 40) | \<br>
+ ((uint64_t)external_rev << 48))<br>
+<br>
+/*<br>
+ * DCC that is displayable with DCN1 hardware.<br>
+ *<br>
+ * for bpp <= 32 bits, the micro-tiling is STANDARD and for bpp == 64 bits, the<br>
+ * micro-tiling is DISPLAY.<br>
+ *<br>
+ * This includes 3 memory regions per plane:<br>
+ * - main image<br>
+ * - DCC (non aligned)<br>
+ * - DCC (pipe-aligned & rb-aligned)<br>
+ *<br>
+ * non-aligned DCC alignment:<br>
+ * - GFX9: MAX(65536, interleave_bytes)<br>
+ * - GFX10 (without rbplus): 4096<br>
+ *<br>
+ * non-aligned DCC size:<br>
+ * - GFX9 & GFX10 (without rbplus):<br>
+ * tiles for 1 MiB of pixel data (prefer width if odd log2) at ratio 1/256<br>
+ */<br>
+#define DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC(family_id, external_rev) \<br>
+ fourcc_mod_code(AMD, DRM_FORMAT_MOD_AMD_GFX9_64K_X_DCN1_DCC_id | \<br>
+ ((uint64_t)family_id << 40) | \<br>
+ ((uint64_t)external_rev << 48))<br>
+<br>
#if defined(__cplusplus)<br>
}<br>
#endif<br>
-- <br>
2.23.0<br>
<br>
</blockquote></div></div>