<div dir="ltr"><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Thu, Jul 3, 2025 at 6:34 PM James Jones <<a href="mailto:jajones@nvidia.com">jajones@nvidia.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">The layout of bits within the individual tiles<br>
(referred to as sectors in the<br>
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro)<br>
changed for some formats starting in Blackwell 2<br>
GPUs. To denote the difference, extend the sector<br>
field in the parametric format modifier definition<br>
used to generate modifier values for NVIDIA<br>
hardware.<br>
<br>
Without this change, it would be impossible to<br>
differentiate the two layouts based on modifiers,<br>
and as a result software could attempt to share<br>
surfaces directly between pre-GB20x and GB20x<br>
cards, resulting in corruption when the surface<br>
was accessed on one of the GPUs after being<br>
populated with content by the other.<br>
<br>
Of note: This change causes the<br>
DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D() macro to<br>
evaluate its "s" parameter twice, with the side<br>
effects that entails. I surveyed all usage of the<br>
modifier in the kernel and Mesa code, and that<br>
does not appear to be problematic in any current<br>
usage, but I thought it was worth calling out.<br>
<br>
Signed-off-by: James Jones <<a href="mailto:jajones@nvidia.com" target="_blank">jajones@nvidia.com</a>><br>
---<br>
include/uapi/drm/drm_fourcc.h | 46 +++++++++++++++++++++--------------<br>
1 file changed, 28 insertions(+), 18 deletions(-)<br>
<br>
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h<br>
index 052e5fdd1d3b..348b2f1c1cb7 100644<br>
--- a/include/uapi/drm/drm_fourcc.h<br>
+++ b/include/uapi/drm/drm_fourcc.h<br>
@@ -909,8 +909,10 @@ extern "C" {<br>
#define __fourcc_mod_nvidia_pkind_shift 12<br>
#define __fourcc_mod_nvidia_kgen_mask 0x3ULL<br>
#define __fourcc_mod_nvidia_kgen_shift 20<br>
-#define __fourcc_mod_nvidia_slayout_mask 0x1ULL<br>
-#define __fourcc_mod_nvidia_slayout_shift 22<br>
+#define __fourcc_mod_nvidia_slayout_low_mask 0x1ULL<br>
+#define __fourcc_mod_nvidia_slayout_low_shift 22<br>
+#define __fourcc_mod_nvidia_slayout_high_mask 0x2ULL<br>
+#define __fourcc_mod_nvidia_slayout_high_shift 25<br>
#define __fourcc_mod_nvidia_comp_mask 0x7ULL<br>
#define __fourcc_mod_nvidia_comp_shift 23<br>
<br>
@@ -973,14 +975,16 @@ extern "C" {<br>
* 2 = Gob Height 8, Turing+ Page Kind mapping<br>
* 3 = Reserved for future use.<br>
*<br>
- * 22:22 s Sector layout. On Tegra GPUs prior to Xavier, there is a further<br>
- * bit remapping step that occurs at an even lower level than the<br>
- * page kind and block linear swizzles. This causes the layout of<br>
- * surfaces mapped in those SOC's GPUs to be incompatible with the<br>
- * equivalent mapping on other GPUs in the same system.<br>
+ * 22:22 s Sector layout. There is a further bit remapping step that occurs<br>
+ * 26:26 at an even lower level than the page kind and block linear<br>
+ * swizzles. This causes the bit arrangement of surfaces in memory<br>
+ * to differ subtly, and prevents direct sharing of surfaces between<br>
+ * GPUs with different layouts.<br>
*<br>
- * 0 = Tegra K1 - Tegra Parker/TX2 Layout.<br>
- * 1 = Desktop GPU and Tegra Xavier+ Layout<br>
+ * 0 = Tegra K1 - Tegra Parker/TX2 Layout<br>
+ * 1 = Pre-GB20x, Tegra Xavier-Orin, GB10 Layout<br>
+ * 2 = GB20x(Blackwell 2)+ Layout for some pixel/texel sizes<br></blockquote><div><br></div><div>I'm not sure I like lumping all of Blackwell together. Blackwell uses the same layout as Turing for 32-, 64-, and 128-bit formats. It differs only for 8- and 16-bit formats, and those two also differ from each other. The way we modeled this in NVK is with Turing, Blackwell8Bit, and Blackwell16Bit GOBTypes. I think I'd prefer the modifiers take a similar form.</div><div><br></div><div>Technically, this isn't strictly necessary: a surface always has a format, and formats with different element sizes aren't compatible, so a driver can always look at format+modifier. However, it is a better model of the hardware.</div><div><br></div><div>~Faith Ekstrand</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
+ * 3 = reserved for future use.<br>
*<br>
* 25:23 c Lossless Framebuffer Compression type.<br>
*<br>
@@ -995,7 +999,7 @@ extern "C" {<br>
* 6 = Reserved for future use<br>
* 7 = Reserved for future use<br>
*<br>
- * 55:26 - Reserved for future use. Must be zero.<br>
+ * 55:27 - Reserved for future use. Must be zero.<br>
*/<br>
#define DRM_FORMAT_MOD_NVIDIA_BLOCK_LINEAR_2D(c, s, g, k, h) \<br>
fourcc_mod_code(NVIDIA, \<br>
@@ -1006,8 +1010,10 @@ extern "C" {<br>
__fourcc_mod_nvidia_pkind_shift) | \<br>
(((g) & __fourcc_mod_nvidia_kgen_mask) << \<br>
__fourcc_mod_nvidia_kgen_shift) | \<br>
- (((s) & __fourcc_mod_nvidia_slayout_mask) << \<br>
- __fourcc_mod_nvidia_slayout_shift) | \<br>
+ (((s) & __fourcc_mod_nvidia_slayout_low_mask) << \<br>
+ __fourcc_mod_nvidia_slayout_low_shift) | \<br>
+ (((s) & __fourcc_mod_nvidia_slayout_high_mask) << \<br>
+ __fourcc_mod_nvidia_slayout_high_shift) | \<br>
(((c) & __fourcc_mod_nvidia_comp_mask) << \<br>
__fourcc_mod_nvidia_comp_shift)))<br>
<br>
@@ -1037,12 +1043,6 @@ __DRM_FOURCC_MKNVHELPER_FUNC(pkind)<br>
*/<br>
__DRM_FOURCC_MKNVHELPER_FUNC(kgen)<br>
<br>
-/*<br>
- * Get the sector layout specified by mod:<br>
- * static inline __u64 drm_fourcc_nvidia_format_mod_slayout(__u64 mod)<br>
- */<br>
-__DRM_FOURCC_MKNVHELPER_FUNC(slayout)<br>
-<br>
/*<br>
* Get the lossless framebuffer compression specified by mod:<br>
* static inline __u64 drm_fourcc_nvidia_format_mod_kgen(__u64 mod)<br>
@@ -1051,6 +1051,16 @@ __DRM_FOURCC_MKNVHELPER_FUNC(comp)<br>
<br>
#undef __DRM_FOURCC_MKNVHELPER_FUNC<br>
<br>
+/* Get the sector layout specified by mod: */<br>
+static inline __u64<br>
+drm_fourcc_nvidia_format_mod_slayout(__u64 mod)<br>
+{<br>
+ return ((mod >> __fourcc_mod_nvidia_slayout_low_shift) &<br>
+ __fourcc_mod_nvidia_slayout_low_mask) |<br>
+ ((mod >> __fourcc_mod_nvidia_slayout_high_shift) &<br>
+ __fourcc_mod_nvidia_slayout_high_mask);<br>
+}<br>
+<br>
/* To grandfather in prior block linear format modifiers to the above layout,<br>
* the page kind "0", which corresponds to "pitch/linear" and hence is unusable<br>
* with block-linear layouts, is remapped within drivers to the value 0xfe,<br>
-- <br>
2.49.0<br>
<br>
</blockquote></div></div>
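For reference, here is a minimal sketch of how the split sector-layout field in the quoted patch packs and unpacks: bit 0 of "s" lands in modifier bit 22, and bit 1 lands in bit 26, skipping over the compression field in bits 25:23. The mask and shift values are copied from the patch; the `SLAYOUT_*` names and the `pack_slayout()`/`unpack_slayout()` helpers are hypothetical and exist only for this illustration. Unlike the macro, a function evaluates its argument once, so the double-evaluation caveat from the commit message does not apply to it.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the quoted patch; the short names are illustrative only. */
#define SLAYOUT_LOW_MASK   0x1ULL
#define SLAYOUT_LOW_SHIFT  22
#define SLAYOUT_HIGH_MASK  0x2ULL
#define SLAYOUT_HIGH_SHIFT 25

/*
 * Hypothetical helper: place the 2-bit sector layout value into modifier
 * bits 22 (low bit) and 26 (high bit). The high mask is 0x2, so shifting
 * left by 25 moves bit 1 of s to bit 26.
 */
static inline uint64_t pack_slayout(uint64_t s)
{
	return ((s & SLAYOUT_LOW_MASK) << SLAYOUT_LOW_SHIFT) |
	       ((s & SLAYOUT_HIGH_MASK) << SLAYOUT_HIGH_SHIFT);
}

/*
 * Hypothetical helper mirroring the open-coded getter in the patch:
 * recover the 2-bit value from bits 22 and 26 of a modifier.
 */
static inline uint64_t unpack_slayout(uint64_t mod)
{
	return ((mod >> SLAYOUT_LOW_SHIFT) & SLAYOUT_LOW_MASK) |
	       ((mod >> SLAYOUT_HIGH_SHIFT) & SLAYOUT_HIGH_MASK);
}
```

With these definitions, the value 2 (the new GB20x layout) sets only bit 26, and setting all three compression bits (25:23) does not disturb the decoded sector layout.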