[Mesa-dev] [PATCH] intel/blorp: Only double the fast-clear rect alignment on HSW

Sat Feb 10 17:48:44 UTC 2018

---
 src/intel/blorp/blorp_clear.c | 66 ++++++++++++++++++++++++++++++++++++-------
 1 file changed, 56 insertions(+), 10 deletions(-)

diff --git a/src/intel/blorp/blorp_clear.c b/src/intel/blorp/blorp_clear.c
index a2dbcd1..63b74e3 100644
--- a/src/intel/blorp/blorp_clear.c
+++ b/src/intel/blorp/blorp_clear.c
@@ -235,16 +235,62 @@ get_fast_clear_rect(const struct isl_device *dev,
       x_scaledown = x_align / 2;
       y_scaledown = y_align / 2;
 
-      /* From BSpec: 3D-Media-GPGPU Engine > 3D Pipeline > Pixel > Pixel
-       * Backend > MCS Buffer for Render Target(s) [DevIVB+] > Table "Color
-       * Clear of Non-MultiSampled Render Target Restrictions":
-       *
-       *   Clear rectangle must be aligned to two times the number of
-       *   pixels in the table shown below due to 16x16 hashing across the
-       *   slice.
-       */
-      x_align *= 2;
-      y_align *= 2;
+      if (ISL_DEV_IS_HASWELL(dev)) {
+         /* The following text was added in the Haswell PRM, "3D Media GPGPU
+          * Engine" >> "MCS Buffer for Render Target(s)" >> Table "Color Clear
+          * of Non-MultiSampled Render Target Restrictions":
+          *
+          *    "Clear rectangle must be aligned to two times the number of
+          *    pixels in the table shown below due to 16X16 hashing across the
+          *    slice."
+          *
+          * It has persisted in the documentation for all platforms up until
+          * Cannonlake and possibly even beyond.  However, we believe that it
+          * is only needed on Haswell.
+          *
+          * There are a couple of possible explanations for this restriction:
+          *
+          * 1) If you assume that the hardware is writing to the CCS as
+          *    bytes, then the x/y_align computed above gives you an alignment
+          *    in the CCS of 8x8 bytes and, if 16x16 is needed for hashing, we
+          *    need to multiply by 2.
+          *
+          * 2) Haswell is a bit unusual in that its CCS tiling does not line
+          *    up with Y-tiling on a cache-line granularity.  Instead, it has
+          *    an extra bit of swizzling in bit 9.  Also, bit 6 swizzling
+          *    applies to the CCS on Haswell.  This means that Haswell CCS
+          *    does not match Y-tiling on a cache-line granularity, but it
+          *    does match on a 2x2 cache-line granularity.
+          *
+          * The first explanation seems to follow the documentation best, but
+          * the two may be related.  In any case, empirical evidence seems to
+          * confirm that the doubling is, indeed, required on Haswell.
+          *
+          * On Broadwell things get a bit stickier.  Broadwell adds support
+          * for mip-mapped CCS with an alignment in the CCS of 256x128.  For a
+          * 32bpb main surface, the above computation will yield a x/y_align
+          * of 128x128 for a Y-tiled main surface and 256x64 for X-tiled.  In
+          * either case, doubling gives 256x256 or 512x128, which exceeds the
+          * vertical or horizontal alignment of the CCS, respectively, so fast
+          * clears of one LOD may leak into others.
+          *
+          * Starting with Skylake, the image alignment for the CCS is only
+          * 128x64, which is exactly the x/y_align computed above if the main
+          * surface has a 32bpb format.  Also, the "Render Target Resolve"
+          * page in the BSpec (not the PRM) says, "The Resolve Rectangle size
+          * is same as Clear Rectangle size from SKL+".  The x/y_align
+          * computed above (without doubling) matches the resolve rectangle
+          * calculation perfectly.
+          *
+          * Finally, to confirm all this, a full test run was performed on
+          * Feb. 9, 2018 with this doubling removed and the only platform
+          * which seemed to be affected was Haswell.  The run consisted of
+          * piglit, dEQP, the Vulkan CTS 1.0.2, the OpenGL 4.5 CTS, and the
+          * OpenGL ES 3.2 CTS.
+          */
+         x_align *= 2;
+         y_align *= 2;
+      }
    } else {
       assert(aux_surf->usage == ISL_SURF_USAGE_MCS_BIT);
 
-- 
2.5.0.400.gff86faf