[Mesa-dev] [PATCH 1/9] intel/blorp: Only double the fast-clear rect alignment on HSW

Tue May 15 22:28:04 UTC 2018

The data in the commit message is a bit sketchy for Ivybridge.  We don't
run dEQP or any of the CTSs on Ivybridge in CI, so all the data we have
comes from piglit.  On Haswell, piglit didn't catch anything, so we have
nothing to go on for Ivybridge beyond the fact that the restriction
wasn't added until Haswell.
---
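For review, here is a minimal standalone sketch of what the CCS branch of get_fast_clear_rect() computes after this change.  Two parts come straight from the hunk: the scale-down factors are half of the *undoubled* alignment, and only Haswell doubles x_align/y_align.  The final round-down/align-up and divide-by-scaledown step is an assumption that models the surrounding code, which is not part of this hunk.  struct clear_rect, round_down_to(), align_up() and sketch_fast_clear_rect() are hypothetical names and are not blorp or ISL API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a fast-clear rectangle; not a blorp type. */
struct clear_rect {
   unsigned x0, y0, x1, y1;
};

/* Hypothetical helper: largest multiple of a that is <= v. */
static unsigned
round_down_to(unsigned v, unsigned a)
{
   return v / a * a;
}

/* Hypothetical helper: smallest multiple of a that is >= v. */
static unsigned
align_up(unsigned v, unsigned a)
{
   return (v + a - 1) / a * a;
}

/* x_align/y_align are the undoubled CCS alignments computed earlier in
 * get_fast_clear_rect(); is_haswell stands in for ISL_DEV_IS_HASWELL(dev).
 */
static struct clear_rect
sketch_fast_clear_rect(struct clear_rect r, unsigned x_align,
                       unsigned y_align, bool is_haswell)
{
   assert(x_align % 2 == 0 && y_align % 2 == 0);

   /* As in the hunk's context lines: scale-down is taken before doubling. */
   const unsigned x_scaledown = x_align / 2;
   const unsigned y_scaledown = y_align / 2;

   /* After this patch, only Haswell doubles the alignment. */
   if (is_haswell) {
      x_align *= 2;
      y_align *= 2;
   }

   /* Assumed final step: grow the rect out to the alignment, then scale
    * it down into the coordinates used for the clear. */
   struct clear_rect out = {
      .x0 = round_down_to(r.x0, x_align) / x_scaledown,
      .y0 = round_down_to(r.y0, y_align) / y_scaledown,
      .x1 = align_up(r.x1, x_align) / x_scaledown,
      .y1 = align_up(r.y1, y_align) / y_scaledown,
   };
   return out;
}
```

As an illustration, take the 128x128 alignment that the comment gives for a 32bpb Y-tiled surface on Broadwell, and assume it applies to Ivybridge and Haswell as well.  A clear from (0,0) to (300,200) then maps to (0,0)-(6,4) without doubling and (0,0)-(8,4) with it.  Because every multiple of the doubled alignment is also a multiple of the undoubled one, the Haswell path can only grow the cleared area.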
 src/intel/blorp/blorp_clear.c | 66 ++++++++++++++++++++++++++++++++++++-------
 1 file changed, 56 insertions(+), 10 deletions(-)
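The Broadwell and Skylake paragraphs of the new comment reduce to a small comparison.  The sketch below uses only the numbers stated in the comment:

- CCS alignment of 256x128 on Broadwell and 128x64 on Skylake.
- Computed alignment of 128x128 (Y-tiled) and 256x64 (X-tiled) for 32bpb surfaces on Broadwell.

It checks whether a clear alignment is wider or taller than the CCS image alignment.  The comment argues that this situation may let a clear of one LOD leak into another.  Here that test serves only as a proxy for the argument and is not a precise hardware model.  struct align2d, doubled() and exceeds_ccs_alignment() are hypothetical helpers for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical width x height alignment, in pixels of the main surface. */
struct align2d {
   unsigned w, h;
};

/* The alignment with the pre-patch doubling applied. */
static struct align2d
doubled(struct align2d a)
{
   assert(a.w <= UINT_MAX / 2 && a.h <= UINT_MAX / 2);
   return (struct align2d){ a.w * 2, a.h * 2 };
}

/* True if a clear aligned to 'clear' can be wider or taller than one CCS
 * image alignment unit, which is the case the comment warns may let a
 * fast clear of one LOD leak into another. */
static bool
exceeds_ccs_alignment(struct align2d clear, struct align2d ccs)
{
   return clear.w > ccs.w || clear.h > ccs.h;
}
```

On Broadwell, the undoubled alignments stay within 256x128.  Doubling gives 256x256, which is too tall, and 512x128, which is too wide.  On Skylake, the undoubled 128x64 equals the CCS alignment, and doubling would exceed it in both dimensions.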

diff --git a/src/intel/blorp/blorp_clear.c b/src/intel/blorp/blorp_clear.c
index 832e8ee..618625b 100644
--- a/src/intel/blorp/blorp_clear.c
+++ b/src/intel/blorp/blorp_clear.c
@@ -235,16 +235,62 @@ get_fast_clear_rect(const struct isl_device *dev,
       x_scaledown = x_align / 2;
       y_scaledown = y_align / 2;
 
-      /* From BSpec: 3D-Media-GPGPU Engine > 3D Pipeline > Pixel > Pixel
-       * Backend > MCS Buffer for Render Target(s) [DevIVB+] > Table "Color
-       * Clear of Non-MultiSampled Render Target Restrictions":
-       *
-       *   Clear rectangle must be aligned to two times the number of
-       *   pixels in the table shown below due to 16x16 hashing across the
-       *   slice.
-       */
-      x_align *= 2;
-      y_align *= 2;
+      if (ISL_DEV_IS_HASWELL(dev)) {
+         /* The following text was added in the Haswell PRM, "3D Media GPGPU
+          * Engine" >> "MCS Buffer for Render Target(s)" >> Table "Color Clear
+          * of Non-MultiSampled Render Target Restrictions":
+          *
+          *    "Clear rectangle must be aligned to two times the number of
+          *    pixels in the table shown below due to 16X16 hashing across the
+          *    slice."
+          *
+          * It has persisted in the documentation for all platforms up until
+          * Cannonlake and possibly even beyond.  However, we believe that it
+          * is only needed on Haswell.
+          *
+          * There are a couple of possible explanations for this restriction:
+          *
+          * 1) If you assume that the hardware is writing to the CCS as
+          *    bytes, then the x/y_align computed above gives you an alignment
+          *    in the CCS of 8x8 bytes and, if 16x16 is needed for hashing, we
+          *    need to multiply by 2.
+          *
+          * 2) Haswell is somewhat unusual in that its CCS tiling does not
+          *    line up with Y-tiling at cache-line granularity.  Instead, it
+          *    has an extra bit of swizzling in bit 9.  Also, bit 6 swizzling
+          *    applies to the CCS on Haswell.  This means that Haswell's CCS
+          *    does not match at cache-line granularity, but it does match at
+          *    2x2 cache-line granularity.
+          *
+          * The first explanation seems to follow the documentation best, but
+          * the two may be related.  In any case, empirical evidence seems to
+          * confirm that the doubling is indeed required on Haswell.
+          *
+          * On Broadwell things get a bit stickier.  Broadwell adds support
+          * for mip-mapped CCS with an alignment in the CCS of 256x128.  For a
+          * 32bpb main surface, the above computation will yield an x/y_align
+          * of 128x128 for a Y-tiled main surface and 256x64 for X-tiled.
+          * Doubling these gives 256x256 and 512x128 respectively, each of
+          * which exceeds the CCS alignment in one dimension, so fast clears
+          * of one LOD may leak into others.
+          *
+          * Starting with Skylake, the image alignment for the CCS is only
+          * 128x64, which is exactly the x/y_align computed above if the main
+          * surface has a 32bpb format.  Also, the "Render Target Resolve"
+          * page in the bspec (not the PRM) says, "The Resolve Rectangle size
+          * is same as Clear Rectangle size from SKL+".  The x/y_align
+          * computed above (without doubling) matches the resolve rectangle
+          * calculation perfectly.
+          *
+          * Finally, to confirm all this, a full test run was performed on
+          * Feb. 9, 2018 with this doubling removed and the only platform
+          * which seemed to be affected was Haswell.  The run consisted of
+          * piglit, dEQP, the Vulkan CTS 1.0.2, the OpenGL 4.5 CTS, and the
+          * OpenGL ES 3.2 CTS.
+          */
+         x_align *= 2;
+         y_align *= 2;
+      }
    } else {
       assert(aux_surf->usage == ISL_SURF_USAGE_MCS_BIT);
 
-- 
2.5.0.400.gff86faf