[Mesa-dev] [PATCH 1/9] intel/blorp: Only double the fast-clear rect alignment on HSW

Mon Dec 3 22:48:01 UTC 2018

I've received confirmation from the HW team that the extra doubling is only
needed on Haswell GT3.

On Tue, May 15, 2018 at 5:28 PM Jason Ekstrand <jason at jlekstrand.net> wrote:

> The data in the commit message is a bit sketchy for Ivybridge.  We don't
> run dEQP or any of the CTSs on Ivybridge in CI so all the data we have
> is piglit.  On Haswell, piglit didn't catch anything so we don't have
> anything to go off of for Ivybridge besides the fact that the restriction
> wasn't added until Haswell.
> ---
>  src/intel/blorp/blorp_clear.c | 66 ++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 56 insertions(+), 10 deletions(-)
>
> diff --git a/src/intel/blorp/blorp_clear.c b/src/intel/blorp/blorp_clear.c
> index 832e8ee..618625b 100644
> --- a/src/intel/blorp/blorp_clear.c
> +++ b/src/intel/blorp/blorp_clear.c
> @@ -235,16 +235,62 @@ get_fast_clear_rect(const struct isl_device *dev,
>        x_scaledown = x_align / 2;
>        y_scaledown = y_align / 2;
>
> -      /* From BSpec: 3D-Media-GPGPU Engine > 3D Pipeline > Pixel > Pixel
> -       * Backend > MCS Buffer for Render Target(s) [DevIVB+] > Table "Color
> -       * Clear of Non-MultiSampled Render Target Restrictions":
> -       *
> -       *   Clear rectangle must be aligned to two times the number of
> -       *   pixels in the table shown below due to 16x16 hashing across the
> -       *   slice.
> -       */
> -      x_align *= 2;
> -      y_align *= 2;
> +      if (ISL_DEV_IS_HASWELL(dev)) {
> +         /* The following text was added in the Haswell PRM, "3D Media GPGPU
> +          * Engine" >> "MCS Buffer for Render Target(s)" >> Table "Color Clear
> +          * of Non-MultiSampled Render Target Restrictions":
> +          *
> +          *    "Clear rectangle must be aligned to two times the number of
> +          *    pixels in the table shown below due to 16X16 hashing across the
> +          *    slice."
> +          *
> +          * It has persisted in the documentation for all platforms up until
> +          * Cannonlake and possibly even beyond.  However, we believe that it
> +          * is only needed on Haswell.
> +          *
> +          * There are a couple possible explanations for this restriction:
> +          *
> +          * 1) If you assume that the hardware is writing to the CCS as
> +          *    bytes, then the x/y_align computed above gives you an alignment
> +          *    in the CCS of 8x8 bytes and, if 16x16 is needed for hashing, we
> +          *    need to multiply by 2.
> +          *
> +          * 2) Haswell is a bit unique in that its CCS tiling does not line
> +          *    up with Y-tiling on a cache-line granularity.  Instead, it has
> +          *    an extra bit of swizzling in bit 9.  Also, bit 6 swizzling
> +          *    applies to the CCS on Haswell.  This means that Haswell CCS
> +          *    does not match on a cache-line granularity but it does match on
> +          *    a 2x2 cache line granularity.
> +          *
> +          * Clearly, the first explanation seems to follow documentation the
> +          * best but they may be related.  In any case, empirical evidence
> +          * seems to confirm that it is, indeed, required on Haswell.
> +          *
> +          * On Broadwell things get a bit stickier.  Broadwell adds support
> +          * for mip-mapped CCS with an alignment in the CCS of 256x128.  For a
> +          * 32bpb main surface, the above computation will yield a x/y_align
> +          * of 128x128 for a Y-tiled main surface and 256x64 for X-tiled.  In
> +          * either case, if we double the alignment, we will get an alignment
> +          * bigger than the horizontal or vertical alignment of the CCS and
> +          * fast clears of one LOD may leak into others.
> +          *
> +          * Starting with Skylake, the image alignment for the CCS is only
> +          * 128x64 which is exactly the x/y_align computed above if the main
> +          * surface has a 32bpb format.  Also, the "Render Target Resolve"
> +          * page in the bspec (not the PRM) says, "The Resolve Rectangle size
> +          * is same as Clear Rectangle size from SKL+".  The x/y_align
> +          * computed above (without doubling) match the resolve rectangle
> +          * calculation perfectly.
> +          *
> +          * Finally, to confirm all this, a full test run was performed on
> +          * Feb. 9, 2018 with this doubling removed and the only platform
> +          * which seemed to be affected was Haswell.  The run consisted of
> +          * piglit, dEQP, the Vulkan CTS 1.0.2, the OpenGL 4.5 CTS, and the
> +          * OpenGL ES 3.2 CTS.
> +          */
> +         x_align *= 2;
> +         y_align *= 2;
> +      }
>     } else {
>        assert(aux_surf->usage == ISL_SURF_USAGE_MCS_BIT);
>
> --
> 2.5.0.400.gff86faf
>
>
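For anyone who wants to check the Broadwell and Skylake arithmetic from the
comment in the quoted patch, here is a minimal sketch. It is not code from
blorp: struct extent, doubled() and fits() are hypothetical names made up for
this illustration, and the only numbers used are the ones quoted in the comment
(a 32bpb main surface; CCS image alignment of 256x128 on Broadwell and 128x64
on Skylake).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper type: an alignment in pixels (width x height). */
struct extent {
   unsigned w;
   unsigned h;
};

/* The doubling that the patch restricts to Haswell. */
static struct extent
doubled(struct extent e)
{
   struct extent r = { e.w * 2, e.h * 2 };
   return r;
}

/* True if the clear-rect alignment does not exceed the CCS image alignment
 * in either dimension.  When this is false, a fast clear rounded out to
 * rect_align could spill past one miplevel's CCS region.
 */
static bool
fits(struct extent rect_align, struct extent ccs_align)
{
   return rect_align.w <= ccs_align.w && rect_align.h <= ccs_align.h;
}
```

With the figures from the comment, fits() holds for the undoubled 128x128
(Y-tiled) and 256x64 (X-tiled) alignments against Broadwell's 256x128, and
fails for both once doubled (256x256 is too tall, 512x128 is too wide). On
Skylake the undoubled 128x64 equals the CCS alignment exactly, so any doubling
overshoots.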