[Mesa-stable] [Mesa-dev] [PATCH] i965: Rework Sandy Bridge HiZ and stencil layouts

Pohjolainen, Topi topi.pohjolainen at gmail.com
Tue May 30 06:51:57 UTC 2017


On Mon, May 29, 2017 at 12:09:01PM -0700, Jason Ekstrand wrote:
> Sandy Bridge does not technically support mipmapped depth/stencil.  In
> order to work around this, we allocate what are effectively completely
> separate images for each miplevel, ensure that they are page-aligned,
> and manually offset to them.  Prior to layered rendering, this was a
> simple matter of setting a large enough halign/valign.
> 
> With the advent of layered rendering, however, things got more
> complicated.  Now, things weren't as simple as just handing a surface
> off to the hardware.  Any miplevel of a normally mipmapped surface can
> be considered as just an array surface given the right qpitch.  However,
> the hardware gives us no capability to specify qpitch so this won't
> work.  Instead, the chosen solution was to use a new "all slices at each
> LOD" layout which laid things out as a mipmap of arrays rather than an
> array of mipmaps.  This way you can easily offset to any of the
> miplevels and each is a valid array.
> 
> Unfortunately, the "all slices at each lod" concept missed one
> fundamental thing about SNB HiZ and stencil hardware:  It doesn't just
> always act as if you're always working with a non-mipmapped surface, it
> acts as if you're always working on a non-mipmapped surface of the same
> size as LOD0.  In other words, even though it may only write the
> upper-left corner of each array slice, the qpitch for the array is for a
> surface the size of LOD0 of the depth surface.  This mistake causes us
> to under-allocate HiZ and stencil in some cases and also to accidentally
> allow different miplevels to overlap.  Sadly, piglit test coverage
> didn't quite catch this until I started making changes to the resolve
> code that caused additional HiZ resolves in certain tests.
> 
> This commit switches Sandy Bridge HiZ and stencil over to a new scheme
> that lays out the non-zero miplevels horizontally below LOD0.  This way
> they can all have the same qpitch without interfering with each other.
> Technically, the miplevels still overlap, but things are spaced out
> enough that each page is only in the "written area" of one LOD.  Hopefully,
> this will get rid of at least some of the random SNB hangs.
> 
> Cc: "17.0 17.1" <mesa-stable at lists.freedesktop.org>
> Cc: Topi Pohjolainen <topi.pohjolainen at intel.com>
> Cc: Nanley Chery <nanley.g.chery at intel.com>
> Cc: Jordan Justen <jordan.l.justen at intel.com>
> Cc: Kenneth Graunke <kenneth at whitecape.org>

There is a few nits, but looks good. Thanks a lot for figuring out how it
really works:

Reviewed-by: Topi Pohjolainen <topi.pohjolainen at intel.com>

> 
> ---
> The series I sent out on Friday suffered from a GPU hang or two on Sandy
> Bridge.  It turns out that those hangs were caused by the hardware HiZ
> resolving part of my batch buffer due to this under-allocation.
> 
> Topi, I'm sorry but this will likely make hash of your earlier patch
> series.  Sadly, I don't think there's really anything else we can do. :-(
> Also, given how tricky this is to get right, I concede that we may want to
> add an ISL_DIM_LAYOUT_GEN6_HIZ_STENCIL layout to ISL.  We could still do it
> with the "array of surfaces" approach but I think these sorts of
> calculations are best done in with the other surface calculation code.  I
> can help draft it if you'd like.
> 
>  src/mesa/drivers/dri/i965/brw_blorp.c         |  11 ++-
>  src/mesa/drivers/dri/i965/brw_tex_layout.c    | 100 ++++++++++++++++++++++----
>  src/mesa/drivers/dri/i965/gen6_depth_state.c  |   4 +-
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c |  11 +--
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.h |  37 +++++++++-
>  5 files changed, 134 insertions(+), 29 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c b/src/mesa/drivers/dri/i965/brw_blorp.c
> index 6e860f0..cb9933b 100644
> --- a/src/mesa/drivers/dri/i965/brw_blorp.c
> +++ b/src/mesa/drivers/dri/i965/brw_blorp.c
> @@ -123,7 +123,7 @@ apply_gen6_stencil_hiz_offset(struct isl_surf *surf,
>                                uint32_t lod,
>                                uint32_t *offset)
>  {
> -   assert(mt->array_layout == ALL_SLICES_AT_EACH_LOD);
> +   assert(mt->array_layout == GEN6_HIZ_STENCIL);
>  
>     if (mt->format == MESA_FORMAT_S_UINT8) {
>        /* Note: we can't compute the stencil offset using
> @@ -183,12 +183,12 @@ blorp_surf_for_miptree(struct brw_context *brw,
>     };
>  
>     if (brw->gen == 6 && mt->format == MESA_FORMAT_S_UINT8 &&
> -       mt->array_layout == ALL_SLICES_AT_EACH_LOD) {
> -      /* Sandy bridge stencil and HiZ use this ALL_SLICES_AT_EACH_LOD hack in
> +       mt->array_layout == GEN6_HIZ_STENCIL) {
> +      /* Sandy bridge stencil and HiZ use this GEN6_HIZ_STENCIL hack in
>         * order to allow for layered rendering.  The hack makes each LOD of the
>         * stencil or HiZ buffer a single tightly packed array surface at some
>         * offset into the surface.  Since ISL doesn't know how to deal with the
> -       * crazy ALL_SLICES_AT_EACH_LOD layout and since we have to do a manual
> +       * crazy GEN6_HIZ_STENCIL layout and since we have to do a manual
>         * offset of it anyway, we might as well do the offset here and keep the
>         * hacks inside the i965 driver.
>         *
> @@ -241,8 +241,7 @@ blorp_surf_for_miptree(struct brw_context *brw,
>  
>           struct intel_mipmap_tree *hiz_mt = mt->hiz_buf->mt;
>           if (hiz_mt) {
> -            assert(brw->gen == 6 &&
> -                   hiz_mt->array_layout == ALL_SLICES_AT_EACH_LOD);
> +            assert(brw->gen == 6 && hiz_mt->array_layout == GEN6_HIZ_STENCIL);
>  
>              /* gen6 requires the HiZ buffer to be manually offset to the
>               * right location.  We could fixup the surf but it doesn't
> diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> index bfa8afa..30e6233 100644
> --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
> +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> @@ -216,6 +216,8 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt)
>        mt->total_height = MAX2(mt->total_height, y + img_height);
>  
>        /* Layout_below: step right after second mipmap.
> +       *
> +       * For Sandy Bridge HiZ and stencil, we always step down.
>         */
>        if (level == mt->first_level + 1) {
>  	 x += ALIGN_NPOT(width, mt->halign) / bw;
> @@ -231,6 +233,67 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt)
>     }
>  }
>  
> +static void
> +brw_miptree_layout_gen6_hiz_stencil(struct intel_mipmap_tree *mt)
> +{
> +   unsigned x = 0;
> +   unsigned y = 0;
> +   unsigned width = mt->physical_width0;
> +   unsigned height = mt->physical_height0;
> +   /* Number of layers of array texture. */
> +   unsigned depth = mt->physical_depth0;

We could mark height and depth const, only width is variant. I was also
thinking if we want to comment anything about needing to use also level zero
depth for all subsequent levels in case of 3D textures (instead of minifying
depth).

> +   unsigned tile_width, tile_height, bw, bh;
> +
> +   if (mt->format == MESA_FORMAT_S_UINT8) {
> +      bw = bh = 1;
> +      /* W-tiled */
> +      tile_width = 64;
> +      tile_height = 64;
> +   } else {
> +      assert(_mesa_get_format_base_format(mt->format) == GL_DEPTH_COMPONENT ||
> +             _mesa_get_format_base_format(mt->format) == GL_DEPTH_STENCIL);
> +      /* Each 128-bit HiZ block corresponds to a region of of 8x4 depth
> +       * samples.  Each cache line in the Y-Tiled HiZ image contains 2x2 HiZ
> +       * blocks.  Therefore, each Y-tiled cache line corresponds to an 16x8
> +       * region in the depth surface.  Since we're representing it as
> +       * RGBA_FLOAT32, the miptree calculations will think that each cache
> +       * line is 1x4 pixels.  Therefore, we need a scale-down factor of 16x2
> +       * and a vertical alignment of 2.
> +       */
> +      mt->cpp = 16;
> +      bw = 16;
> +      bh = 2;
> +      /* Y-tiled */
> +      tile_width = 128 / mt->cpp;
> +      tile_height = 32;
> +   }
> +
> +   mt->total_width = 0;
> +   mt->total_height = 0;

We could pull height here as it is invariant in the loop:

      const unsigned img_height =
         ALIGN(DIV_ROUND_UP(height, bh), mt->valign) * depth;

> +
> +   for (unsigned level = mt->first_level; level <= mt->last_level; level++) {
> +      intel_miptree_set_level_info(mt, level, x, y, depth);
> +
> +      const unsigned img_width = ALIGN(DIV_ROUND_UP(width, bw), mt->halign);
> +
> +      mt->total_width = MAX2(mt->total_width, x + img_width);
> +      mt->total_height = MAX2(mt->total_height, y + img_height);
> +
> +      if (level == mt->first_level) {
> +         y += ALIGN(img_height, tile_height);
> +      } else {
> +         x += ALIGN(img_width, tile_width);
> +      }
> +
> +      /* We only minify the width.  We want qpitch to match for all miplevels
> +       * because the hardware doesn't know we aren't on LOD0.
> +       */
> +      width = minify(width, 1);
> +   }
> +}
> +
>  unsigned
>  brw_miptree_get_horizontal_slice_pitch(const struct brw_context *brw,
>                                         const struct intel_mipmap_tree *mt,
> @@ -249,6 +312,8 @@ brw_miptree_get_vertical_slice_pitch(const struct brw_context *brw,
>                                       const struct intel_mipmap_tree *mt,
>                                       unsigned level)
>  {
> +   assert(mt->array_layout != GEN6_HIZ_STENCIL || brw->gen == 6);
> +
>     if (brw->gen >= 9) {
>        /* ALL_SLICES_AT_EACH_LOD isn't supported on Gen8+ but this code will
>         * effectively end up with a packed qpitch anyway whenever
> @@ -281,6 +346,15 @@ brw_miptree_get_vertical_slice_pitch(const struct brw_context *brw,
>                mt->array_layout == ALL_SLICES_AT_EACH_LOD) {
>        return ALIGN_NPOT(minify(mt->physical_height0, level), mt->valign);
>  
> +   } else if (mt->array_layout == GEN6_HIZ_STENCIL) {
> +      /* For HiZ and stencil on Sandy Bridge, we don't minify the height. */
> +      if (mt->format == MESA_FORMAT_S_UINT8) {
> +         return ALIGN(mt->physical_height0, mt->valign);
> +      } else {
> +         /* HiZ has a vertical scale factor of 2. */
> +         return ALIGN(DIV_ROUND_UP(mt->physical_height0, 2), mt->halign);
> +      }
> +
>     } else {
>        const unsigned h0 = ALIGN_NPOT(mt->physical_height0, mt->valign);
>        const unsigned h1 = ALIGN_NPOT(minify(mt->physical_height0, 1), mt->valign);
> @@ -333,6 +407,8 @@ brw_miptree_layout_texture_array(struct brw_context *brw,
>  
>     if (layout_1d)
>        gen9_miptree_layout_1d(mt);
> +   else if (mt->array_layout == GEN6_HIZ_STENCIL)
> +      brw_miptree_layout_gen6_hiz_stencil(mt);
>     else
>        brw_miptree_layout_2d(mt);
>  
> @@ -556,6 +632,8 @@ intel_miptree_set_total_width_height(struct brw_context *brw,
>        case INTEL_MSAA_LAYOUT_IMS:
>           if (gen9_use_linear_1d_layout(brw, mt))
>              gen9_miptree_layout_1d(mt);
> +         else if (mt->array_layout == GEN6_HIZ_STENCIL)
> +            brw_miptree_layout_gen6_hiz_stencil(mt);
>           else
>              brw_miptree_layout_2d(mt);
>           break;
> @@ -579,15 +657,9 @@ intel_miptree_set_alignment(struct brw_context *brw,
>      * - Ironlake and Sandybridge PRMs: Volume 1, Part 1, Section 7.18.3.4
>      * - BSpec (for Ivybridge and slight variations in separate stencil)
>      */
> -   bool gen6_hiz_or_stencil = false;
>  
> -   if (brw->gen == 6 && mt->array_layout == ALL_SLICES_AT_EACH_LOD) {
> -      const GLenum base_format = _mesa_get_format_base_format(mt->format);
> -      gen6_hiz_or_stencil = _mesa_is_depth_or_stencil_format(base_format);
> -   }
> -
> -   if (gen6_hiz_or_stencil) {
> -      /* On gen6, we use ALL_SLICES_AT_EACH_LOD for stencil/hiz because the
> +   if (mt->array_layout == GEN6_HIZ_STENCIL) {
> +      /* On gen6, we use GEN6_HIZ_STENCIL for stencil/hiz because the
>         * hardware doesn't support multiple mip levels on stencil/hiz.
>         *
>         * PRM Vol 2, Part 1, 7.5.3 Hierarchical Depth Buffer:
> @@ -600,15 +672,13 @@ intel_miptree_set_alignment(struct brw_context *brw,
>           /* Stencil uses W tiling, so we force W tiling alignment for the
>            * ALL_SLICES_AT_EACH_LOD miptree layout.
>            */
> -         mt->halign = 64;
> -         mt->valign = 64;
> +         mt->halign = 4;
> +         mt->valign = 2;
>           assert((layout_flags & MIPTREE_LAYOUT_FORCE_HALIGN16) == 0);
>        } else {
> -         /* Depth uses Y tiling, so we force need Y tiling alignment for the
> -          * ALL_SLICES_AT_EACH_LOD miptree layout.
> -          */
> -         mt->halign = 128 / mt->cpp;
> -         mt->valign = 32;
> +         /* See intel_hiz_miptree_buf_create() */

There isnt't really anything about halign/valign in
intel_hiz_miptree_buf_create()? Should we refer to
brw_miptree_layout_gen6_hiz_stencil() instead?

> +         mt->halign = 1;
> +         mt->valign = 2;
>        }
>     } else if (mt->compressed) {
>         /* The hardware alignment requirements for compressed textures
> diff --git a/src/mesa/drivers/dri/i965/gen6_depth_state.c b/src/mesa/drivers/dri/i965/gen6_depth_state.c
> index ae4f681..20992d5 100644
> --- a/src/mesa/drivers/dri/i965/gen6_depth_state.c
> +++ b/src/mesa/drivers/dri/i965/gen6_depth_state.c
> @@ -164,7 +164,7 @@ gen6_emit_depth_stencil_hiz(struct brw_context *brw,
>           struct intel_mipmap_tree *hiz_mt = depth_mt->hiz_buf->mt;
>           uint32_t offset = 0;
>  
> -         if (hiz_mt->array_layout == ALL_SLICES_AT_EACH_LOD) {
> +         if (hiz_mt->array_layout == GEN6_HIZ_STENCIL) {
>              offset = intel_miptree_get_aligned_offset(
>                          hiz_mt,
>                          hiz_mt->level[lod].level_x,
> @@ -190,7 +190,7 @@ gen6_emit_depth_stencil_hiz(struct brw_context *brw,
>        if (separate_stencil) {
>           uint32_t offset = 0;
>  
> -         if (stencil_mt->array_layout == ALL_SLICES_AT_EACH_LOD) {
> +         if (stencil_mt->array_layout == GEN6_HIZ_STENCIL) {
>              assert(stencil_mt->format == MESA_FORMAT_S_UINT8);
>  
>              /* Note: we can't compute the stencil offset using
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index e334951..9f9e68a 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -452,7 +452,7 @@ intel_miptree_create_layout(struct brw_context *brw,
>           intel_miptree_wants_hiz_buffer(brw, mt)))) {
>        uint32_t stencil_flags = MIPTREE_LAYOUT_ACCELERATED_UPLOAD;
>        if (brw->gen == 6) {
> -         stencil_flags |= MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD |
> +         stencil_flags |= MIPTREE_LAYOUT_GEN6_HIZ_STENCIL |
>                            MIPTREE_LAYOUT_TILING_ANY;
>        }
>  
> @@ -485,8 +485,8 @@ intel_miptree_create_layout(struct brw_context *brw,
>        }
>     }
>  
> -   if (layout_flags & MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD)
> -      mt->array_layout = ALL_SLICES_AT_EACH_LOD;
> +   if (layout_flags & MIPTREE_LAYOUT_GEN6_HIZ_STENCIL)
> +      mt->array_layout = GEN6_HIZ_STENCIL;
>  
>     /*
>      * Obey HALIGN_16 constraints for Gen8 and Gen9 buffers which are
> @@ -1830,7 +1830,7 @@ intel_hiz_miptree_buf_create(struct brw_context *brw,
>     uint32_t layout_flags = MIPTREE_LAYOUT_ACCELERATED_UPLOAD;
>  
>     if (brw->gen == 6)
> -      layout_flags |= MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD;
> +      layout_flags |= MIPTREE_LAYOUT_GEN6_HIZ_STENCIL;
>  
>     if (!buf)
>        return NULL;
> @@ -2770,7 +2770,7 @@ intel_update_r8stencil(struct brw_context *brw,
>        const uint32_t r8stencil_flags =
>           MIPTREE_LAYOUT_ACCELERATED_UPLOAD | MIPTREE_LAYOUT_TILING_Y |
>           MIPTREE_LAYOUT_DISABLE_AUX;
> -      assert(brw->gen > 6); /* Handle MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD */
> +      assert(brw->gen > 6); /* Handle MIPTREE_LAYOUT_GEN6_HIZ_STENCIL */
>        mt->r8stencil_mt = intel_miptree_create(brw,
>                                                src->target,
>                                                MESA_FORMAT_R_UINT8,
> @@ -3670,6 +3670,7 @@ intel_miptree_get_isl_surf(struct brw_context *brw,
>        surf->array_pitch_span = ISL_ARRAY_PITCH_SPAN_FULL;
>        break;
>     case ALL_SLICES_AT_EACH_LOD:
> +   case GEN6_HIZ_STENCIL:
>        surf->array_pitch_span = ISL_ARRAY_PITCH_SPAN_COMPACT;
>        break;
>     default:
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> index aa52f48..e2ec26f 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> @@ -250,6 +250,41 @@ enum miptree_array_layout {
>      *   +---+
>      */
>     ALL_SLICES_AT_EACH_LOD,
> +
> +   /* On Sandy Bridge, HiZ and stencil buffers work the same as on Ivy Bridge
> +    * except that they don't technically support mipmapping.  That does not,
> +    * however, stop us from doing it.  As far as Sandy Bridge hardware is
> +    * concerned, HiZ and stencil always operates on a single miplevel 2D
> +    * (possibly array) image.  The dimensions of that image are NOT minified.
> +    *
> +    * In order to implement HiZ and stencil on Sandy Bridge, we create one
> +    * full-sized 2D (possibly array) image for every LOD with every image
> +    * aligned to a page boundary.  In order to save memory, we pretend that
> +    * the width of each miplevel is minified and we place LOD1 and above below
> +    * LOD0 but horizontally adjacent to each other.  When considered as
> +    * full-sized images, LOD1 and above technically overlap.  However, since
> +    * we only write to part of that image, the hardware will never notice the
> +    * overlap.
> +    *
> +    * This layout looks something like this:
> +    *
> +    *   +---------+
> +    *   |         |
> +    *   |         |
> +    *   +---------+
> +    *   |         |
> +    *   |         |
> +    *   +---------+
> +    *
> +    *   +----+ +-+ .
> +    *   |    | +-+
> +    *   +----+
> +    *
> +    *   +----+ +-+ .
> +    *   |    | +-+
> +    *   +----+

If we want to reflect all levels having the same heigth, maybe:

           +---------+
           |         |
           |         |
           +---------+
           |         |
           |         |
           +---------+

           +----+ +-+ .
           |    | | |
           |    | | |
           +----+ +-+

           +----+ +-+ .
           |    | | |
           |    | | |
           +----+ +-+

> +    */
> +   GEN6_HIZ_STENCIL,
>  };
>  
>  enum intel_aux_disable {
> @@ -637,7 +672,7 @@ intel_miptree_alloc_non_msrt_mcs(struct brw_context *brw,
>  
>  enum {
>     MIPTREE_LAYOUT_ACCELERATED_UPLOAD       = 1 << 0,
> -   MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD   = 1 << 1,
> +   MIPTREE_LAYOUT_GEN6_HIZ_STENCIL         = 1 << 1,
>     MIPTREE_LAYOUT_FOR_BO                   = 1 << 2,
>     MIPTREE_LAYOUT_DISABLE_AUX              = 1 << 3,
>     MIPTREE_LAYOUT_FORCE_HALIGN16           = 1 << 4,
> -- 
> 2.5.0.400.gff86faf
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


More information about the mesa-stable mailing list