[Mesa-dev] [PATCH] i965: Rework Sandy Bridge HiZ and stencil layouts
Jason Ekstrand
jason at jlekstrand.net
Tue May 30 15:50:01 UTC 2017
On Mon, May 29, 2017 at 11:51 PM, Pohjolainen, Topi <
topi.pohjolainen at gmail.com> wrote:
> On Mon, May 29, 2017 at 12:09:01PM -0700, Jason Ekstrand wrote:
> > Sandy Bridge does not technically support mipmapped depth/stencil. In
> > order to work around this, we allocate what are effectively completely
> > separate images for each miplevel, ensure that they are page-aligned,
> > and manually offset to them. Prior to layered rendering, this was a
> > simple matter of setting a large enough halign/valign.
> >
> > With the advent of layered rendering, however, things got more
> > complicated. Now, things weren't as simple as just handing a surface
> > off to the hardware. Any miplevel of a normally mipmapped surface can
> > be considered as just an array surface given the right qpitch. However,
> > the hardware gives us no capability to specify qpitch so this won't
> > work. Instead, the chosen solution was to use a new "all slices at each
> > LOD" layout which laid things out as a mipmap of arrays rather than an
> > array of mipmaps. This way you can easily offset to any of the
> > miplevels and each is a valid array.
> >
> > Unfortunately, the "all slices at each lod" concept missed one
> > fundamental thing about SNB HiZ and stencil hardware: It doesn't just
> > always act as if you're always working with a non-mipmapped surface, it
> > acts as if you're always working on a non-mipmapped surface of the same
> > size as LOD0. In other words, even though it may only write the
> > upper-left corner of each array slice, the qpitch for the array is for a
> > surface the size of LOD0 of the depth surface. This mistake causes us
> > to under-allocate HiZ and stencil in some cases and also to accidentally
> > allow different miplevels to overlap. Sadly, piglit test coverage
> > didn't quite catch this until I started making changes to the resolve
> > code that caused additional HiZ resolves in certain tests.
> >
> > This commit switches Sandy Bridge HiZ and stencil over to a new scheme
> > that lays out the non-zero miplevels horizontally below LOD0. This way
> > they can all have the same qpitch without interfering with each other.
> > Technically, the miplevels still overlap, but things are spaced out
> > enough that each page is only in the "written area" of one LOD.
> Hopefully,
> > this will get rid of at least some of the random SNB hangs.
> >
> > Cc: "17.0 17.1" <mesa-stable at lists.freedesktop.org>
> > Cc: Topi Pohjolainen <topi.pohjolainen at intel.com>
> > Cc: Nanley Chery <nanley.g.chery at intel.com>
> > Cc: Jordan Justen <jordan.l.justen at intel.com>
> > Cc: Kenneth Graunke <kenneth at whitecape.org>
>
> There is a few nits, but looks good. Thanks a lot for figuring out how it
> really works:
>
> Reviewed-by: Topi Pohjolainen <topi.pohjolainen at intel.com>
>
> >
> > ---
> > The series I sent out on Friday suffered from a GPU hang or two on Sandy
> > Bridge. It turns out that those hangs were caused by the hardware HiZ
> > resolving part of my batch buffer due to this under-allocation.
> >
> > Topi, I'm sorry but this will likely make hash of your earlier patch
> > series. Sadly, I don't think there's really anything else we can do. :-(
> > Also, given how tricky this is to get right, I concede that we may want
> to
> > add an ISL_DIM_LAYOUT_GEN6_HIZ_STENCIL layout to ISL. We could still
> do it
> > with the "array of surfaces" approach but I think these sorts of
> > calculations are best done in with the other surface calculation code. I
> > can help draft it if you'd like.
> >
> > src/mesa/drivers/dri/i965/brw_blorp.c | 11 ++-
> > src/mesa/drivers/dri/i965/brw_tex_layout.c | 100
> ++++++++++++++++++++++----
> > src/mesa/drivers/dri/i965/gen6_depth_state.c | 4 +-
> > src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 11 +--
> > src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 37 +++++++++-
> > 5 files changed, 134 insertions(+), 29 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c
> b/src/mesa/drivers/dri/i965/brw_blorp.c
> > index 6e860f0..cb9933b 100644
> > --- a/src/mesa/drivers/dri/i965/brw_blorp.c
> > +++ b/src/mesa/drivers/dri/i965/brw_blorp.c
> > @@ -123,7 +123,7 @@ apply_gen6_stencil_hiz_offset(struct isl_surf *surf,
> > uint32_t lod,
> > uint32_t *offset)
> > {
> > - assert(mt->array_layout == ALL_SLICES_AT_EACH_LOD);
> > + assert(mt->array_layout == GEN6_HIZ_STENCIL);
> >
> > if (mt->format == MESA_FORMAT_S_UINT8) {
> > /* Note: we can't compute the stencil offset using
> > @@ -183,12 +183,12 @@ blorp_surf_for_miptree(struct brw_context *brw,
> > };
> >
> > if (brw->gen == 6 && mt->format == MESA_FORMAT_S_UINT8 &&
> > - mt->array_layout == ALL_SLICES_AT_EACH_LOD) {
> > - /* Sandy bridge stencil and HiZ use this ALL_SLICES_AT_EACH_LOD
> hack in
> > + mt->array_layout == GEN6_HIZ_STENCIL) {
> > + /* Sandy bridge stencil and HiZ use this GEN6_HIZ_STENCIL hack in
> > * order to allow for layered rendering. The hack makes each LOD
> of the
> > * stencil or HiZ buffer a single tightly packed array surface at
> some
> > * offset into the surface. Since ISL doesn't know how to deal
> with the
> > - * crazy ALL_SLICES_AT_EACH_LOD layout and since we have to do a
> manual
> > + * crazy GEN6_HIZ_STENCIL layout and since we have to do a manual
> > * offset of it anyway, we might as well do the offset here and
> keep the
> > * hacks inside the i965 driver.
> > *
> > @@ -241,8 +241,7 @@ blorp_surf_for_miptree(struct brw_context *brw,
> >
> > struct intel_mipmap_tree *hiz_mt = mt->hiz_buf->mt;
> > if (hiz_mt) {
> > - assert(brw->gen == 6 &&
> > - hiz_mt->array_layout == ALL_SLICES_AT_EACH_LOD);
> > + assert(brw->gen == 6 && hiz_mt->array_layout ==
> GEN6_HIZ_STENCIL);
> >
> > /* gen6 requires the HiZ buffer to be manually offset to the
> > * right location. We could fixup the surf but it doesn't
> > diff --git a/src/mesa/drivers/dri/i965/brw_tex_layout.c
> b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> > index bfa8afa..30e6233 100644
> > --- a/src/mesa/drivers/dri/i965/brw_tex_layout.c
> > +++ b/src/mesa/drivers/dri/i965/brw_tex_layout.c
> > @@ -216,6 +216,8 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt)
> > mt->total_height = MAX2(mt->total_height, y + img_height);
> >
> > /* Layout_below: step right after second mipmap.
> > + *
> > + * For Sandy Bridge HiZ and stencil, we always step down.
> > */
> > if (level == mt->first_level + 1) {
> > x += ALIGN_NPOT(width, mt->halign) / bw;
> > @@ -231,6 +233,67 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt)
> > }
> > }
> >
> > +static void
> > +brw_miptree_layout_gen6_hiz_stencil(struct intel_mipmap_tree *mt)
> > +{
> > + unsigned x = 0;
> > + unsigned y = 0;
> > + unsigned width = mt->physical_width0;
> > + unsigned height = mt->physical_height0;
> > + /* Number of layers of array texture. */
> > + unsigned depth = mt->physical_depth0;
>
> We could mark height and depth const, only width is variant. I was also
> thinking if we want to comment anything about needing to use also level
> zero
> depth for all subsequent levels in case of 3D textures (instead of
> minifying
> depth).
>
> > + unsigned tile_width, tile_height, bw, bh;
> > +
> > + if (mt->format == MESA_FORMAT_S_UINT8) {
> > + bw = bh = 1;
> > + /* W-tiled */
> > + tile_width = 64;
> > + tile_height = 64;
> > + } else {
> > + assert(_mesa_get_format_base_format(mt->format) ==
> GL_DEPTH_COMPONENT ||
> > + _mesa_get_format_base_format(mt->format) ==
> GL_DEPTH_STENCIL);
> > + /* Each 128-bit HiZ block corresponds to a region of of 8x4 depth
> > + * samples. Each cache line in the Y-Tiled HiZ image contains
> 2x2 HiZ
> > + * blocks. Therefore, each Y-tiled cache line corresponds to an
> 16x8
> > + * region in the depth surface. Since we're representing it as
> > + * RGBA_FLOAT32, the miptree calculations will think that each
> cache
> > + * line is 1x4 pixels. Therefore, we need a scale-down factor of
> 16x2
> > + * and a vertical alignment of 2.
> > + */
> > + mt->cpp = 16;
> > + bw = 16;
> > + bh = 2;
> > + /* Y-tiled */
> > + tile_width = 128 / mt->cpp;
> > + tile_height = 32;
> > + }
> > +
> > + mt->total_width = 0;
> > + mt->total_height = 0;
>
> We could pull height here as it is invariant in the loop:
>
> const unsigned img_height =
> ALIGN(DIV_ROUND_UP(height, bh), mt->valign) * depth;
>
Sure. I considered doing that. In the end I thought it was nice to keep
it next to img_width. The compiler will pull it out of the loop for us
anyway. If you want it moved, I don't mind.
> > +
> > + for (unsigned level = mt->first_level; level <= mt->last_level;
> level++) {
> > + intel_miptree_set_level_info(mt, level, x, y, depth);
> > +
> > + const unsigned img_width = ALIGN(DIV_ROUND_UP(width, bw),
> mt->halign);
> > +
> > + mt->total_width = MAX2(mt->total_width, x + img_width);
> > + mt->total_height = MAX2(mt->total_height, y + img_height);
> > +
> > + if (level == mt->first_level) {
> > + y += ALIGN(img_height, tile_height);
> > + } else {
> > + x += ALIGN(img_width, tile_width);
> > + }
> > +
> > + /* We only minify the width. We want qpitch to match for all
> miplevels
> > + * because the hardware doesn't know we aren't on LOD0.
> > + */
> > + width = minify(width, 1);
> > + }
> > +}
> > +
> > unsigned
> > brw_miptree_get_horizontal_slice_pitch(const struct brw_context *brw,
> > const struct intel_mipmap_tree
> *mt,
> > @@ -249,6 +312,8 @@ brw_miptree_get_vertical_slice_pitch(const struct
> brw_context *brw,
> > const struct intel_mipmap_tree *mt,
> > unsigned level)
> > {
> > + assert(mt->array_layout != GEN6_HIZ_STENCIL || brw->gen == 6);
> > +
> > if (brw->gen >= 9) {
> > /* ALL_SLICES_AT_EACH_LOD isn't supported on Gen8+ but this code
> will
> > * effectively end up with a packed qpitch anyway whenever
> > @@ -281,6 +346,15 @@ brw_miptree_get_vertical_slice_pitch(const struct
> brw_context *brw,
> > mt->array_layout == ALL_SLICES_AT_EACH_LOD) {
> > return ALIGN_NPOT(minify(mt->physical_height0, level),
> mt->valign);
> >
> > + } else if (mt->array_layout == GEN6_HIZ_STENCIL) {
> > + /* For HiZ and stencil on Sandy Bridge, we don't minify the
> height. */
> > + if (mt->format == MESA_FORMAT_S_UINT8) {
> > + return ALIGN(mt->physical_height0, mt->valign);
> > + } else {
> > + /* HiZ has a vertical scale factor of 2. */
> > + return ALIGN(DIV_ROUND_UP(mt->physical_height0, 2),
> mt->halign);
> > + }
> > +
> > } else {
> > const unsigned h0 = ALIGN_NPOT(mt->physical_height0, mt->valign);
> > const unsigned h1 = ALIGN_NPOT(minify(mt->physical_height0, 1),
> mt->valign);
> > @@ -333,6 +407,8 @@ brw_miptree_layout_texture_array(struct brw_context
> *brw,
> >
> > if (layout_1d)
> > gen9_miptree_layout_1d(mt);
> > + else if (mt->array_layout == GEN6_HIZ_STENCIL)
> > + brw_miptree_layout_gen6_hiz_stencil(mt);
> > else
> > brw_miptree_layout_2d(mt);
> >
> > @@ -556,6 +632,8 @@ intel_miptree_set_total_width_height(struct
> brw_context *brw,
> > case INTEL_MSAA_LAYOUT_IMS:
> > if (gen9_use_linear_1d_layout(brw, mt))
> > gen9_miptree_layout_1d(mt);
> > + else if (mt->array_layout == GEN6_HIZ_STENCIL)
> > + brw_miptree_layout_gen6_hiz_stencil(mt);
> > else
> > brw_miptree_layout_2d(mt);
> > break;
> > @@ -579,15 +657,9 @@ intel_miptree_set_alignment(struct brw_context
> *brw,
> > * - Ironlake and Sandybridge PRMs: Volume 1, Part 1, Section
> 7.18.3.4
> > * - BSpec (for Ivybridge and slight variations in separate stencil)
> > */
> > - bool gen6_hiz_or_stencil = false;
> >
> > - if (brw->gen == 6 && mt->array_layout == ALL_SLICES_AT_EACH_LOD) {
> > - const GLenum base_format = _mesa_get_format_base_format(
> mt->format);
> > - gen6_hiz_or_stencil = _mesa_is_depth_or_stencil_
> format(base_format);
> > - }
> > -
> > - if (gen6_hiz_or_stencil) {
> > - /* On gen6, we use ALL_SLICES_AT_EACH_LOD for stencil/hiz because
> the
> > + if (mt->array_layout == GEN6_HIZ_STENCIL) {
> > + /* On gen6, we use GEN6_HIZ_STENCIL for stencil/hiz because the
> > * hardware doesn't support multiple mip levels on stencil/hiz.
> > *
> > * PRM Vol 2, Part 1, 7.5.3 Hierarchical Depth Buffer:
> > @@ -600,15 +672,13 @@ intel_miptree_set_alignment(struct brw_context
> *brw,
> > /* Stencil uses W tiling, so we force W tiling alignment for
> the
> > * ALL_SLICES_AT_EACH_LOD miptree layout.
> > */
> > - mt->halign = 64;
> > - mt->valign = 64;
> > + mt->halign = 4;
> > + mt->valign = 2;
> > assert((layout_flags & MIPTREE_LAYOUT_FORCE_HALIGN16) == 0);
> > } else {
> > - /* Depth uses Y tiling, so we force need Y tiling alignment
> for the
> > - * ALL_SLICES_AT_EACH_LOD miptree layout.
> > - */
> > - mt->halign = 128 / mt->cpp;
> > - mt->valign = 32;
> > + /* See intel_hiz_miptree_buf_create() */
>
> There isnt't really anything about halign/valign in
> intel_hiz_miptree_buf_create()? Should we refer to
> brw_miptree_layout_gen6_hiz_stencil() instead?
>
Yes, that is a stale comment. I'll update it.
> > + mt->halign = 1;
> > + mt->valign = 2;
> > }
> > } else if (mt->compressed) {
> > /* The hardware alignment requirements for compressed textures
> > diff --git a/src/mesa/drivers/dri/i965/gen6_depth_state.c
> b/src/mesa/drivers/dri/i965/gen6_depth_state.c
> > index ae4f681..20992d5 100644
> > --- a/src/mesa/drivers/dri/i965/gen6_depth_state.c
> > +++ b/src/mesa/drivers/dri/i965/gen6_depth_state.c
> > @@ -164,7 +164,7 @@ gen6_emit_depth_stencil_hiz(struct brw_context *brw,
> > struct intel_mipmap_tree *hiz_mt = depth_mt->hiz_buf->mt;
> > uint32_t offset = 0;
> >
> > - if (hiz_mt->array_layout == ALL_SLICES_AT_EACH_LOD) {
> > + if (hiz_mt->array_layout == GEN6_HIZ_STENCIL) {
> > offset = intel_miptree_get_aligned_offset(
> > hiz_mt,
> > hiz_mt->level[lod].level_x,
> > @@ -190,7 +190,7 @@ gen6_emit_depth_stencil_hiz(struct brw_context *brw,
> > if (separate_stencil) {
> > uint32_t offset = 0;
> >
> > - if (stencil_mt->array_layout == ALL_SLICES_AT_EACH_LOD) {
> > + if (stencil_mt->array_layout == GEN6_HIZ_STENCIL) {
> > assert(stencil_mt->format == MESA_FORMAT_S_UINT8);
> >
> > /* Note: we can't compute the stencil offset using
> > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > index e334951..9f9e68a 100644
> > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> > @@ -452,7 +452,7 @@ intel_miptree_create_layout(struct brw_context *brw,
> > intel_miptree_wants_hiz_buffer(brw, mt)))) {
> > uint32_t stencil_flags = MIPTREE_LAYOUT_ACCELERATED_UPLOAD;
> > if (brw->gen == 6) {
> > - stencil_flags |= MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD |
> > + stencil_flags |= MIPTREE_LAYOUT_GEN6_HIZ_STENCIL |
> > MIPTREE_LAYOUT_TILING_ANY;
> > }
> >
> > @@ -485,8 +485,8 @@ intel_miptree_create_layout(struct brw_context *brw,
> > }
> > }
> >
> > - if (layout_flags & MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD)
> > - mt->array_layout = ALL_SLICES_AT_EACH_LOD;
> > + if (layout_flags & MIPTREE_LAYOUT_GEN6_HIZ_STENCIL)
> > + mt->array_layout = GEN6_HIZ_STENCIL;
> >
> > /*
> > * Obey HALIGN_16 constraints for Gen8 and Gen9 buffers which are
> > @@ -1830,7 +1830,7 @@ intel_hiz_miptree_buf_create(struct brw_context
> *brw,
> > uint32_t layout_flags = MIPTREE_LAYOUT_ACCELERATED_UPLOAD;
> >
> > if (brw->gen == 6)
> > - layout_flags |= MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD;
> > + layout_flags |= MIPTREE_LAYOUT_GEN6_HIZ_STENCIL;
> >
> > if (!buf)
> > return NULL;
> > @@ -2770,7 +2770,7 @@ intel_update_r8stencil(struct brw_context *brw,
> > const uint32_t r8stencil_flags =
> > MIPTREE_LAYOUT_ACCELERATED_UPLOAD | MIPTREE_LAYOUT_TILING_Y |
> > MIPTREE_LAYOUT_DISABLE_AUX;
> > - assert(brw->gen > 6); /* Handle MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD
> */
> > + assert(brw->gen > 6); /* Handle MIPTREE_LAYOUT_GEN6_HIZ_STENCIL
> */
> > mt->r8stencil_mt = intel_miptree_create(brw,
> > src->target,
> > MESA_FORMAT_R_UINT8,
> > @@ -3670,6 +3670,7 @@ intel_miptree_get_isl_surf(struct brw_context
> *brw,
> > surf->array_pitch_span = ISL_ARRAY_PITCH_SPAN_FULL;
> > break;
> > case ALL_SLICES_AT_EACH_LOD:
> > + case GEN6_HIZ_STENCIL:
> > surf->array_pitch_span = ISL_ARRAY_PITCH_SPAN_COMPACT;
> > break;
> > default:
> > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> > index aa52f48..e2ec26f 100644
> > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
> > @@ -250,6 +250,41 @@ enum miptree_array_layout {
> > * +---+
> > */
> > ALL_SLICES_AT_EACH_LOD,
> > +
> > + /* On Sandy Bridge, HiZ and stencil buffers work the same as on Ivy
> Bridge
> > + * except that they don't technically support mipmapping. That does
> not,
> > + * however, stop us from doing it. As far as Sandy Bridge hardware
> is
> > + * concerned, HiZ and stencil always operates on a single miplevel 2D
> > + * (possibly array) image. The dimensions of that image are NOT
> minified.
> > + *
> > + * In order to implement HiZ and stencil on Sandy Bridge, we create
> one
> > + * full-sized 2D (possibly array) image for every LOD with every
> image
> > + * aligned to a page boundary. In order to save memory, we pretend
> that
> > + * the width of each miplevel is minified and we place LOD1 and
> above below
> > + * LOD0 but horizontally adjacent to each other. When considered as
> > + * full-sized images, LOD1 and above technically overlap. However,
> since
> > + * we only write to part of that image, the hardware will never
> notice the
> > + * overlap.
> > + *
> > + * This layout looks something like this:
> > + *
> > + * +---------+
> > + * | |
> > + * | |
> > + * +---------+
> > + * | |
> > + * | |
> > + * +---------+
> > + *
> > + * +----+ +-+ .
> > + * | | +-+
> > + * +----+
> > + *
> > + * +----+ +-+ .
> > + * | | +-+
> > + * +----+
>
> If we want to reflect all levels having the same heigth, maybe:
>
What matters isn't that they all have the same height but that they all
have the same qpitch. I don't know if there was a better way to denote
that.
> +---------+
> | |
> | |
> +---------+
> | |
> | |
> +---------+
>
> +----+ +-+ .
> | | | |
> | | | |
> +----+ +-+
>
> +----+ +-+ .
> | | | |
> | | | |
> +----+ +-+
>
> > + */
> > + GEN6_HIZ_STENCIL,
> > };
> >
> > enum intel_aux_disable {
> > @@ -637,7 +672,7 @@ intel_miptree_alloc_non_msrt_mcs(struct brw_context
> *brw,
> >
> > enum {
> > MIPTREE_LAYOUT_ACCELERATED_UPLOAD = 1 << 0,
> > - MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD = 1 << 1,
> > + MIPTREE_LAYOUT_GEN6_HIZ_STENCIL = 1 << 1,
> > MIPTREE_LAYOUT_FOR_BO = 1 << 2,
> > MIPTREE_LAYOUT_DISABLE_AUX = 1 << 3,
> > MIPTREE_LAYOUT_FORCE_HALIGN16 = 1 << 4,
> > --
> > 2.5.0.400.gff86faf
> >
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170530/f4c4182e/attachment-0001.html>
More information about the mesa-dev
mailing list