<div dir="ltr">+chad<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, May 29, 2017 at 12:09 PM, Jason Ekstrand <span dir="ltr"><<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Sandy Bridge does not technically support mipmapped depth/stencil. In<br>
order to work around this, we allocate what are effectively completely<br>
separate images for each miplevel, ensure that they are page-aligned,<br>
and manually offset to them. Prior to layered rendering, this was a<br>
simple matter of setting a large enough halign/valign.<br>
<br>
With the advent of layered rendering, however, things got more<br>
complicated: it was no longer as simple as just handing a surface<br>
off to the hardware. Any miplevel of a normally mipmapped surface can<br>
be considered as just an array surface given the right qpitch. However,<br>
the hardware gives us no capability to specify qpitch so this won't<br>
work. Instead, the chosen solution was to use a new "all slices at each<br>
LOD" layout which laid things out as a mipmap of arrays rather than an<br>
array of mipmaps. This way you can easily offset to any of the<br>
miplevels and each is a valid array.<br>
<br>
Unfortunately, the "all slices at each LOD" concept missed one<br>
fundamental thing about SNB HiZ and stencil hardware: it doesn't just<br>
act as if you're always working with a non-mipmapped surface, it acts<br>
as if you're always working with a non-mipmapped surface of the same<br>
size as LOD0. In other words, even though it may only write the<br>
upper-left corner of each array slice, the qpitch for the array is for a<br>
surface the size of LOD0 of the depth surface. This mistake causes us<br>
to under-allocate HiZ and stencil in some cases and also to accidentally<br>
allow different miplevels to overlap. Sadly, piglit test coverage<br>
didn't quite catch this until I started making changes to the resolve<br>
code that caused additional HiZ resolves in certain tests.<br>
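<br>
As a rough illustration of the under-allocation (a simplified sketch, not<br>
the driver code: the helper names are hypothetical, and tiling and the HiZ<br>
scale factors are ignored), compare the rows the old layout reserved for<br>
one LOD with the rows the hardware can reach when it applies a<br>
LOD0-sized qpitch:<br>
<br>
```c
/* Hypothetical helpers for illustration only; not i965 functions. */
static unsigned minify_size(unsigned size, unsigned level)
{
    unsigned m = size >> level;
    return m ? m : 1;
}

static unsigned align_up(unsigned value, unsigned alignment)
{
    return (value + alignment - 1) / alignment * alignment;
}

/* Rows the old "all slices at each LOD" layout reserved for one LOD:
 * every slice is assumed to be only as tall as the minified level. */
static unsigned rows_reserved_old(unsigned height0, unsigned level,
                                  unsigned layers, unsigned valign)
{
    return align_up(minify_size(height0, level), valign) * layers;
}

/* Rows the hardware can reach: slices are spaced by a qpitch derived
 * from LOD0, and the last slice still covers the minified height. */
static unsigned rows_touched_hw(unsigned height0, unsigned level,
                                unsigned layers, unsigned valign)
{
    unsigned qpitch0 = align_up(height0, valign);
    return (layers - 1) * qpitch0 +
           align_up(minify_size(height0, level), valign);
}
```
<br>
For a 256-row, 4-layer surface at LOD2 with a vertical alignment of 64,<br>
the old layout reserves 256 rows while the last slice ends at row 832.<br>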
<br>
This commit switches Sandy Bridge HiZ and stencil over to a new scheme<br>
that lays out the non-zero miplevels horizontally below LOD0. This way<br>
they can all have the same qpitch without interfering with each other.<br>
Technically, the miplevels still overlap, but things are spaced out<br>
enough that each page is only in the "written area" of one LOD. Hopefully,<br>
this will get rid of at least some of the random SNB hangs.<br>
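<br>
A minimal sketch of the placement rule for the stencil case, assuming<br>
64x64 W-tiles and the 4x2 alignment used in this patch. The names<br>
layout_levels and struct level_pos are illustrative, not driver API, and<br>
the HiZ scale factors are left out:<br>
<br>
```c
/* Illustrative types and helpers; not part of the i965 driver. */
struct level_pos {
    unsigned x;
    unsigned y;
};

static unsigned align_up(unsigned value, unsigned alignment)
{
    return (value + alignment - 1) / alignment * alignment;
}

/* Place LOD0 at the origin, LOD1 directly below it, and every later
 * level to the right of the previous one. Only the width is minified,
 * so every level keeps the same slice pitch as LOD0. */
static void layout_levels(unsigned width0, unsigned height0,
                          unsigned layers, unsigned num_levels,
                          struct level_pos *out)
{
    const unsigned tile_w = 64, tile_h = 64; /* assumed W-tile size */
    const unsigned halign = 4, valign = 2;   /* assumed alignment */
    unsigned x = 0, y = 0, width = width0;

    for (unsigned level = 0; level < num_levels; level++) {
        out[level].x = x;
        out[level].y = y;

        unsigned img_w = align_up(width, halign);
        unsigned img_h = align_up(height0, valign) * layers;

        if (level == 0)
            y += align_up(img_h, tile_h); /* step down once, past LOD0 */
        else
            x += align_up(img_w, tile_w); /* then only step right */

        width = width > 1 ? width >> 1 : 1;
    }
}
```
<br>
With a 256x100, 2-layer surface and four levels this places LOD1 at<br>
(0, 256), LOD2 at (128, 256) and LOD3 at (192, 256).<br>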
<br>
Cc: "17.0 17.1" <<a href="mailto:mesa-stable@lists.freedesktop.org">mesa-stable@lists.<wbr>freedesktop.org</a>><br>
Cc: Topi Pohjolainen <<a href="mailto:topi.pohjolainen@intel.com">topi.pohjolainen@intel.com</a>><br>
Cc: Nanley Chery <<a href="mailto:nanley.g.chery@intel.com">nanley.g.chery@intel.com</a>><br>
Cc: Jordan Justen <<a href="mailto:jordan.l.justen@intel.com">jordan.l.justen@intel.com</a>><br>
Cc: Kenneth Graunke <<a href="mailto:kenneth@whitecape.org">kenneth@whitecape.org</a>><br>
<br>
---<br>
The series I sent out on Friday suffered from a GPU hang or two on Sandy<br>
Bridge. It turns out that those hangs were caused by the hardware HiZ<br>
resolving part of my batch buffer due to this under-allocation.<br>
<br>
Topi, I'm sorry but this will likely make hash of your earlier patch<br>
series. Sadly, I don't think there's really anything else we can do. :-(<br>
Also, given how tricky this is to get right, I concede that we may want to<br>
add an ISL_DIM_LAYOUT_GEN6_HIZ_<wbr>STENCIL layout to ISL. We could still do it<br>
with the "array of surfaces" approach but I think these sorts of<br>
calculations are best done with the other surface calculation code. I<br>
can help draft it if you'd like.<br>
<br>
src/mesa/drivers/dri/i965/brw_<wbr>blorp.c | 11 ++-<br>
src/mesa/drivers/dri/i965/brw_<wbr>tex_layout.c | 100 ++++++++++++++++++++++----<br>
src/mesa/drivers/dri/i965/<wbr>gen6_depth_state.c | 4 +-<br>
src/mesa/drivers/dri/i965/<wbr>intel_mipmap_tree.c | 11 +--<br>
src/mesa/drivers/dri/i965/<wbr>intel_mipmap_tree.h | 37 +++++++++-<br>
5 files changed, 134 insertions(+), 29 deletions(-)<br>
<br>
diff --git a/src/mesa/drivers/dri/i965/<wbr>brw_blorp.c b/src/mesa/drivers/dri/i965/<wbr>brw_blorp.c<br>
index 6e860f0..cb9933b 100644<br>
--- a/src/mesa/drivers/dri/i965/<wbr>brw_blorp.c<br>
+++ b/src/mesa/drivers/dri/i965/<wbr>brw_blorp.c<br>
@@ -123,7 +123,7 @@ apply_gen6_stencil_hiz_offset(<wbr>struct isl_surf *surf,<br>
uint32_t lod,<br>
uint32_t *offset)<br>
{<br>
- assert(mt->array_layout == ALL_SLICES_AT_EACH_LOD);<br>
+ assert(mt->array_layout == GEN6_HIZ_STENCIL);<br>
<br>
if (mt->format == MESA_FORMAT_S_UINT8) {<br>
/* Note: we can't compute the stencil offset using<br>
@@ -183,12 +183,12 @@ blorp_surf_for_miptree(struct brw_context *brw,<br>
};<br>
<br>
if (brw->gen == 6 && mt->format == MESA_FORMAT_S_UINT8 &&<br>
- mt->array_layout == ALL_SLICES_AT_EACH_LOD) {<br>
- /* Sandy bridge stencil and HiZ use this ALL_SLICES_AT_EACH_LOD hack in<br>
+ mt->array_layout == GEN6_HIZ_STENCIL) {<br>
+ /* Sandy bridge stencil and HiZ use this GEN6_HIZ_STENCIL hack in<br>
* order to allow for layered rendering. The hack makes each LOD of the<br>
* stencil or HiZ buffer a single tightly packed array surface at some<br>
* offset into the surface. Since ISL doesn't know how to deal with the<br>
- * crazy ALL_SLICES_AT_EACH_LOD layout and since we have to do a manual<br>
+ * crazy GEN6_HIZ_STENCIL layout and since we have to do a manual<br>
* offset of it anyway, we might as well do the offset here and keep the<br>
* hacks inside the i965 driver.<br>
*<br>
@@ -241,8 +241,7 @@ blorp_surf_for_miptree(struct brw_context *brw,<br>
<br>
struct intel_mipmap_tree *hiz_mt = mt->hiz_buf->mt;<br>
if (hiz_mt) {<br>
- assert(brw->gen == 6 &&<br>
- hiz_mt->array_layout == ALL_SLICES_AT_EACH_LOD);<br>
+ assert(brw->gen == 6 && hiz_mt->array_layout == GEN6_HIZ_STENCIL);<br>
<br>
/* gen6 requires the HiZ buffer to be manually offset to the<br>
* right location. We could fixup the surf but it doesn't<br>
diff --git a/src/mesa/drivers/dri/i965/<wbr>brw_tex_layout.c b/src/mesa/drivers/dri/i965/<wbr>brw_tex_layout.c<br>
index bfa8afa..30e6233 100644<br>
--- a/src/mesa/drivers/dri/i965/<wbr>brw_tex_layout.c<br>
+++ b/src/mesa/drivers/dri/i965/<wbr>brw_tex_layout.c<br>
@@ -216,6 +216,8 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt)<br>
mt->total_height = MAX2(mt->total_height, y + img_height);<br>
<br>
/* Layout_below: step right after second mipmap.<br>
+ *<br>
+ * For Sandy Bridge HiZ and stencil, we always step down.<br>
*/<br>
if (level == mt->first_level + 1) {<br>
x += ALIGN_NPOT(width, mt->halign) / bw;<br>
@@ -231,6 +233,67 @@ brw_miptree_layout_2d(struct intel_mipmap_tree *mt)<br>
}<br>
}<br>
<br>
+static void<br>
+brw_miptree_layout_gen6_hiz_<wbr>stencil(struct intel_mipmap_tree *mt)<br>
+{<br>
+ unsigned x = 0;<br>
+ unsigned y = 0;<br>
+ unsigned width = mt->physical_width0;<br>
+ unsigned height = mt->physical_height0;<br>
+ /* Number of layers of array texture. */<br>
+ unsigned depth = mt->physical_depth0;<br>
+ unsigned tile_width, tile_height, bw, bh;<br>
+<br>
+ if (mt->format == MESA_FORMAT_S_UINT8) {<br>
+ bw = bh = 1;<br>
+ /* W-tiled */<br>
+ tile_width = 64;<br>
+ tile_height = 64;<br>
+ } else {<br>
+ assert(_mesa_get_format_base_<wbr>format(mt->format) == GL_DEPTH_COMPONENT ||<br>
+ _mesa_get_format_base_format(<wbr>mt->format) == GL_DEPTH_STENCIL);<br>
+ /* Each 128-bit HiZ block corresponds to a region of 8x4 depth<br>
+ * samples. Each cache line in the Y-Tiled HiZ image contains 2x2 HiZ<br>
+ * blocks. Therefore, each Y-tiled cache line corresponds to a 16x8<br>
+ * region in the depth surface. Since we're representing it as<br>
+ * RGBA_FLOAT32, the miptree calculations will think that each cache<br>
+ * line is 1x4 pixels. Therefore, we need a scale-down factor of 16x2<br>
+ * and a vertical alignment of 2.<br>
+ */<br>
+ mt->cpp = 16;<br>
+ bw = 16;<br>
+ bh = 2;<br>
+ /* Y-tiled */<br>
+ tile_width = 128 / mt->cpp;<br>
+ tile_height = 32;<br>
+ }<br>
+<br>
+ mt->total_width = 0;<br>
+ mt->total_height = 0;<br>
+<br>
+ for (unsigned level = mt->first_level; level <= mt->last_level; level++) {<br>
+ intel_miptree_set_level_info(<wbr>mt, level, x, y, depth);<br>
+<br>
+ const unsigned img_width = ALIGN(DIV_ROUND_UP(width, bw), mt->halign);<br>
+ const unsigned img_height =<br>
+ ALIGN(DIV_ROUND_UP(height, bh), mt->valign) * depth;<br>
+<br>
+ mt->total_width = MAX2(mt->total_width, x + img_width);<br>
+ mt->total_height = MAX2(mt->total_height, y + img_height);<br>
+<br>
+ if (level == mt->first_level) {<br>
+ y += ALIGN(img_height, tile_height);<br>
+ } else {<br>
+ x += ALIGN(img_width, tile_width);<br>
+ }<br>
+<br>
+ /* We only minify the width. We want qpitch to match for all miplevels<br>
+ * because the hardware doesn't know we aren't on LOD0.<br>
+ */<br>
+ width = minify(width, 1);<br>
+ }<br>
+}<br>
+<br>
unsigned<br>
brw_miptree_get_horizontal_<wbr>slice_pitch(const struct brw_context *brw,<br>
const struct intel_mipmap_tree *mt,<br>
@@ -249,6 +312,8 @@ brw_miptree_get_vertical_<wbr>slice_pitch(const struct brw_context *brw,<br>
const struct intel_mipmap_tree *mt,<br>
unsigned level)<br>
{<br>
+ assert(mt->array_layout != GEN6_HIZ_STENCIL || brw->gen == 6);<br>
+<br>
if (brw->gen >= 9) {<br>
/* ALL_SLICES_AT_EACH_LOD isn't supported on Gen8+ but this code will<br>
* effectively end up with a packed qpitch anyway whenever<br>
@@ -281,6 +346,15 @@ brw_miptree_get_vertical_<wbr>slice_pitch(const struct brw_context *brw,<br>
mt->array_layout == ALL_SLICES_AT_EACH_LOD) {<br>
return ALIGN_NPOT(minify(mt-><wbr>physical_height0, level), mt->valign);<br>
<br>
+ } else if (mt->array_layout == GEN6_HIZ_STENCIL) {<br>
+ /* For HiZ and stencil on Sandy Bridge, we don't minify the height. */<br>
+ if (mt->format == MESA_FORMAT_S_UINT8) {<br>
+ return ALIGN(mt->physical_height0, mt->valign);<br>
+ } else {<br>
+ /* HiZ has a vertical scale factor of 2. */<br>
+ return ALIGN(DIV_ROUND_UP(mt-><wbr>physical_height0, 2), mt->valign);<br>
+ }<br>
+<br>
} else {<br>
const unsigned h0 = ALIGN_NPOT(mt->physical_<wbr>height0, mt->valign);<br>
const unsigned h1 = ALIGN_NPOT(minify(mt-><wbr>physical_height0, 1), mt->valign);<br>
@@ -333,6 +407,8 @@ brw_miptree_layout_texture_<wbr>array(struct brw_context *brw,<br>
<br>
if (layout_1d)<br>
gen9_miptree_layout_1d(mt);<br>
+ else if (mt->array_layout == GEN6_HIZ_STENCIL)<br>
+ brw_miptree_layout_gen6_hiz_<wbr>stencil(mt);<br>
else<br>
brw_miptree_layout_2d(mt);<br>
<br>
@@ -556,6 +632,8 @@ intel_miptree_set_total_width_<wbr>height(struct brw_context *brw,<br>
case INTEL_MSAA_LAYOUT_IMS:<br>
if (gen9_use_linear_1d_layout(<wbr>brw, mt))<br>
gen9_miptree_layout_1d(mt);<br>
+ else if (mt->array_layout == GEN6_HIZ_STENCIL)<br>
+ brw_miptree_layout_gen6_hiz_<wbr>stencil(mt);<br>
else<br>
brw_miptree_layout_2d(mt);<br>
break;<br>
@@ -579,15 +657,9 @@ intel_miptree_set_alignment(<wbr>struct brw_context *brw,<br>
* - Ironlake and Sandybridge PRMs: Volume 1, Part 1, Section 7.18.3.4<br>
* - BSpec (for Ivybridge and slight variations in separate stencil)<br>
*/<br>
- bool gen6_hiz_or_stencil = false;<br>
<br>
- if (brw->gen == 6 && mt->array_layout == ALL_SLICES_AT_EACH_LOD) {<br>
- const GLenum base_format = _mesa_get_format_base_format(<wbr>mt->format);<br>
- gen6_hiz_or_stencil = _mesa_is_depth_or_stencil_<wbr>format(base_format);<br>
- }<br>
-<br>
- if (gen6_hiz_or_stencil) {<br>
- /* On gen6, we use ALL_SLICES_AT_EACH_LOD for stencil/hiz because the<br>
+ if (mt->array_layout == GEN6_HIZ_STENCIL) {<br>
+ /* On gen6, we use GEN6_HIZ_STENCIL for stencil/hiz because the<br>
* hardware doesn't support multiple mip levels on stencil/hiz.<br>
*<br>
* PRM Vol 2, Part 1, 7.5.3 Hierarchical Depth Buffer:<br>
@@ -600,15 +672,13 @@ intel_miptree_set_alignment(<wbr>struct brw_context *brw,<br>
/* Stencil uses W tiling, so we force W tiling alignment for the<br>
* ALL_SLICES_AT_EACH_LOD miptree layout.<br>
*/<br>
- mt->halign = 64;<br>
- mt->valign = 64;<br>
+ mt->halign = 4;<br>
+ mt->valign = 2;<br>
assert((layout_flags & MIPTREE_LAYOUT_FORCE_HALIGN16) == 0);<br>
} else {<br>
- /* Depth uses Y tiling, so we force need Y tiling alignment for the<br>
- * ALL_SLICES_AT_EACH_LOD miptree layout.<br>
- */<br>
- mt->halign = 128 / mt->cpp;<br>
- mt->valign = 32;<br>
+ /* See intel_hiz_miptree_buf_create() */<br>
+ mt->halign = 1;<br>
+ mt->valign = 2;<br>
}<br>
} else if (mt->compressed) {<br>
/* The hardware alignment requirements for compressed textures<br>
diff --git a/src/mesa/drivers/dri/i965/<wbr>gen6_depth_state.c b/src/mesa/drivers/dri/i965/<wbr>gen6_depth_state.c<br>
index ae4f681..20992d5 100644<br>
--- a/src/mesa/drivers/dri/i965/<wbr>gen6_depth_state.c<br>
+++ b/src/mesa/drivers/dri/i965/<wbr>gen6_depth_state.c<br>
@@ -164,7 +164,7 @@ gen6_emit_depth_stencil_hiz(<wbr>struct brw_context *brw,<br>
struct intel_mipmap_tree *hiz_mt = depth_mt->hiz_buf->mt;<br>
uint32_t offset = 0;<br>
<br>
- if (hiz_mt->array_layout == ALL_SLICES_AT_EACH_LOD) {<br>
+ if (hiz_mt->array_layout == GEN6_HIZ_STENCIL) {<br>
offset = intel_miptree_get_aligned_<wbr>offset(<br>
hiz_mt,<br>
hiz_mt->level[lod].level_x,<br>
@@ -190,7 +190,7 @@ gen6_emit_depth_stencil_hiz(<wbr>struct brw_context *brw,<br>
if (separate_stencil) {<br>
uint32_t offset = 0;<br>
<br>
- if (stencil_mt->array_layout == ALL_SLICES_AT_EACH_LOD) {<br>
+ if (stencil_mt->array_layout == GEN6_HIZ_STENCIL) {<br>
assert(stencil_mt->format == MESA_FORMAT_S_UINT8);<br>
<br>
/* Note: we can't compute the stencil offset using<br>
diff --git a/src/mesa/drivers/dri/i965/<wbr>intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/<wbr>intel_mipmap_tree.c<br>
index e334951..9f9e68a 100644<br>
--- a/src/mesa/drivers/dri/i965/<wbr>intel_mipmap_tree.c<br>
+++ b/src/mesa/drivers/dri/i965/<wbr>intel_mipmap_tree.c<br>
@@ -452,7 +452,7 @@ intel_miptree_create_layout(<wbr>struct brw_context *brw,<br>
intel_miptree_wants_hiz_<wbr>buffer(brw, mt)))) {<br>
uint32_t stencil_flags = MIPTREE_LAYOUT_ACCELERATED_<wbr>UPLOAD;<br>
if (brw->gen == 6) {<br>
- stencil_flags |= MIPTREE_LAYOUT_FORCE_ALL_<wbr>SLICE_AT_LOD |<br>
+ stencil_flags |= MIPTREE_LAYOUT_GEN6_HIZ_<wbr>STENCIL |<br>
MIPTREE_LAYOUT_TILING_ANY;<br>
}<br>
<br>
@@ -485,8 +485,8 @@ intel_miptree_create_layout(<wbr>struct brw_context *brw,<br>
}<br>
}<br>
<br>
- if (layout_flags & MIPTREE_LAYOUT_FORCE_ALL_<wbr>SLICE_AT_LOD)<br>
- mt->array_layout = ALL_SLICES_AT_EACH_LOD;<br>
+ if (layout_flags & MIPTREE_LAYOUT_GEN6_HIZ_<wbr>STENCIL)<br>
+ mt->array_layout = GEN6_HIZ_STENCIL;<br>
<br>
/*<br>
* Obey HALIGN_16 constraints for Gen8 and Gen9 buffers which are<br>
@@ -1830,7 +1830,7 @@ intel_hiz_miptree_buf_create(<wbr>struct brw_context *brw,<br>
uint32_t layout_flags = MIPTREE_LAYOUT_ACCELERATED_<wbr>UPLOAD;<br>
<br>
if (brw->gen == 6)<br>
- layout_flags |= MIPTREE_LAYOUT_FORCE_ALL_<wbr>SLICE_AT_LOD;<br>
+ layout_flags |= MIPTREE_LAYOUT_GEN6_HIZ_<wbr>STENCIL;<br>
<br>
if (!buf)<br>
return NULL;<br>
@@ -2770,7 +2770,7 @@ intel_update_r8stencil(struct brw_context *brw,<br>
const uint32_t r8stencil_flags =<br>
MIPTREE_LAYOUT_ACCELERATED_<wbr>UPLOAD | MIPTREE_LAYOUT_TILING_Y |<br>
MIPTREE_LAYOUT_DISABLE_AUX;<br>
- assert(brw->gen > 6); /* Handle MIPTREE_LAYOUT_FORCE_ALL_<wbr>SLICE_AT_LOD */<br>
+ assert(brw->gen > 6); /* Handle MIPTREE_LAYOUT_GEN6_HIZ_<wbr>STENCIL */<br>
mt->r8stencil_mt = intel_miptree_create(brw,<br>
src->target,<br>
MESA_FORMAT_R_UINT8,<br>
@@ -3670,6 +3670,7 @@ intel_miptree_get_isl_surf(<wbr>struct brw_context *brw,<br>
surf->array_pitch_span = ISL_ARRAY_PITCH_SPAN_FULL;<br>
break;<br>
case ALL_SLICES_AT_EACH_LOD:<br>
+ case GEN6_HIZ_STENCIL:<br>
surf->array_pitch_span = ISL_ARRAY_PITCH_SPAN_COMPACT;<br>
break;<br>
default:<br>
diff --git a/src/mesa/drivers/dri/i965/<wbr>intel_mipmap_tree.h b/src/mesa/drivers/dri/i965/<wbr>intel_mipmap_tree.h<br>
index aa52f48..e2ec26f 100644<br>
--- a/src/mesa/drivers/dri/i965/<wbr>intel_mipmap_tree.h<br>
+++ b/src/mesa/drivers/dri/i965/<wbr>intel_mipmap_tree.h<br>
@@ -250,6 +250,41 @@ enum miptree_array_layout {<br>
* +---+<br>
*/<br>
ALL_SLICES_AT_EACH_LOD,<br>
+<br>
+ /* On Sandy Bridge, HiZ and stencil buffers work the same as on Ivy Bridge<br>
+ * except that they don't technically support mipmapping. That does not,<br>
+ * however, stop us from doing it. As far as Sandy Bridge hardware is<br>
+ * concerned, HiZ and stencil always operate on a single miplevel 2D<br>
+ * (possibly array) image. The dimensions of that image are NOT minified.<br>
+ *<br>
+ * In order to implement HiZ and stencil on Sandy Bridge, we create one<br>
+ * full-sized 2D (possibly array) image for every LOD with every image<br>
+ * aligned to a page boundary. In order to save memory, we pretend that<br>
+ * the width of each miplevel is minified and we place LOD1 and above below<br>
+ * LOD0 but horizontally adjacent to each other. When considered as<br>
+ * full-sized images, LOD1 and above technically overlap. However, since<br>
+ * we only write to part of that image, the hardware will never notice the<br>
+ * overlap.<br>
+ *<br>
+ * This layout looks something like this:<br>
+ *<br>
+ * +---------+<br>
+ * | |<br>
+ * | |<br>
+ * +---------+<br>
+ * | |<br>
+ * | |<br>
+ * +---------+<br>
+ *<br>
+ * +----+ +-+ .<br>
+ * | | +-+<br>
+ * +----+<br>
+ *<br>
+ * +----+ +-+ .<br>
+ * | | +-+<br>
+ * +----+<br>
+ */<br>
+ GEN6_HIZ_STENCIL,<br>
};<br>
<br>
enum intel_aux_disable {<br>
@@ -637,7 +672,7 @@ intel_miptree_alloc_non_msrt_<wbr>mcs(struct brw_context *brw,<br>
<br>
enum {<br>
MIPTREE_LAYOUT_ACCELERATED_<wbr>UPLOAD = 1 << 0,<br>
- MIPTREE_LAYOUT_FORCE_ALL_<wbr>SLICE_AT_LOD = 1 << 1,<br>
+ MIPTREE_LAYOUT_GEN6_HIZ_<wbr>STENCIL = 1 << 1,<br>
MIPTREE_LAYOUT_FOR_BO = 1 << 2,<br>
MIPTREE_LAYOUT_DISABLE_AUX = 1 << 3,<br>
MIPTREE_LAYOUT_FORCE_HALIGN16 = 1 << 4,<br>
<span class="HOEnZb"><font color="#888888">--<br>
2.5.0.400.gff86faf<br>
<br>
</font></span></blockquote></div><br></div>