<div dir="ltr"><div><div>Ok, given what you said below, I think everything is sane.<br><br></div>Reviewed-by: Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>><br><br></div>Note that I didn't actually verify that all of the code you deleted is going to work once we switch over to NIR handling it with varyings.<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Jan 4, 2017 at 2:12 PM, Kenneth Graunke <span dir="ltr"><<a href="mailto:kenneth@whitecape.org" target="_blank">kenneth@whitecape.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On Wednesday, January 4, 2017 1:35:55 PM PST Jason Ekstrand wrote:<br>
> On Wed, Jan 4, 2017 at 3:07 AM, Kenneth Graunke <<a href="mailto:kenneth@whitecape.org">kenneth@whitecape.org</a>><br>
> wrote:<br>
><br>
> > Treating everything as scalar arrays allows us to drop a bunch of<br>
> > special case input/output munging all throughout the backend.<br>
> > Instead, we just need to remap the TessLevel components to the<br>
> > appropriate patch URB header locations in remap_patch_urb_offsets().<br>
> ><br>
> > We also switch to treating the TES input versions of these as ordinary<br>
> > shader inputs rather than system values, as remap_patch_urb_offsets()<br>
> > just makes everything work out without special handling.<br>
> ><br>
> > This regresses one Piglit test:<br>
> > arb_tessellation_shader-large-<wbr>uniforms/GL_TESS_CONTROL_<br>
> > SHADER-array-at-limit<br>
> ><br>
> > The compiler starts promoting the constant arrays assigned to gl_TessLevel*<br>
> > to uniform arrays. Since the shader also has a uniform array that uses<br>
> > the maximum number of uniform components, this puts it over the uniform<br>
> > component limit enforced by the linker. This is arguably a bug in the<br>
> > constant array promotion code (it should avoid pushing us over limits),<br>
> > but is unlikely to penalize any real application.<br>
> ><br>
> > Signed-off-by: Kenneth Graunke <<a href="mailto:kenneth@whitecape.org">kenneth@whitecape.org</a>><br>
> > ---<br>
> > src/mesa/drivers/dri/i965/brw_<wbr>context.c | 2 +-<br>
> > src/mesa/drivers/dri/i965/brw_<wbr>fs_nir.cpp | 164<br>
> > +--------------------<br>
> > src/mesa/drivers/dri/i965/brw_<wbr>nir.c | 74 +++++++++-<br>
> > src/mesa/drivers/dri/i965/brw_<wbr>nir.h | 3 +-<br>
> > .../drivers/dri/i965/brw_nir_<wbr>tcs_workarounds.c | 8 +-<br>
> > src/mesa/drivers/dri/i965/brw_<wbr>shader.cpp | 61 +-------<br>
> > src/mesa/drivers/dri/i965/brw_<wbr>shader.h | 5 -<br>
> > src/mesa/drivers/dri/i965/brw_<wbr>tcs.c | 5 +-<br>
> > src/mesa/drivers/dri/i965/brw_<wbr>tes.c | 17 +--<br>
> > src/mesa/drivers/dri/i965/brw_<wbr>vec4_tcs.cpp | 117 +--------------<br>
> > 10 files changed, 92 insertions(+), 364 deletions(-)<br>
> ><br>
> > diff --git a/src/mesa/drivers/dri/i965/<wbr>brw_context.c<br>
> > b/src/mesa/drivers/dri/i965/<wbr>brw_context.c<br>
> > index 45490a0f5cf..22f872fe782 100644<br>
> > --- a/src/mesa/drivers/dri/i965/<wbr>brw_context.c<br>
> > +++ b/src/mesa/drivers/dri/i965/<wbr>brw_context.c<br>
> > @@ -672,7 +672,7 @@ brw_initialize_context_<wbr>constants(struct brw_context<br>
> > *brw)<br>
> > if (brw->gen >= 5 || brw->is_g4x)<br>
> > ctx->Const.MaxClipPlanes = 8;<br>
> ><br>
> > - ctx->Const.LowerTessLevel = true;<br>
> > + ctx->Const.<wbr>GLSLTessLevelsAsInputs = true;<br>
> > ctx->Const.<wbr>LowerTCSPatchVerticesIn = brw->gen >= 8;<br>
> > ctx->Const.<wbr>LowerTESPatchVerticesIn = true;<br>
> > ctx->Const.<wbr>PrimitiveRestartForPatches = true;<br>
> > diff --git a/src/mesa/drivers/dri/i965/<wbr>brw_fs_nir.cpp<br>
> > b/src/mesa/drivers/dri/i965/<wbr>brw_fs_nir.cpp<br>
> > index 2ed843bd03d..8f745dff440 100644<br>
> > --- a/src/mesa/drivers/dri/i965/<wbr>brw_fs_nir.cpp<br>
> > +++ b/src/mesa/drivers/dri/i965/<wbr>brw_fs_nir.cpp<br>
> > @@ -2520,78 +2520,7 @@ fs_visitor::nir_emit_tcs_<wbr>intrinsic(const<br>
> > fs_builder &bld,<br>
> > bld.MOV(patch_handle,<br>
> > retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_UD));<br>
> ><br>
> > - if (imm_offset == 0) {<br>
> > - /* This is a read of gl_TessLevelInner[], which lives in the<br>
> > - * Patch URB header. The layout depends on the domain.<br>
> > - */<br>
> > - dst.type = BRW_REGISTER_TYPE_F;<br>
> > - switch (tcs_key->tes_primitive_mode) {<br>
> > - case GL_QUADS: {<br>
> > - /* DWords 3-2 (reversed) */<br>
> > - fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F, 4);<br>
> > -<br>
> > - inst = bld.emit(SHADER_OPCODE_URB_<wbr>READ_SIMD8, tmp,<br>
> > patch_handle);<br>
> > - inst->offset = 0;<br>
> > - inst->mlen = 1;<br>
> > - inst->size_written = 4 * REG_SIZE;<br>
> > -<br>
> > - /* dst.xy = tmp.wz */<br>
> > - bld.MOV(dst, offset(tmp, bld, 3));<br>
> > - bld.MOV(offset(dst, bld, 1), offset(tmp, bld, 2));<br>
> > - break;<br>
> > - }<br>
> > - case GL_TRIANGLES:<br>
> > - /* DWord 4; hardcode offset = 1 and size_written =<br>
> > REG_SIZE */<br>
> > - inst = bld.emit(SHADER_OPCODE_URB_<wbr>READ_SIMD8, dst,<br>
> > patch_handle);<br>
> > - inst->offset = 1;<br>
> > - inst->mlen = 1;<br>
> > - inst->size_written = REG_SIZE;<br>
> > - break;<br>
> > - case GL_ISOLINES:<br>
> > - /* All channels are undefined. */<br>
> > - break;<br>
> > - default:<br>
> > - unreachable("Bogus tessellation domain");<br>
> > - }<br>
> > - } else if (imm_offset == 1) {<br>
> > - /* This is a read of gl_TessLevelOuter[], which lives in the<br>
> > - * Patch URB header. The layout depends on the domain.<br>
> > - */<br>
> > - dst.type = BRW_REGISTER_TYPE_F;<br>
> > -<br>
> > - fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F, 4);<br>
> > - inst = bld.emit(SHADER_OPCODE_URB_<wbr>READ_SIMD8, tmp,<br>
> > patch_handle);<br>
> > - inst->offset = 1;<br>
> > - inst->mlen = 1;<br>
> > - inst->size_written = 4 * REG_SIZE;<br>
> > -<br>
> > - /* Reswizzle: WZYX */<br>
> > - fs_reg srcs[4] = {<br>
> > - offset(tmp, bld, 3),<br>
> > - offset(tmp, bld, 2),<br>
> > - offset(tmp, bld, 1),<br>
> > - offset(tmp, bld, 0),<br>
> > - };<br>
> > -<br>
> > - unsigned num_components;<br>
> > - switch (tcs_key->tes_primitive_mode) {<br>
> > - case GL_QUADS:<br>
> > - num_components = 4;<br>
> > - break;<br>
> > - case GL_TRIANGLES:<br>
> > - num_components = 3;<br>
> > - break;<br>
> > - case GL_ISOLINES:<br>
> > - /* Isolines are not reversed; swizzle .zw -> .xy */<br>
> > - srcs[0] = offset(tmp, bld, 2);<br>
> > - srcs[1] = offset(tmp, bld, 3);<br>
> > - num_components = 2;<br>
> > - break;<br>
> > - default:<br>
> > - unreachable("Bogus tessellation domain");<br>
> > - }<br>
> > - bld.LOAD_PAYLOAD(dst, srcs, num_components, 0);<br>
> > - } else {<br>
> > + {<br>
> > if (first_component != 0) {<br>
> > unsigned read_components =<br>
> > instr->num_components + first_component;<br>
> > @@ -2656,55 +2585,6 @@ fs_visitor::nir_emit_tcs_<wbr>intrinsic(const<br>
> > fs_builder &bld,<br>
> ><br>
> > if (indirect_offset.file != BAD_FILE) {<br>
> > srcs[header_regs++] = indirect_offset;<br>
> > - } else if (!is_passthrough_shader) {<br>
> > - if (imm_offset == 0) {<br>
> > - value.type = BRW_REGISTER_TYPE_F;<br>
> > -<br>
> > - mask &= (1 << tesslevel_inner_components(<wbr>tcs_key->tes_primitive_mode))<br>
> > - 1;<br>
> > -<br>
> > - /* This is a write to gl_TessLevelInner[], which lives in the<br>
> > - * Patch URB header. The layout depends on the domain.<br>
> > - */<br>
> > - switch (tcs_key->tes_primitive_mode) {<br>
> > - case GL_QUADS:<br>
> > - /* gl_TessLevelInner[].xy lives at DWords 3-2 (reversed).<br>
> > - * We use an XXYX swizzle to reverse put .xy in the .wz<br>
> > - * channels, and use a .zw writemask.<br>
> > - */<br>
> > - mask = writemask_for_backwards_<wbr>vector(mask);<br>
> > - swiz = BRW_SWIZZLE4(0, 0, 1, 0);<br>
> > - break;<br>
> > - case GL_TRIANGLES:<br>
> > - /* gl_TessLevelInner[].x lives at DWord 4, so we set the<br>
> > - * writemask to X and bump the URB offset by 1.<br>
> > - */<br>
> > - imm_offset = 1;<br>
> > - break;<br>
> > - case GL_ISOLINES:<br>
> > - /* Skip; gl_TessLevelInner[] doesn't exist for isolines. */<br>
> > - return;<br>
> > - default:<br>
> > - unreachable("Bogus tessellation domain");<br>
> > - }<br>
> > - } else if (imm_offset == 1) {<br>
> > - /* This is a write to gl_TessLevelOuter[] which lives in the<br>
> > - * Patch URB Header at DWords 4-7. However, it's reversed, so<br>
> > - * instead of .xyzw we have .wzyx.<br>
> > - */<br>
> > - value.type = BRW_REGISTER_TYPE_F;<br>
> > -<br>
> > - mask &= (1 << tesslevel_outer_components(<wbr>tcs_key->tes_primitive_mode))<br>
> > - 1;<br>
> > -<br>
> > - if (tcs_key->tes_primitive_mode == GL_ISOLINES) {<br>
> > - /* Isolines .xy should be stored in .zw, in order. */<br>
> > - swiz = BRW_SWIZZLE4(0, 0, 0, 1);<br>
> > - mask <<= 2;<br>
> > - } else {<br>
> > - /* Other domains are reversed; store .wzyx instead of<br>
> > .xyzw */<br>
> > - swiz = BRW_SWIZZLE_WZYX;<br>
> > - mask = writemask_for_backwards_<wbr>vector(mask);<br>
> > - }<br>
> > - }<br>
> > }<br>
> ><br>
> > if (mask == 0)<br>
> > @@ -2851,48 +2731,6 @@ fs_visitor::nir_emit_tes_<wbr>intrinsic(const<br>
> > fs_builder &bld,<br>
> > }<br>
> > break;<br>
> ><br>
> > - case nir_intrinsic_load_tess_level_<wbr>outer:<br>
> > - /* When the TES reads gl_TessLevelOuter, we ensure that the patch<br>
> > header<br>
> > - * appears as a push-model input. So, we can simply use the ATTR<br>
> > file<br>
> > - * rather than issuing URB read messages. The data is stored in the<br>
> > - * high DWords in reverse order - DWord 7 contains .x, DWord 6<br>
> > contains<br>
> > - * .y, and so on.<br>
> > - */<br>
> > - switch (tes_prog_data->domain) {<br>
> > - case BRW_TESS_DOMAIN_QUAD:<br>
> > - for (unsigned i = 0; i < 4; i++)<br>
> > - bld.MOV(offset(dest, bld, i), component(fs_reg(ATTR, 0), 7 -<br>
> > i));<br>
> > - break;<br>
> > - case BRW_TESS_DOMAIN_TRI:<br>
> > - for (unsigned i = 0; i < 3; i++)<br>
> > - bld.MOV(offset(dest, bld, i), component(fs_reg(ATTR, 0), 7 -<br>
> > i));<br>
> > - break;<br>
> > - case BRW_TESS_DOMAIN_ISOLINE:<br>
> > - for (unsigned i = 0; i < 2; i++)<br>
> > - bld.MOV(offset(dest, bld, i), component(fs_reg(ATTR, 0), 6 +<br>
> > i));<br>
> > - break;<br>
> > - }<br>
> > - break;<br>
> > -<br>
> > - case nir_intrinsic_load_tess_level_<wbr>inner:<br>
> > - /* When the TES reads gl_TessLevelInner, we ensure that the patch<br>
> > header<br>
> > - * appears as a push-model input. So, we can simply use the ATTR<br>
> > file<br>
> > - * rather than issuing URB read messages.<br>
> > - */<br>
> > - switch (tes_prog_data->domain) {<br>
> > - case BRW_TESS_DOMAIN_QUAD:<br>
> > - bld.MOV(dest, component(fs_reg(ATTR, 0), 3));<br>
> > - bld.MOV(offset(dest, bld, 1), component(fs_reg(ATTR, 0), 2));<br>
> > - break;<br>
> > - case BRW_TESS_DOMAIN_TRI:<br>
> > - bld.MOV(dest, component(fs_reg(ATTR, 0), 4));<br>
> > - break;<br>
> > - case BRW_TESS_DOMAIN_ISOLINE:<br>
> > - /* ignore - value is undefined */<br>
> > - break;<br>
> > - }<br>
> > - break;<br>
> > -<br>
> > case nir_intrinsic_load_input:<br>
> > case nir_intrinsic_load_per_vertex_<wbr>input: {<br>
> > fs_reg indirect_offset = get_indirect_offset(instr);<br>
> > diff --git a/src/mesa/drivers/dri/i965/<wbr>brw_nir.c<br>
> > b/src/mesa/drivers/dri/i965/<wbr>brw_nir.c<br>
> > index 6f37e97a86f..46eeb1723b4 100644<br>
> > --- a/src/mesa/drivers/dri/i965/<wbr>brw_nir.c<br>
> > +++ b/src/mesa/drivers/dri/i965/<wbr>brw_nir.c<br>
> > @@ -141,9 +141,68 @@ remap_inputs_with_vue_map(nir_<wbr>block *block, const<br>
> > struct brw_vue_map *vue_map)<br>
> > }<br>
> ><br>
> > static bool<br>
> > +remap_tess_levels(nir_builder *b, nir_intrinsic_instr *intr,<br>
> > + GLenum primitive_mode)<br>
> > +{<br>
> > + const int location = nir_intrinsic_base(intr);<br>
> > + const unsigned component = nir_intrinsic_component(intr);<br>
> > + bool out_of_bounds;<br>
> > +<br>
> > + if (location == VARYING_SLOT_TESS_LEVEL_INNER) {<br>
> > + switch (primitive_mode) {<br>
> > + case GL_QUADS:<br>
> > + /* gl_TessLevelInner[0..1] lives at DWords 3-2 (reversed). */<br>
> > + nir_intrinsic_set_base(intr, 0);<br>
> > + nir_intrinsic_set_component(<wbr>intr, 3 - component);<br>
> > + out_of_bounds = false;<br>
> ><br>
><br>
> What if component > 1? I guess that's not really a problem but it is<br>
> out-of-bounds...<br>
<br>
</div></div>The GLSL array is a float[2], so I guess I was assuming something would<br>
reject the out of bounds array access higher up in the compiler.<br>
<br>
Maybe that isn't a valid assumption though...<br>
<div><div class="h5"><br>
> > + break;<br>
> > + case GL_TRIANGLES:<br>
> > + /* gl_TessLevelInner[0] lives at DWord 4. */<br>
> > + nir_intrinsic_set_base(intr, 1);<br>
> > + out_of_bounds = component > 0;<br>
> > + break;<br>
> > + case GL_ISOLINES:<br>
> > + out_of_bounds = true;<br>
> > + break;<br>
> > + default:<br>
> > + unreachable("Bogus tessellation domain");<br>
> > + }<br>
> > + } else if (location == VARYING_SLOT_TESS_LEVEL_OUTER) {<br>
> > + if (primitive_mode == GL_ISOLINES) {<br>
> > + /* gl_TessLevelOuter[0..1] lives at DWords 6-7 (in order). */<br>
> > + nir_intrinsic_set_base(intr, 1);<br>
> > + nir_intrinsic_set_component(<wbr>intr, 2 +<br>
> > nir_intrinsic_component(intr))<wbr>;<br>
> > + out_of_bounds = component > 1;<br>
> > + } else {<br>
> > + /* Triangles use DWords 7-5 (reversed); Quads use 7-4 (reversed)<br>
> > */<br>
> > + nir_intrinsic_set_base(intr, 1);<br>
> > + nir_intrinsic_set_component(<wbr>intr, 3 -<br>
> > nir_intrinsic_component(intr))<wbr>;<br>
> > + out_of_bounds = component == 3 && primitive_mode == GL_TRIANGLES;<br>
> ><br>
> + }<br>
> > + } else {<br>
> > + return false;<br>
> > + }<br>
> > +<br>
> > + if (out_of_bounds) {<br>
> > + if (nir_intrinsic_infos[intr-><wbr>intrinsic].has_dest) {<br>
> > + b->cursor = nir_before_instr(&intr->instr)<wbr>;<br>
> > + nir_ssa_def *undef = nir_ssa_undef(b, 1, 32);<br>
> > + nir_ssa_def_rewrite_uses(&<wbr>intr->dest.ssa,<br>
> > nir_src_for_ssa(undef));<br>
> > + }<br>
> > + nir_instr_remove(&intr->instr)<wbr>;<br>
> > + }<br>
> > +<br>
> > + return true;<br>
> > +}<br>
> > +<br>
> > +static bool<br>
> > remap_patch_urb_offsets(nir_<wbr>block *block, nir_builder *b,<br>
> > - const struct brw_vue_map *vue_map)<br>
> > + const struct brw_vue_map *vue_map,<br>
> > + GLenum tes_primitive_mode)<br>
> > {<br>
> > + const bool is_passthrough_tcs = b->shader->info->name &&<br>
> > + strcmp(b->shader->info->name, "passthrough") == 0;<br>
> ><br>
><br>
> This is gross...<br>
><br>
> Also... Why? What's so special about the passthrough that it doesn't need<br>
> tess level remaps? I have a feeling there's some more general thing we<br>
> could be doing here.<br>
<br>
</div></div>We've always been special casing it, so I'm not sure it's more gross<br>
than it was before...*shrug*.<br>
<br>
All TCS must write the patch header, which is two vec4 slots:<br>
<br>
- gl_TessLevelOuter[4]<br>
- gl_TessLevelInner[2]<br>
- TR DS Cache Disable bit (DWord 0)<br>
<br>
For the passthrough shader, the inner/outer levels come from API state.<br>
So, I hooked up two built-in vec4 uniforms, and made their contents<br>
match the patch URB header exactly. So, the shader doesn't have to<br>
do any remapping or shuffling...it just straight copies both vec4s.<br>
Simpler and more efficient. But...a special case.<br>
<span class=""><br>
> > +<br>
> > nir_foreach_instr_safe(instr, block) {<br>
> > if (instr->type != nir_instr_type_intrinsic)<br>
> > continue;<br>
> > @@ -154,6 +213,11 @@ remap_patch_urb_offsets(nir_<wbr>block *block,<br>
> > nir_builder *b,<br>
> ><br>
> > if ((stage == MESA_SHADER_TESS_CTRL && is_output(intrin)) ||<br>
> > (stage == MESA_SHADER_TESS_EVAL && is_input(intrin))) {<br>
> > +<br>
> > + if (!is_passthrough_tcs &&<br>
> > + remap_tess_levels(b, intrin, tes_primitive_mode))<br>
> > + continue;<br>
> ><br>
><br>
> Let's make sure I've got this right...<br>
><br>
> We map everything from the varying identifiers to VUE slots. For the case<br>
> of tesslevel, they will be assigned VUE slots 0 and 1 so the code below<br>
> will hever cause any other output to alias. Then we rework stuff in<br>
> remap_tess_levels so that they map to the right locaion in 8-dword chunk at<br>
> the begining of the VUE.<br>
<br>
</span>Yep, exactly. I assigned VARYING_SLOT_TESS_LEVEL_INNER/<wbr>OUTER to slots 0<br>
and 1, since both together live in those slots. (Which is which doesn't<br>
actually matter.) Here we special case those and take care of mapping<br>
the individual components to the right spot.<br>
<span class=""><br>
> This function (remap_patch_urb_offsets) could really use a comment at the<br>
> top saying what it's doing.<br>
<br>
</span>I suppose so. It makes sense to me, but I wrote it...<br>
</blockquote></div><br></div>