[Mesa-dev] [PATCH 8/8] i965: Rework gl_TessLevel*[] handling to use NIR compact arrays.
Jason Ekstrand
jason at jlekstrand.net
Wed Jan 4 22:57:14 UTC 2017
Ok, given what you said below, I think everything is sane.
Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>
Note that I didn't actually verify that all of the code you deleted is
going to work once we switch over to NIR handling it with varyings.
On Wed, Jan 4, 2017 at 2:12 PM, Kenneth Graunke <kenneth at whitecape.org>
wrote:
> On Wednesday, January 4, 2017 1:35:55 PM PST Jason Ekstrand wrote:
> > On Wed, Jan 4, 2017 at 3:07 AM, Kenneth Graunke <kenneth at whitecape.org>
> > wrote:
> >
> > > Treating everything as scalar arrays allows us to drop a bunch of
> > > special case input/output munging all throughout the backend.
> > > Instead, we just need to remap the TessLevel components to the
> > > appropriate patch URB header locations in remap_patch_urb_offsets().
> > >
> > > We also switch to treating the TES input versions of these as ordinary
> > > shader inputs rather than system values, as remap_patch_urb_offsets()
> > > just makes everything work out without special handling.
> > >
> > > This regresses one Piglit test:
> > > arb_tessellation_shader-large-uniforms/GL_TESS_CONTROL_
> > > SHADER-array-at-limit
> > >
> > > The compiler starts promoting the constant arrays assigned to
> gl_TessLevel*
> > > to uniform arrays. Since the shader also has a uniform array that uses
> > > the maximum number of uniform components, this puts it over the uniform
> > > component limit enforced by the linker. This is arguably a bug in the
> > > constant array promotion code (it should avoid pushing us over limits),
> > > but is unlikely to penalize any real application.
> > >
> > > Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> > > ---
> > > src/mesa/drivers/dri/i965/brw_context.c | 2 +-
> > > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 164
> > > +--------------------
> > > src/mesa/drivers/dri/i965/brw_nir.c | 74 +++++++++-
> > > src/mesa/drivers/dri/i965/brw_nir.h | 3 +-
> > > .../drivers/dri/i965/brw_nir_tcs_workarounds.c | 8 +-
> > > src/mesa/drivers/dri/i965/brw_shader.cpp | 61 +-------
> > > src/mesa/drivers/dri/i965/brw_shader.h | 5 -
> > > src/mesa/drivers/dri/i965/brw_tcs.c | 5 +-
> > > src/mesa/drivers/dri/i965/brw_tes.c | 17 +--
> > > src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp | 117
> +--------------
> > > 10 files changed, 92 insertions(+), 364 deletions(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_context.c
> > > b/src/mesa/drivers/dri/i965/brw_context.c
> > > index 45490a0f5cf..22f872fe782 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_context.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_context.c
> > > @@ -672,7 +672,7 @@ brw_initialize_context_constants(struct
> brw_context
> > > *brw)
> > > if (brw->gen >= 5 || brw->is_g4x)
> > > ctx->Const.MaxClipPlanes = 8;
> > >
> > > - ctx->Const.LowerTessLevel = true;
> > > + ctx->Const.GLSLTessLevelsAsInputs = true;
> > > ctx->Const.LowerTCSPatchVerticesIn = brw->gen >= 8;
> > > ctx->Const.LowerTESPatchVerticesIn = true;
> > > ctx->Const.PrimitiveRestartForPatches = true;
> > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > > index 2ed843bd03d..8f745dff440 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > > @@ -2520,78 +2520,7 @@ fs_visitor::nir_emit_tcs_intrinsic(const
> > > fs_builder &bld,
> > > bld.MOV(patch_handle,
> > > retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_UD));
> > >
> > > - if (imm_offset == 0) {
> > > - /* This is a read of gl_TessLevelInner[], which lives in
> the
> > > - * Patch URB header. The layout depends on the domain.
> > > - */
> > > - dst.type = BRW_REGISTER_TYPE_F;
> > > - switch (tcs_key->tes_primitive_mode) {
> > > - case GL_QUADS: {
> > > - /* DWords 3-2 (reversed) */
> > > - fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F, 4);
> > > -
> > > - inst = bld.emit(SHADER_OPCODE_URB_READ_SIMD8, tmp,
> > > patch_handle);
> > > - inst->offset = 0;
> > > - inst->mlen = 1;
> > > - inst->size_written = 4 * REG_SIZE;
> > > -
> > > - /* dst.xy = tmp.wz */
> > > - bld.MOV(dst, offset(tmp, bld, 3));
> > > - bld.MOV(offset(dst, bld, 1), offset(tmp, bld, 2));
> > > - break;
> > > - }
> > > - case GL_TRIANGLES:
> > > - /* DWord 4; hardcode offset = 1 and size_written =
> > > REG_SIZE */
> > > - inst = bld.emit(SHADER_OPCODE_URB_READ_SIMD8, dst,
> > > patch_handle);
> > > - inst->offset = 1;
> > > - inst->mlen = 1;
> > > - inst->size_written = REG_SIZE;
> > > - break;
> > > - case GL_ISOLINES:
> > > - /* All channels are undefined. */
> > > - break;
> > > - default:
> > > - unreachable("Bogus tessellation domain");
> > > - }
> > > - } else if (imm_offset == 1) {
> > > - /* This is a read of gl_TessLevelOuter[], which lives in
> the
> > > - * Patch URB header. The layout depends on the domain.
> > > - */
> > > - dst.type = BRW_REGISTER_TYPE_F;
> > > -
> > > - fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F, 4);
> > > - inst = bld.emit(SHADER_OPCODE_URB_READ_SIMD8, tmp,
> > > patch_handle);
> > > - inst->offset = 1;
> > > - inst->mlen = 1;
> > > - inst->size_written = 4 * REG_SIZE;
> > > -
> > > - /* Reswizzle: WZYX */
> > > - fs_reg srcs[4] = {
> > > - offset(tmp, bld, 3),
> > > - offset(tmp, bld, 2),
> > > - offset(tmp, bld, 1),
> > > - offset(tmp, bld, 0),
> > > - };
> > > -
> > > - unsigned num_components;
> > > - switch (tcs_key->tes_primitive_mode) {
> > > - case GL_QUADS:
> > > - num_components = 4;
> > > - break;
> > > - case GL_TRIANGLES:
> > > - num_components = 3;
> > > - break;
> > > - case GL_ISOLINES:
> > > - /* Isolines are not reversed; swizzle .zw -> .xy */
> > > - srcs[0] = offset(tmp, bld, 2);
> > > - srcs[1] = offset(tmp, bld, 3);
> > > - num_components = 2;
> > > - break;
> > > - default:
> > > - unreachable("Bogus tessellation domain");
> > > - }
> > > - bld.LOAD_PAYLOAD(dst, srcs, num_components, 0);
> > > - } else {
> > > + {
> > > if (first_component != 0) {
> > > unsigned read_components =
> > > instr->num_components + first_component;
> > > @@ -2656,55 +2585,6 @@ fs_visitor::nir_emit_tcs_intrinsic(const
> > > fs_builder &bld,
> > >
> > > if (indirect_offset.file != BAD_FILE) {
> > > srcs[header_regs++] = indirect_offset;
> > > - } else if (!is_passthrough_shader) {
> > > - if (imm_offset == 0) {
> > > - value.type = BRW_REGISTER_TYPE_F;
> > > -
> > > - mask &= (1 << tesslevel_inner_components(
> tcs_key->tes_primitive_mode))
> > > - 1;
> > > -
> > > - /* This is a write to gl_TessLevelInner[], which lives in
> the
> > > - * Patch URB header. The layout depends on the domain.
> > > - */
> > > - switch (tcs_key->tes_primitive_mode) {
> > > - case GL_QUADS:
> > > - /* gl_TessLevelInner[].xy lives at DWords 3-2
> (reversed).
> > > - * We use an XXYX swizzle to reverse put .xy in the .wz
> > > - * channels, and use a .zw writemask.
> > > - */
> > > - mask = writemask_for_backwards_vector(mask);
> > > - swiz = BRW_SWIZZLE4(0, 0, 1, 0);
> > > - break;
> > > - case GL_TRIANGLES:
> > > - /* gl_TessLevelInner[].x lives at DWord 4, so we set
> the
> > > - * writemask to X and bump the URB offset by 1.
> > > - */
> > > - imm_offset = 1;
> > > - break;
> > > - case GL_ISOLINES:
> > > - /* Skip; gl_TessLevelInner[] doesn't exist for
> isolines. */
> > > - return;
> > > - default:
> > > - unreachable("Bogus tessellation domain");
> > > - }
> > > - } else if (imm_offset == 1) {
> > > - /* This is a write to gl_TessLevelOuter[] which lives in
> the
> > > - * Patch URB Header at DWords 4-7. However, it's
> reversed, so
> > > - * instead of .xyzw we have .wzyx.
> > > - */
> > > - value.type = BRW_REGISTER_TYPE_F;
> > > -
> > > - mask &= (1 << tesslevel_outer_components(
> tcs_key->tes_primitive_mode))
> > > - 1;
> > > -
> > > - if (tcs_key->tes_primitive_mode == GL_ISOLINES) {
> > > - /* Isolines .xy should be stored in .zw, in order. */
> > > - swiz = BRW_SWIZZLE4(0, 0, 0, 1);
> > > - mask <<= 2;
> > > - } else {
> > > - /* Other domains are reversed; store .wzyx instead of
> > > .xyzw */
> > > - swiz = BRW_SWIZZLE_WZYX;
> > > - mask = writemask_for_backwards_vector(mask);
> > > - }
> > > - }
> > > }
> > >
> > > if (mask == 0)
> > > @@ -2851,48 +2731,6 @@ fs_visitor::nir_emit_tes_intrinsic(const
> > > fs_builder &bld,
> > > }
> > > break;
> > >
> > > - case nir_intrinsic_load_tess_level_outer:
> > > - /* When the TES reads gl_TessLevelOuter, we ensure that the
> patch
> > > header
> > > - * appears as a push-model input. So, we can simply use the
> ATTR
> > > file
> > > - * rather than issuing URB read messages. The data is stored
> in the
> > > - * high DWords in reverse order - DWord 7 contains .x, DWord 6
> > > contains
> > > - * .y, and so on.
> > > - */
> > > - switch (tes_prog_data->domain) {
> > > - case BRW_TESS_DOMAIN_QUAD:
> > > - for (unsigned i = 0; i < 4; i++)
> > > - bld.MOV(offset(dest, bld, i), component(fs_reg(ATTR, 0),
> 7 -
> > > i));
> > > - break;
> > > - case BRW_TESS_DOMAIN_TRI:
> > > - for (unsigned i = 0; i < 3; i++)
> > > - bld.MOV(offset(dest, bld, i), component(fs_reg(ATTR, 0),
> 7 -
> > > i));
> > > - break;
> > > - case BRW_TESS_DOMAIN_ISOLINE:
> > > - for (unsigned i = 0; i < 2; i++)
> > > - bld.MOV(offset(dest, bld, i), component(fs_reg(ATTR, 0),
> 6 +
> > > i));
> > > - break;
> > > - }
> > > - break;
> > > -
> > > - case nir_intrinsic_load_tess_level_inner:
> > > - /* When the TES reads gl_TessLevelInner, we ensure that the
> patch
> > > header
> > > - * appears as a push-model input. So, we can simply use the
> ATTR
> > > file
> > > - * rather than issuing URB read messages.
> > > - */
> > > - switch (tes_prog_data->domain) {
> > > - case BRW_TESS_DOMAIN_QUAD:
> > > - bld.MOV(dest, component(fs_reg(ATTR, 0), 3));
> > > - bld.MOV(offset(dest, bld, 1), component(fs_reg(ATTR, 0), 2));
> > > - break;
> > > - case BRW_TESS_DOMAIN_TRI:
> > > - bld.MOV(dest, component(fs_reg(ATTR, 0), 4));
> > > - break;
> > > - case BRW_TESS_DOMAIN_ISOLINE:
> > > - /* ignore - value is undefined */
> > > - break;
> > > - }
> > > - break;
> > > -
> > > case nir_intrinsic_load_input:
> > > case nir_intrinsic_load_per_vertex_input: {
> > > fs_reg indirect_offset = get_indirect_offset(instr);
> > > diff --git a/src/mesa/drivers/dri/i965/brw_nir.c
> > > b/src/mesa/drivers/dri/i965/brw_nir.c
> > > index 6f37e97a86f..46eeb1723b4 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_nir.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_nir.c
> > > @@ -141,9 +141,68 @@ remap_inputs_with_vue_map(nir_block *block, const
> > > struct brw_vue_map *vue_map)
> > > }
> > >
> > > static bool
> > > +remap_tess_levels(nir_builder *b, nir_intrinsic_instr *intr,
> > > + GLenum primitive_mode)
> > > +{
> > > + const int location = nir_intrinsic_base(intr);
> > > + const unsigned component = nir_intrinsic_component(intr);
> > > + bool out_of_bounds;
> > > +
> > > + if (location == VARYING_SLOT_TESS_LEVEL_INNER) {
> > > + switch (primitive_mode) {
> > > + case GL_QUADS:
> > > + /* gl_TessLevelInner[0..1] lives at DWords 3-2 (reversed). */
> > > + nir_intrinsic_set_base(intr, 0);
> > > + nir_intrinsic_set_component(intr, 3 - component);
> > > + out_of_bounds = false;
> > >
> >
> > What if component > 1? I guess that's not really a problem but it is
> > out-of-bounds...
>
> The GLSL array is a float[2], so I guess I was assuming something would
> reject the out of bounds array access higher up in the compiler.
>
> Maybe that isn't a valid assumption though...
>
> > > + break;
> > > + case GL_TRIANGLES:
> > > + /* gl_TessLevelInner[0] lives at DWord 4. */
> > > + nir_intrinsic_set_base(intr, 1);
> > > + out_of_bounds = component > 0;
> > > + break;
> > > + case GL_ISOLINES:
> > > + out_of_bounds = true;
> > > + break;
> > > + default:
> > > + unreachable("Bogus tessellation domain");
> > > + }
> > > + } else if (location == VARYING_SLOT_TESS_LEVEL_OUTER) {
> > > + if (primitive_mode == GL_ISOLINES) {
> > > + /* gl_TessLevelOuter[0..1] lives at DWords 6-7 (in order). */
> > > + nir_intrinsic_set_base(intr, 1);
> > > + nir_intrinsic_set_component(intr, 2 +
> > > nir_intrinsic_component(intr));
> > > + out_of_bounds = component > 1;
> > > + } else {
> > > + /* Triangles use DWords 7-5 (reversed); Quads use 7-4
> (reversed)
> > > */
> > > + nir_intrinsic_set_base(intr, 1);
> > > + nir_intrinsic_set_component(intr, 3 -
> > > nir_intrinsic_component(intr));
> > > + out_of_bounds = component == 3 && primitive_mode ==
> GL_TRIANGLES;
> > >
> > + }
> > > + } else {
> > > + return false;
> > > + }
> > > +
> > > + if (out_of_bounds) {
> > > + if (nir_intrinsic_infos[intr->intrinsic].has_dest) {
> > > + b->cursor = nir_before_instr(&intr->instr);
> > > + nir_ssa_def *undef = nir_ssa_undef(b, 1, 32);
> > > + nir_ssa_def_rewrite_uses(&intr->dest.ssa,
> > > nir_src_for_ssa(undef));
> > > + }
> > > + nir_instr_remove(&intr->instr);
> > > + }
> > > +
> > > + return true;
> > > +}
> > > +
> > > +static bool
> > > remap_patch_urb_offsets(nir_block *block, nir_builder *b,
> > > - const struct brw_vue_map *vue_map)
> > > + const struct brw_vue_map *vue_map,
> > > + GLenum tes_primitive_mode)
> > > {
> > > + const bool is_passthrough_tcs = b->shader->info->name &&
> > > + strcmp(b->shader->info->name, "passthrough") == 0;
> > >
> >
> > This is gross...
> >
> > Also... Why? What's so special about the passthrough that it doesn't
> need
> > tess level remaps? I have a feeling there's some more general thing we
> > could be doing here.
>
> We've always been special casing it, so I'm not sure it's more gross
> than it was before...*shrug*.
>
> All TCS must write the patch header, which is two vec4 slots:
>
> - gl_TessLevelOuter[4]
> - gl_TessLevelInner[2]
> - TR DS Cache Disable bit (DWord 0)
>
> For the passthrough shader, the inner/outer levels come from API state.
> So, I hooked up two built-in vec4 uniforms, and made their contents
> match the patch URB header exactly. So, the shader doesn't have to
> do any remapping or shuffling...it just straight copies both vec4s.
> Simpler and more efficient. But...a special case.
>
> > > +
> > > nir_foreach_instr_safe(instr, block) {
> > > if (instr->type != nir_instr_type_intrinsic)
> > > continue;
> > > @@ -154,6 +213,11 @@ remap_patch_urb_offsets(nir_block *block,
> > > nir_builder *b,
> > >
> > > if ((stage == MESA_SHADER_TESS_CTRL && is_output(intrin)) ||
> > > (stage == MESA_SHADER_TESS_EVAL && is_input(intrin))) {
> > > +
> > > + if (!is_passthrough_tcs &&
> > > + remap_tess_levels(b, intrin, tes_primitive_mode))
> > > + continue;
> > >
> >
> > Let's make sure I've got this right...
> >
> > We map everything from the varying identifiers to VUE slots. For the
> case
> > of tesslevel, they will be assigned VUE slots 0 and 1 so the code below
> > will hever cause any other output to alias. Then we rework stuff in
> > remap_tess_levels so that they map to the right locaion in 8-dword chunk
> at
> > the begining of the VUE.
>
> Yep, exactly. I assigned VARYING_SLOT_TESS_LEVEL_INNER/OUTER to slots 0
> and 1, since both together live in those slots. (Which is which doesn't
> actually matter.) Here we special case those and take care of mapping
> the individual components to the right spot.
>
> > This function (remap_patch_urb_offsets) could really use a comment at the
> > top saying what it's doing.
>
> I suppose so. It makes sense to me, but I wrote it...
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170104/49325cd7/attachment-0001.html>
More information about the mesa-dev
mailing list