[Mesa-dev] [PATCH 8/8] i965: Rework gl_TessLevel*[] handling to use NIR compact arrays.

Kenneth Graunke kenneth at whitecape.org
Wed Jan 4 22:12:21 UTC 2017


On Wednesday, January 4, 2017 1:35:55 PM PST Jason Ekstrand wrote:
> On Wed, Jan 4, 2017 at 3:07 AM, Kenneth Graunke <kenneth at whitecape.org>
> wrote:
> 
> > Treating everything as scalar arrays allows us to drop a bunch of
> > special case input/output munging all throughout the backend.
> > Instead, we just need to remap the TessLevel components to the
> > appropriate patch URB header locations in remap_patch_urb_offsets().
> >
> > We also switch to treating the TES input versions of these as ordinary
> > shader inputs rather than system values, as remap_patch_urb_offsets()
> > just makes everything work out without special handling.
> >
> > This regresses one Piglit test:
> > arb_tessellation_shader-large-uniforms/GL_TESS_CONTROL_
> > SHADER-array-at-limit
> >
> > The compiler starts promoting the constant arrays assigned to gl_TessLevel*
> > to uniform arrays.  Since the shader also has a uniform array that uses
> > the maximum number of uniform components, this puts it over the uniform
> > component limit enforced by the linker.  This is arguably a bug in the
> > constant array promotion code (it should avoid pushing us over limits),
> > but is unlikely to penalize any real application.
> >
> > Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> > ---
> >  src/mesa/drivers/dri/i965/brw_context.c            |   2 +-
> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp           | 164
> > +--------------------
> >  src/mesa/drivers/dri/i965/brw_nir.c                |  74 +++++++++-
> >  src/mesa/drivers/dri/i965/brw_nir.h                |   3 +-
> >  .../drivers/dri/i965/brw_nir_tcs_workarounds.c     |   8 +-
> >  src/mesa/drivers/dri/i965/brw_shader.cpp           |  61 +-------
> >  src/mesa/drivers/dri/i965/brw_shader.h             |   5 -
> >  src/mesa/drivers/dri/i965/brw_tcs.c                |   5 +-
> >  src/mesa/drivers/dri/i965/brw_tes.c                |  17 +--
> >  src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp         | 117 +--------------
> >  10 files changed, 92 insertions(+), 364 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_context.c
> > b/src/mesa/drivers/dri/i965/brw_context.c
> > index 45490a0f5cf..22f872fe782 100644
> > --- a/src/mesa/drivers/dri/i965/brw_context.c
> > +++ b/src/mesa/drivers/dri/i965/brw_context.c
> > @@ -672,7 +672,7 @@ brw_initialize_context_constants(struct brw_context
> > *brw)
> >     if (brw->gen >= 5 || brw->is_g4x)
> >        ctx->Const.MaxClipPlanes = 8;
> >
> > -   ctx->Const.LowerTessLevel = true;
> > +   ctx->Const.GLSLTessLevelsAsInputs = true;
> >     ctx->Const.LowerTCSPatchVerticesIn = brw->gen >= 8;
> >     ctx->Const.LowerTESPatchVerticesIn = true;
> >     ctx->Const.PrimitiveRestartForPatches = true;
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > index 2ed843bd03d..8f745dff440 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > @@ -2520,78 +2520,7 @@ fs_visitor::nir_emit_tcs_intrinsic(const
> > fs_builder &bld,
> >           bld.MOV(patch_handle,
> >                   retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_UD));
> >
> > -         if (imm_offset == 0) {
> > -            /* This is a read of gl_TessLevelInner[], which lives in the
> > -             * Patch URB header.  The layout depends on the domain.
> > -             */
> > -            dst.type = BRW_REGISTER_TYPE_F;
> > -            switch (tcs_key->tes_primitive_mode) {
> > -            case GL_QUADS: {
> > -               /* DWords 3-2 (reversed) */
> > -               fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F, 4);
> > -
> > -               inst = bld.emit(SHADER_OPCODE_URB_READ_SIMD8, tmp,
> > patch_handle);
> > -               inst->offset = 0;
> > -               inst->mlen = 1;
> > -               inst->size_written = 4 * REG_SIZE;
> > -
> > -               /* dst.xy = tmp.wz */
> > -               bld.MOV(dst,                 offset(tmp, bld, 3));
> > -               bld.MOV(offset(dst, bld, 1), offset(tmp, bld, 2));
> > -               break;
> > -            }
> > -            case GL_TRIANGLES:
> > -               /* DWord 4; hardcode offset = 1 and size_written =
> > REG_SIZE */
> > -               inst = bld.emit(SHADER_OPCODE_URB_READ_SIMD8, dst,
> > patch_handle);
> > -               inst->offset = 1;
> > -               inst->mlen = 1;
> > -               inst->size_written = REG_SIZE;
> > -               break;
> > -            case GL_ISOLINES:
> > -               /* All channels are undefined. */
> > -               break;
> > -            default:
> > -               unreachable("Bogus tessellation domain");
> > -            }
> > -         } else if (imm_offset == 1) {
> > -            /* This is a read of gl_TessLevelOuter[], which lives in the
> > -             * Patch URB header.  The layout depends on the domain.
> > -             */
> > -            dst.type = BRW_REGISTER_TYPE_F;
> > -
> > -            fs_reg tmp = bld.vgrf(BRW_REGISTER_TYPE_F, 4);
> > -            inst = bld.emit(SHADER_OPCODE_URB_READ_SIMD8, tmp,
> > patch_handle);
> > -            inst->offset = 1;
> > -            inst->mlen = 1;
> > -            inst->size_written = 4 * REG_SIZE;
> > -
> > -            /* Reswizzle: WZYX */
> > -            fs_reg srcs[4] = {
> > -               offset(tmp, bld, 3),
> > -               offset(tmp, bld, 2),
> > -               offset(tmp, bld, 1),
> > -               offset(tmp, bld, 0),
> > -            };
> > -
> > -            unsigned num_components;
> > -            switch (tcs_key->tes_primitive_mode) {
> > -            case GL_QUADS:
> > -               num_components = 4;
> > -               break;
> > -            case GL_TRIANGLES:
> > -               num_components = 3;
> > -               break;
> > -            case GL_ISOLINES:
> > -               /* Isolines are not reversed; swizzle .zw -> .xy */
> > -               srcs[0] = offset(tmp, bld, 2);
> > -               srcs[1] = offset(tmp, bld, 3);
> > -               num_components = 2;
> > -               break;
> > -            default:
> > -               unreachable("Bogus tessellation domain");
> > -            }
> > -            bld.LOAD_PAYLOAD(dst, srcs, num_components, 0);
> > -         } else {
> > +         {
> >              if (first_component != 0) {
> >                 unsigned read_components =
> >                    instr->num_components + first_component;
> > @@ -2656,55 +2585,6 @@ fs_visitor::nir_emit_tcs_intrinsic(const
> > fs_builder &bld,
> >
> >        if (indirect_offset.file != BAD_FILE) {
> >           srcs[header_regs++] = indirect_offset;
> > -      } else if (!is_passthrough_shader) {
> > -         if (imm_offset == 0) {
> > -            value.type = BRW_REGISTER_TYPE_F;
> > -
> > -            mask &= (1 << tesslevel_inner_components(tcs_key->tes_primitive_mode))
> > - 1;
> > -
> > -            /* This is a write to gl_TessLevelInner[], which lives in the
> > -             * Patch URB header.  The layout depends on the domain.
> > -             */
> > -            switch (tcs_key->tes_primitive_mode) {
> > -            case GL_QUADS:
> > -               /* gl_TessLevelInner[].xy lives at DWords 3-2 (reversed).
> > -                * We use an XXYX swizzle to reverse put .xy in the .wz
> > -                * channels, and use a .zw writemask.
> > -                */
> > -               mask = writemask_for_backwards_vector(mask);
> > -               swiz = BRW_SWIZZLE4(0, 0, 1, 0);
> > -               break;
> > -            case GL_TRIANGLES:
> > -               /* gl_TessLevelInner[].x lives at DWord 4, so we set the
> > -                * writemask to X and bump the URB offset by 1.
> > -                */
> > -               imm_offset = 1;
> > -               break;
> > -            case GL_ISOLINES:
> > -               /* Skip; gl_TessLevelInner[] doesn't exist for isolines. */
> > -               return;
> > -            default:
> > -               unreachable("Bogus tessellation domain");
> > -            }
> > -         } else if (imm_offset == 1) {
> > -            /* This is a write to gl_TessLevelOuter[] which lives in the
> > -             * Patch URB Header at DWords 4-7.  However, it's reversed, so
> > -             * instead of .xyzw we have .wzyx.
> > -             */
> > -            value.type = BRW_REGISTER_TYPE_F;
> > -
> > -            mask &= (1 << tesslevel_outer_components(tcs_key->tes_primitive_mode))
> > - 1;
> > -
> > -            if (tcs_key->tes_primitive_mode == GL_ISOLINES) {
> > -               /* Isolines .xy should be stored in .zw, in order. */
> > -               swiz = BRW_SWIZZLE4(0, 0, 0, 1);
> > -               mask <<= 2;
> > -            } else {
> > -               /* Other domains are reversed; store .wzyx instead of
> > .xyzw */
> > -               swiz = BRW_SWIZZLE_WZYX;
> > -               mask = writemask_for_backwards_vector(mask);
> > -            }
> > -         }
> >        }
> >
> >        if (mask == 0)
> > @@ -2851,48 +2731,6 @@ fs_visitor::nir_emit_tes_intrinsic(const
> > fs_builder &bld,
> >        }
> >        break;
> >
> > -   case nir_intrinsic_load_tess_level_outer:
> > -      /* When the TES reads gl_TessLevelOuter, we ensure that the patch
> > header
> > -       * appears as a push-model input.  So, we can simply use the ATTR
> > file
> > -       * rather than issuing URB read messages.  The data is stored in the
> > -       * high DWords in reverse order - DWord 7 contains .x, DWord 6
> > contains
> > -       * .y, and so on.
> > -       */
> > -      switch (tes_prog_data->domain) {
> > -      case BRW_TESS_DOMAIN_QUAD:
> > -         for (unsigned i = 0; i < 4; i++)
> > -            bld.MOV(offset(dest, bld, i), component(fs_reg(ATTR, 0), 7 -
> > i));
> > -         break;
> > -      case BRW_TESS_DOMAIN_TRI:
> > -         for (unsigned i = 0; i < 3; i++)
> > -            bld.MOV(offset(dest, bld, i), component(fs_reg(ATTR, 0), 7 -
> > i));
> > -         break;
> > -      case BRW_TESS_DOMAIN_ISOLINE:
> > -         for (unsigned i = 0; i < 2; i++)
> > -            bld.MOV(offset(dest, bld, i), component(fs_reg(ATTR, 0), 6 +
> > i));
> > -         break;
> > -      }
> > -      break;
> > -
> > -   case nir_intrinsic_load_tess_level_inner:
> > -      /* When the TES reads gl_TessLevelInner, we ensure that the patch
> > header
> > -       * appears as a push-model input.  So, we can simply use the ATTR
> > file
> > -       * rather than issuing URB read messages.
> > -       */
> > -      switch (tes_prog_data->domain) {
> > -      case BRW_TESS_DOMAIN_QUAD:
> > -         bld.MOV(dest, component(fs_reg(ATTR, 0), 3));
> > -         bld.MOV(offset(dest, bld, 1), component(fs_reg(ATTR, 0), 2));
> > -         break;
> > -      case BRW_TESS_DOMAIN_TRI:
> > -         bld.MOV(dest, component(fs_reg(ATTR, 0), 4));
> > -         break;
> > -      case BRW_TESS_DOMAIN_ISOLINE:
> > -         /* ignore - value is undefined */
> > -         break;
> > -      }
> > -      break;
> > -
> >     case nir_intrinsic_load_input:
> >     case nir_intrinsic_load_per_vertex_input: {
> >        fs_reg indirect_offset = get_indirect_offset(instr);
> > diff --git a/src/mesa/drivers/dri/i965/brw_nir.c
> > b/src/mesa/drivers/dri/i965/brw_nir.c
> > index 6f37e97a86f..46eeb1723b4 100644
> > --- a/src/mesa/drivers/dri/i965/brw_nir.c
> > +++ b/src/mesa/drivers/dri/i965/brw_nir.c
> > @@ -141,9 +141,68 @@ remap_inputs_with_vue_map(nir_block *block, const
> > struct brw_vue_map *vue_map)
> >  }
> >
> >  static bool
> > +remap_tess_levels(nir_builder *b, nir_intrinsic_instr *intr,
> > +                  GLenum primitive_mode)
> > +{
> > +   const int location = nir_intrinsic_base(intr);
> > +   const unsigned component = nir_intrinsic_component(intr);
> > +   bool out_of_bounds;
> > +
> > +   if (location == VARYING_SLOT_TESS_LEVEL_INNER) {
> > +      switch (primitive_mode) {
> > +      case GL_QUADS:
> > +         /* gl_TessLevelInner[0..1] lives at DWords 3-2 (reversed). */
> > +         nir_intrinsic_set_base(intr, 0);
> > +         nir_intrinsic_set_component(intr, 3 - component);
> > +         out_of_bounds = false;
> >
> 
> What if component > 1?  I guess that's not really a problem but it is
> out-of-bounds...

The GLSL array is a float[2], so I guess I was assuming something would
reject the out of bounds array access higher up in the compiler.

Maybe that isn't a valid assumption though...

> > +         break;
> > +      case GL_TRIANGLES:
> > +         /* gl_TessLevelInner[0] lives at DWord 4. */
> > +         nir_intrinsic_set_base(intr, 1);
> > +         out_of_bounds = component > 0;
> > +         break;
> > +      case GL_ISOLINES:
> > +         out_of_bounds = true;
> > +         break;
> > +      default:
> > +         unreachable("Bogus tessellation domain");
> > +      }
> > +   } else if (location == VARYING_SLOT_TESS_LEVEL_OUTER) {
> > +      if (primitive_mode == GL_ISOLINES) {
> > +         /* gl_TessLevelOuter[0..1] lives at DWords 6-7 (in order). */
> > +         nir_intrinsic_set_base(intr, 1);
> > +         nir_intrinsic_set_component(intr, 2 +
> > nir_intrinsic_component(intr));
> > +         out_of_bounds = component > 1;
> > +      } else {
> > +         /* Triangles use DWords 7-5 (reversed); Quads use 7-4 (reversed)
> > */
> > +         nir_intrinsic_set_base(intr, 1);
> > +         nir_intrinsic_set_component(intr, 3 -
> > nir_intrinsic_component(intr));
> > +         out_of_bounds = component == 3 && primitive_mode == GL_TRIANGLES;
> >
> +      }
> > +   } else {
> > +      return false;
> > +   }
> > +
> > +   if (out_of_bounds) {
> > +      if (nir_intrinsic_infos[intr->intrinsic].has_dest) {
> > +         b->cursor = nir_before_instr(&intr->instr);
> > +         nir_ssa_def *undef = nir_ssa_undef(b, 1, 32);
> > +         nir_ssa_def_rewrite_uses(&intr->dest.ssa,
> > nir_src_for_ssa(undef));
> > +      }
> > +      nir_instr_remove(&intr->instr);
> > +   }
> > +
> > +   return true;
> > +}
> > +
> > +static bool
> >  remap_patch_urb_offsets(nir_block *block, nir_builder *b,
> > -                        const struct brw_vue_map *vue_map)
> > +                        const struct brw_vue_map *vue_map,
> > +                        GLenum tes_primitive_mode)
> >  {
> > +   const bool is_passthrough_tcs = b->shader->info->name &&
> > +      strcmp(b->shader->info->name, "passthrough") == 0;
> >
> 
> This is gross...
> 
> Also... Why?  What's so special about the passthrough that it doesn't need
> tess level remaps?  I have a feeling there's some more general thing we
> could be doing here.

We've always been special casing it, so I'm not sure it's more gross
than it was before...*shrug*.

All TCS must write the patch header, which is two vec4 slots:

   - gl_TessLevelOuter[4]
   - gl_TessLevelInner[2]
   - TR DS Cache Disable bit (DWord 0)

For the passthrough shader, the inner/outer levels come from API state.
So, I hooked up two built-in vec4 uniforms, and made their contents
match the patch URB header exactly.  So, the shader doesn't have to
do any remapping or shuffling...it just straight copies both vec4s.
Simpler and more efficient.  But...a special case.

> > +
> >     nir_foreach_instr_safe(instr, block) {
> >        if (instr->type != nir_instr_type_intrinsic)
> >           continue;
> > @@ -154,6 +213,11 @@ remap_patch_urb_offsets(nir_block *block,
> > nir_builder *b,
> >
> >        if ((stage == MESA_SHADER_TESS_CTRL && is_output(intrin)) ||
> >            (stage == MESA_SHADER_TESS_EVAL && is_input(intrin))) {
> > +
> > +         if (!is_passthrough_tcs &&
> > +             remap_tess_levels(b, intrin, tes_primitive_mode))
> > +            continue;
> >
> 
> Let's make sure I've got this right...
> 
> We map everything from the varying identifiers to VUE slots.  For the case
> of tesslevel, they will be assigned VUE slots 0 and 1 so the code below
> will hever cause any other output to alias.  Then we rework stuff in
> remap_tess_levels so that they map to the right locaion in 8-dword chunk at
> the begining of the VUE.

Yep, exactly.  I assigned VARYING_SLOT_TESS_LEVEL_INNER/OUTER to slots 0
and 1, since both together live in those slots.  (Which is which doesn't
actually matter.)  Here we special case those and take care of mapping
the individual components to the right spot.

> This function (remap_patch_urb_offsets) could really use a comment at the
> top saying what it's doing.

I suppose so.  It makes sense to me, but I wrote it...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part.
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20170104/ab533aa8/attachment.sig>


More information about the mesa-dev mailing list