[Mesa-dev] [PATCH] i965: Separate base offset/constant offset combining from remapping.

Wed Dec 9 15:13:55 PST 2015

On Wed, Dec 9, 2015 at 12:08 PM, Kenneth Graunke <kenneth at whitecape.org> wrote:
> On Wednesday, December 09, 2015 08:03:25 AM Jason Ekstrand wrote:
>> On Dec 9, 2015 2:51 AM, "Kenneth Graunke" <kenneth at whitecape.org> wrote:
>> >
>> > My tessellation branch has two additional remap functions.  I don't want
>> > to replicate this logic there.
>> >
>> > Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_nir.c | 78
>> ++++++++++++++++++++++++-------------
>> >  1 file changed, 50 insertions(+), 28 deletions(-)
>> >
>> > Hey Jason,
>> >
>> > If you like this patch, and haven't yet merged your NIR input reworks,
>> > feel free to just squash it into your changes.  Or, we can land it
>> > separately after your changes.  It's up to you.
>> >
>> > Separating this out allows me to reuse this in my new tessellation input
>> > and output remapping functions, and also means we don't need to add
>> structs
>> > for the remap functions...we can just pass the builder, or inputs_read, or
>> > the VUE map...and not have to pack multiple things together.
>>
>> Sure.  It does make things simpler.
>>
>> >  --Ken
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_nir.c
>> b/src/mesa/drivers/dri/i965/brw_nir.c
>> > index 14ad172..105a175 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_nir.c
>> > +++ b/src/mesa/drivers/dri/i965/brw_nir.c
>> > @@ -27,15 +27,19 @@
>> >  #include "glsl/nir/nir_builder.h"
>> >  #include "program/prog_to_nir.h"
>> >
>> > -struct remap_vs_attrs_state {
>> > -   nir_builder b;
>> > -   uint64_t inputs_read;
>> > -};
>> > -
>> > +/**
>> > + * In many cases, we just add the base and offset together, so there's no
>> > + * reason to keep them separate.  Sometimes, combining them is essential:
>> > + * if a shader only accesses part of a compound variable (such as a
>> matrix
>> > + * or array), the variable's base may not actually exist in the VUE map.
>> > + *
>> > + * This pass adds constant offsets to instr->const_index[0], and resets
>> > + * the offset source to 0.  Non-constant offsets remain unchanged.
>> > + */
>> >  static bool
>> > -remap_vs_attrs(nir_block *block, void *void_state)
>> > +add_const_offset_to_base(nir_block *block, void *closure)
>> >  {
>> > -   struct remap_vs_attrs_state *state = void_state;
>> > +   nir_builder *b = closure;
>> >
>> >     nir_foreach_instr_safe(block, instr) {
>> >        if (instr->type != nir_instr_type_intrinsic)
>> > @@ -43,30 +47,48 @@ remap_vs_attrs(nir_block *block, void *void_state)
>> >
>> >        nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
>> >
>> > +      if (intrin->intrinsic == nir_intrinsic_load_input ||
>> > +          intrin->intrinsic == nir_intrinsic_load_per_vertex_input ||
>> > +          intrin->intrinsic == nir_intrinsic_load_output ||
>> > +          intrin->intrinsic == nir_intrinsic_load_per_vertex_output ||
>> > +          intrin->intrinsic == nir_intrinsic_store_output ||
>> > +          intrin->intrinsic == nir_intrinsic_store_per_vertex_output) {
>>
>> This seems a bit scortched-earth. It would be nice if the caller had a bit
>> more control.
>
> We could always add "do input"/"do output" options once we actually need
> them.  It means adding the structs back, but that's fine.
>
> I suppose for now we could ignore output intrinsics here.
>
>> > +         nir_src *offset = nir_get_io_offset_src(intrin);
>> > +         nir_const_value *const_offset = nir_src_as_const_value(*offset);
>> > +
>> > +         if (const_offset) {
>> > +            intrin->const_index[0] += const_offset->u[0];
>> > +            b->cursor = nir_before_instr(&intrin->instr);
>> > +            nir_instr_rewrite_src(&intrin->instr, offset,
>> > +                                  nir_src_for_ssa(nir_imm_int(b, 0)));
>> > +         }
>>
>> Else???  It seems that you don't want to run this pass if you think you'll
>> ever hit an indirect.  I guess it's harmless to just do this for all direct
>> things in our driver, but it doesn't sit well.
>
> Else do nothing.   The problem I'm trying to avoid with this logic is
> that inputs/outputs which take up multiple slots and are directly
> accessed may only have slots assigned for the sub-components that are
> actually used.  So, I can't use the base offset, and need to move the
> base to the slot that's actually used.
>
> If something is accessed indirectly, we assign all of its slots, so
> using the base location is fine.

Ok, that makes sense.

>> > +      }
>> > +   }
>> > +   return true;
>> > +
>> > +}
>> > +
>> > +static bool
>> > +remap_vs_attrs(nir_block *block, void *closure)
>> > +{
>> > +   GLbitfield64 inputs_read = *((GLbitfield64 *) closure);
>> > +
>> > +   nir_foreach_instr(block, instr) {
>> > +      if (instr->type != nir_instr_type_intrinsic)
>> > +         continue;
>> > +
>> > +      nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
>> > +
>> >        if (intrin->intrinsic == nir_intrinsic_load_input) {
>> >           /* Attributes come in a contiguous block, ordered by their
>> >            * gl_vert_attrib value.  That means we can compute the slot
>> >            * number for an attribute by masking out the enabled attributes
>> >            * before it and counting the bits.
>> >            */
>> > -         nir_const_value *const_offset =
>> nir_src_as_const_value(intrin->src[0]);
>> > -
>> > -         /* We set EmitNoIndirect for VS inputs, so there are no
>> indirects. */
>> > -         assert(const_offset);
>> > -
>> > -         int attr = intrin->const_index[0] + const_offset->u[0];
>> > -         int slot = _mesa_bitcount_64(state->inputs_read &
>> > -                                      BITFIELD64_MASK(attr));
>> > +         int attr = intrin->const_index[0];
>> > +         int slot = _mesa_bitcount_64(inputs_read &
>> BITFIELD64_MASK(attr));
>> >
>> > -         /* The NIR -> FS pass will just add the base and offset
>> together, so
>> > -          * there's no reason to keep them separate.  Just put it all in
>> > -          * const_index[0] and set the offset src[0] to load_const(0).
>> > -          */
>> >           intrin->const_index[0] = 4 * slot;
>> > -
>> > -         state->b.cursor = nir_before_instr(&intrin->instr);
>> > -         nir_instr_rewrite_src(&intrin->instr, &intrin->src[0],
>> > -                               nir_src_for_ssa(nir_imm_int(&state->b,
>> 0)));
>> >        }
>> >     }
>> >     return true;
>> > @@ -97,17 +119,17 @@ brw_nir_lower_inputs(nir_shader *nir,
>> >            * key->inputs_read since the two are identical aside from
>> Gen4-5
>> >            * edge flag differences.
>> >            */
>> > -         struct remap_vs_attrs_state remap_state = {
>> > -            .inputs_read = nir->info.inputs_read,
>> > -         };
>> > +         GLbitfield64 inputs_read = nir->info.inputs_read;
>> >
>> >           /* This pass needs actual constants */
>> >           nir_opt_constant_folding(nir);
>> >
>> >           nir_foreach_overload(nir, overload) {
>> >              if (overload->impl) {
>> > -               nir_builder_init(&remap_state.b, overload->impl);
>> > -               nir_foreach_block(overload->impl, remap_vs_attrs,
>> &remap_state);
>> > +               nir_builder b;
>> > +               nir_builder_init(&b, overload->impl);
>> > +               nir_foreach_block(overload->impl,
>> add_const_offset_to_base, &b);
>>
>> What really bothers me is that this call is in remap_vs_attrs but it stomps
>> on store_output intrinsics.  That seems bad.
>
> It's not really part of remap_vs_attrs, but you're right, it is part of
> brw_nir_lower_inputs().  So affecting outputs seems rather dirty.
>
> How about we just drop outputs for now, and I can fix it in my series?

That would probably work.  I'll make an input-only version for VS,
squash it in, and then you can make the full version do whatever you
want.

>> Overall, I think this is a fine approach but I wish it were more
>> fine-grained.  Unfortunately, I don't have any hot ideas on how to make it
>> better off-hand.
>>
>> --Jason
>>
>> > +               nir_foreach_block(overload->impl, remap_vs_attrs,
>> &inputs_read);
>> >              }
>> >           }
>> >        }
>> > --
>> > 2.6.3
>> >