[Mesa-dev] [PATCH] i965: Separate base offset/constant offset combining from remapping.
Jason Ekstrand
jason at jlekstrand.net
Wed Dec 9 15:13:55 PST 2015
On Wed, Dec 9, 2015 at 12:08 PM, Kenneth Graunke <kenneth at whitecape.org> wrote:
> On Wednesday, December 09, 2015 08:03:25 AM Jason Ekstrand wrote:
>> On Dec 9, 2015 2:51 AM, "Kenneth Graunke" <kenneth at whitecape.org> wrote:
>> >
>> > My tessellation branch has two additional remap functions. I don't want
>> > to replicate this logic there.
>> >
>> > Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
>> > ---
>> > src/mesa/drivers/dri/i965/brw_nir.c | 78
>> ++++++++++++++++++++++++-------------
>> > 1 file changed, 50 insertions(+), 28 deletions(-)
>> >
>> > Hey Jason,
>> >
>> > If you like this patch, and haven't yet merged your NIR input reworks,
>> > feel free to just squash it into your changes. Or, we can land it
>> > separately after your changes. It's up to you.
>> >
>> > Separating this out allows me to reuse this in my new tessellation input
>> > and output remapping functions, and also means we don't need to add
>> structs
>> > for the remap functions...we can just pass the builder, or inputs_read, or
>> > the VUE map...and not have to pack multiple things together.
>>
>> Sure. It does make things simpler.
>>
>> > --Ken
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_nir.c
>> b/src/mesa/drivers/dri/i965/brw_nir.c
>> > index 14ad172..105a175 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_nir.c
>> > +++ b/src/mesa/drivers/dri/i965/brw_nir.c
>> > @@ -27,15 +27,19 @@
>> > #include "glsl/nir/nir_builder.h"
>> > #include "program/prog_to_nir.h"
>> >
>> > -struct remap_vs_attrs_state {
>> > - nir_builder b;
>> > - uint64_t inputs_read;
>> > -};
>> > -
>> > +/**
>> > + * In many cases, we just add the base and offset together, so there's no
>> > + * reason to keep them separate. Sometimes, combining them is essential:
>> > + * if a shader only accesses part of a compound variable (such as a
>> matrix
>> > + * or array), the variable's base may not actually exist in the VUE map.
>> > + *
>> > + * This pass adds constant offsets to instr->const_index[0], and resets
>> > + * the offset source to 0. Non-constant offsets remain unchanged.
>> > + */
>> > static bool
>> > -remap_vs_attrs(nir_block *block, void *void_state)
>> > +add_const_offset_to_base(nir_block *block, void *closure)
>> > {
>> > - struct remap_vs_attrs_state *state = void_state;
>> > + nir_builder *b = closure;
>> >
>> > nir_foreach_instr_safe(block, instr) {
>> > if (instr->type != nir_instr_type_intrinsic)
>> > @@ -43,30 +47,48 @@ remap_vs_attrs(nir_block *block, void *void_state)
>> >
>> > nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
>> >
>> > + if (intrin->intrinsic == nir_intrinsic_load_input ||
>> > + intrin->intrinsic == nir_intrinsic_load_per_vertex_input ||
>> > + intrin->intrinsic == nir_intrinsic_load_output ||
>> > + intrin->intrinsic == nir_intrinsic_load_per_vertex_output ||
>> > + intrin->intrinsic == nir_intrinsic_store_output ||
>> > + intrin->intrinsic == nir_intrinsic_store_per_vertex_output) {
>>
>> This seems a bit scortched-earth. It would be nice if the caller had a bit
>> more control.
>
> We could always add "do input"/"do output" options once we actually need
> them. It means adding the structs back, but that's fine.
>
> I suppose for now we could ignore output intrinsics here.
>
>> > + nir_src *offset = nir_get_io_offset_src(intrin);
>> > + nir_const_value *const_offset = nir_src_as_const_value(*offset);
>> > +
>> > + if (const_offset) {
>> > + intrin->const_index[0] += const_offset->u[0];
>> > + b->cursor = nir_before_instr(&intrin->instr);
>> > + nir_instr_rewrite_src(&intrin->instr, offset,
>> > + nir_src_for_ssa(nir_imm_int(b, 0)));
>> > + }
>>
>> Else??? It seems that you don't want to run this pass if you think you'll
>> ever hit an indirect. I guess it's harmless to just do this for all direct
>> things in our driver, but it doesn't sit well.
>
> Else do nothing. The problem I'm trying to avoid with this logic is
> that inputs/outputs which take up multiple slots and are directly
> accessed may only have slots assigned for the sub-components that are
> actually used. So, I can't use the base offset, and need to move the
> base to the slot that's actually used.
>
> If something is accessed indirectly, we assign all of its slots, so
> using the base location is fine.
Ok, that makes sense.
>> > + }
>> > + }
>> > + return true;
>> > +
>> > +}
>> > +
>> > +static bool
>> > +remap_vs_attrs(nir_block *block, void *closure)
>> > +{
>> > + GLbitfield64 inputs_read = *((GLbitfield64 *) closure);
>> > +
>> > + nir_foreach_instr(block, instr) {
>> > + if (instr->type != nir_instr_type_intrinsic)
>> > + continue;
>> > +
>> > + nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
>> > +
>> > if (intrin->intrinsic == nir_intrinsic_load_input) {
>> > /* Attributes come in a contiguous block, ordered by their
>> > * gl_vert_attrib value. That means we can compute the slot
>> > * number for an attribute by masking out the enabled attributes
>> > * before it and counting the bits.
>> > */
>> > - nir_const_value *const_offset =
>> nir_src_as_const_value(intrin->src[0]);
>> > -
>> > - /* We set EmitNoIndirect for VS inputs, so there are no
>> indirects. */
>> > - assert(const_offset);
>> > -
>> > - int attr = intrin->const_index[0] + const_offset->u[0];
>> > - int slot = _mesa_bitcount_64(state->inputs_read &
>> > - BITFIELD64_MASK(attr));
>> > + int attr = intrin->const_index[0];
>> > + int slot = _mesa_bitcount_64(inputs_read &
>> BITFIELD64_MASK(attr));
>> >
>> > - /* The NIR -> FS pass will just add the base and offset
>> together, so
>> > - * there's no reason to keep them separate. Just put it all in
>> > - * const_index[0] and set the offset src[0] to load_const(0).
>> > - */
>> > intrin->const_index[0] = 4 * slot;
>> > -
>> > - state->b.cursor = nir_before_instr(&intrin->instr);
>> > - nir_instr_rewrite_src(&intrin->instr, &intrin->src[0],
>> > - nir_src_for_ssa(nir_imm_int(&state->b,
>> 0)));
>> > }
>> > }
>> > return true;
>> > @@ -97,17 +119,17 @@ brw_nir_lower_inputs(nir_shader *nir,
>> > * key->inputs_read since the two are identical aside from
>> Gen4-5
>> > * edge flag differences.
>> > */
>> > - struct remap_vs_attrs_state remap_state = {
>> > - .inputs_read = nir->info.inputs_read,
>> > - };
>> > + GLbitfield64 inputs_read = nir->info.inputs_read;
>> >
>> > /* This pass needs actual constants */
>> > nir_opt_constant_folding(nir);
>> >
>> > nir_foreach_overload(nir, overload) {
>> > if (overload->impl) {
>> > - nir_builder_init(&remap_state.b, overload->impl);
>> > - nir_foreach_block(overload->impl, remap_vs_attrs,
>> &remap_state);
>> > + nir_builder b;
>> > + nir_builder_init(&b, overload->impl);
>> > + nir_foreach_block(overload->impl,
>> add_const_offset_to_base, &b);
>>
>> What really bothers me is that this call is in remap_vs_attrs but it stomps
>> on store_output intrinsics. That seems bad.
>
> It's not really part of remap_vs_attrs, but you're right, it is part of
> brw_nir_lower_inputs(). So affecting outputs seems rather dirty.
>
> How about we just drop outputs for now, and I can fix it in my series?
That would probably work. I'll make an input-only version for VS,
squash it in, and then you can make the full version do whatever you
want.
>> Overall, I think this is a fine approach but I wish it were more
>> fine-grained. Unfortunately, I don't have any hot ideas on how to make it
>> better off-hand.
>>
>> --Jason
>>
>> > + nir_foreach_block(overload->impl, remap_vs_attrs,
>> &inputs_read);
>> > }
>> > }
>> > }
>> > --
>> > 2.6.3
>> >
More information about the mesa-dev
mailing list