[Mesa-dev] [PATCH 8/8] i965/vs: Simplify fs_visitor's ATTR file.

Kenneth Graunke kenneth at whitecape.org
Tue Aug 18 01:28:50 PDT 2015


On Monday, August 17, 2015 09:53:20 PM Jason Ekstrand wrote:
> On Aug 17, 2015 4:08 PM, "Kenneth Graunke" <kenneth at whitecape.org> wrote:
> >
> > Previously, ATTR was indexed by VERT_ATTRIB_* slots; at the end of
> > compilation, assign_vs_urb_setup() translated those into GRF units,
> > and converted ATTR to HW_REGs.
> >
> > This patch moves the transslation earlier, making ATTR work in terms of
> > GRF units from the beginning.  assign_vs_urb_setup() simply has to add
> > the number of payload registers and push constants to obtain the final
> > hardware GRF number.  (We can't do this earlier as those values aren't
> > known.)
> >
> > ATTR still supports reg_offset; however, it's simply added to reg.
> > It's not clear whether this is valuable or not.
> >
> > Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs.cpp         | 26 +++------------
> >  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp |  2 +-
> >  src/mesa/drivers/dri/i965/brw_nir.c          | 48 +++++++++++++++++++++++++---
> >  3 files changed, 50 insertions(+), 26 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > index efabb52..f556ed6 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > @@ -1511,10 +1511,9 @@ void
> >  fs_visitor::assign_vs_urb_setup()
> >  {
> >     brw_vs_prog_data *vs_prog_data = (brw_vs_prog_data *) prog_data;
> > -   int grf, count, slot, channel, attr;
> >
> >     assert(stage == MESA_SHADER_VERTEX);
> > -   count = _mesa_bitcount_64(vs_prog_data->inputs_read);
> > +   int count = _mesa_bitcount_64(vs_prog_data->inputs_read);
> >     if (vs_prog_data->uses_vertexid || vs_prog_data->uses_instanceid)
> >        count++;
> >
> > @@ -1534,25 +1533,10 @@ fs_visitor::assign_vs_urb_setup()
> >     foreach_block_and_inst(block, fs_inst, inst, cfg) {
> >        for (int i = 0; i < inst->sources; i++) {
> >           if (inst->src[i].file == ATTR) {
> > -
> > -            if (inst->src[i].reg == VERT_ATTRIB_MAX) {
> > -               slot = count - 1;
> > -            } else {
> > -               /* Attributes come in in a contiguous block, ordered by their
> > -                * gl_vert_attrib value.  That means we can compute the slot
> > -                * number for an attribute by masking out the enabled
> > -                * attributes before it and counting the bits.
> > -                */
> > -               attr = inst->src[i].reg + inst->src[i].reg_offset / 4;
> > -               slot = _mesa_bitcount_64(vs_prog_data->inputs_read &
> > -                                        BITFIELD64_MASK(attr));
> > -            }
> > -
> > -            channel = inst->src[i].reg_offset & 3;
> > -
> > -            grf = payload.num_regs +
> > -               prog_data->curb_read_length +
> > -               slot * 4 + channel;
> > +            int grf = payload.num_regs +
> > +                      prog_data->curb_read_length +
> > +                      inst->src[i].reg +
> > +                      inst->src[i].reg_offset;
> >
> >              inst->src[i].file = HW_REG;
> >              inst->src[i].fixed_hw_reg =
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > index 111db8c..0ec8ef5 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > @@ -53,7 +53,7 @@ fs_reg *
> >  fs_visitor::emit_vs_system_value(int location)
> >  {
> >     fs_reg *reg = new(this->mem_ctx)
> > -      fs_reg(ATTR, VERT_ATTRIB_MAX, BRW_REGISTER_TYPE_D);
> > +      fs_reg(ATTR, 4*_mesa_bitcount_64(prog->InputsRead), BRW_REGISTER_TYPE_D);
> >     brw_vs_prog_data *vs_prog_data = (brw_vs_prog_data *) prog_data;
> >
> >     switch (location) {
> > diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i965/brw_nir.c
> > index 2e2d02b..e0cf61e 100644
> > --- a/src/mesa/drivers/dri/i965/brw_nir.c
> > +++ b/src/mesa/drivers/dri/i965/brw_nir.c
> > @@ -27,8 +27,36 @@
> >  #include "glsl/nir/glsl_to_nir.h"
> >  #include "program/prog_to_nir.h"
> >
> > +static bool
> > +remap_vs_attrs(nir_block *block, void *closure)
> > +{
> > +   GLbitfield64 inputs_read = *((GLbitfield64 *) closure);
> > +
> > +   nir_foreach_instr(block, instr) {
> > +      if (instr->type != nir_instr_type_intrinsic)
> > +         continue;
> > +
> > +      nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
> > +
> > +      /* We set EmitNoIndirect for VS inputs, so there are no indirects. */
> > +      assert(intrin->intrinsic != nir_intrinsic_load_input_indirect);
> > +
> > +      if (intrin->intrinsic == nir_intrinsic_load_input) {
> > +         /* Attributes come in a contiguous block, ordered by their
> > +          * gl_vert_attrib value.  That means we can compute the slot
> > +          * number for an attribute by masking out the enabled attributes
> > +          * before it and counting the bits.
> > +          */
> > +         int attr = intrin->const_index[0];
> > +         int slot = _mesa_bitcount_64(inputs_read & BITFIELD64_MASK(attr));
> 
> Um... Can't we just set var->data.driver_location to this value? What's the
> point of it being its own pass?

Nope.  I'm not sure how to do it in one pass.

1. We initialize var->driver_location to the VERT_ATTRIB_* value for the
   base of the variable.

2. We use nir_lower_io() to follow the dereference chains and compute
   the location of the data we're actually accessing.

   OpenGL specifies that array slots and matrix columns each get their
   own generic vertex attribute slot (each of which is a vec4).  So,
   we use type_size_vec4() to make get_io_offset count in vec4 units.

3. However...vertex attributes are packed - unused slots don't actually
   take up space, so everything gets compacted down to consecutive
   slots.  This is the bitcount trick.

So, for example, if we have a shader that declares:

   layout(location = 1) in mat4 m;

then var->data.location will be VERT_ATTRIB_GENERIC1.  A dereference
of m[3] would compute a slot value of VERT_ATTRIB_GENERIC1 + 3
(VERT_ATTRIB_GENERIC4).

However, if the shader doesn't access m[0], m[1], or m[2]...then those
columns won't actually be uploaded.  We can't tell this by looking at
a single dereference, or even a variable declaration.  It depends on
what the whole shader does.

So we look at prog->InputsRead.  For a particular VERT_ATTRIB_* slot,
we mask off the bits before that one, and bitcount them.  This tells
us how many attributes are actually active before then, which is the
actual VUE offset/register count.

I'm not a huge fan of the two-pass approach, but the only alternative
would be to make get_io_offset() look at InputsRead somehow and figure
out whether array elements actually matter when adding up offsets.
Which means something a lot fancier than using type_size()...

Alternatively, we could have nir_lower_io take a callback that remaps
the driver location after computing it...which still gives us the two
stage computation...but avoids the second walk over the IR.  But I don't
know if that's generally useful...

Does that make more sense?

> > +         intrin->const_index[0] = 4 * slot;
> > +      }
> > +   }
> > +   return true;
> > +}
> > +
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part.
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150818/16e49a44/attachment-0001.sig>


More information about the mesa-dev mailing list