[Mesa-dev] [PATCH 11/15] i965/fs: When >64 input components, order them to match prev pipeline stage.

Paul Berry stereotype441 at gmail.com
Mon Sep 16 10:40:03 PDT 2013


On 15 September 2013 21:16, Kenneth Graunke <kenneth at whitecape.org> wrote:

> On 09/03/2013 04:18 PM, Paul Berry wrote:
> > Since the SF/SBE stage is only capable of performing arbitrary
> > reorderings of 16 varying slots, we can't arrange the fragment shader
> > inputs in an arbitrary order if there are more than 16 input varying
> > slots in use.  We need to make sure that slots 16-31 match the
> > corresponding outputs of the previous pipeline stage.
> >
> > The easiest way to accomplish this is to just make all varying slots
> > match up with the previous pipeline stage.
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs.cpp | 42
> ++++++++++++++++++++++++++++++------
> >  src/mesa/drivers/dri/i965/brw_wm.c   |  3 ++-
> >  2 files changed, 38 insertions(+), 7 deletions(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > index 7950d5f6..8d73a0f 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > @@ -1237,11 +1237,40 @@ fs_visitor::calculate_urb_setup()
> >     int urb_next = 0;
> >     /* Figure out where each of the incoming setup attributes lands. */
> >     if (brw->gen >= 6) {
> > -      for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) {
> > -      if (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &
> > -             BITFIELD64_BIT(i)) {
> > -         c->prog_data.urb_setup[i] = urb_next++;
> > -      }
> > +      if (_mesa_bitcount_64(fp->Base.InputsRead &
> > +                            BRW_FS_VARYING_INPUT_MASK) <= 16) {
> > +         /* The SF/SBE pipeline stage can do arbitrary rearrangement of
> the
> > +          * first 16 varying inputs, so we can put them wherever we
> want.
> > +          * Just put them in order.
> > +          */
>
> It might be nice to have a comment saying why this is useful (as opposed
> to always following the previous stage's ordering).  As I understand it,
> this is useful when the VS outputs a bunch of varyings but the FS only
> uses a couple of them---we can pack them into the first few slots,
> saving space.
>

Also, it allows us to avoid recompiling the fragment shader when the vertex
(or geometry) shader changes.  I've added this comment to clarify:

          * This is useful because it means that (a) inputs not used by the
          * fragment shader won't take up valuable register space, and (b)
we
          * won't have to recompile the fragment shader if it gets paired
with
          * a different vertex (or geometry) shader.



>
> > +         for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) {
> > +            if (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &
> > +                BITFIELD64_BIT(i)) {
> > +               c->prog_data.urb_setup[i] = urb_next++;
> > +            }
> > +         }
> > +      } else {
> > +         /* We have enough input varyings that the SF/SBE pipeline
> stage can't
> > +          * arbitrarily rearrange them to suit our whim; we have to put
> them
> > +          * in an order that matches the output of the previous
> pipeline stage
> > +          * (geometry or vertex shader).
> > +          */
> > +         struct brw_vue_map prev_stage_vue_map;
> > +         brw_compute_vue_map(brw, &prev_stage_vue_map,
> > +                             c->key.input_slots_valid);
> > +         int first_slot = 2 * BRW_SF_URB_ENTRY_READ_OFFSET;
> > +         assert(prev_stage_vue_map.num_slots <= first_slot + 32);
> > +         for (int slot = first_slot; slot <
> prev_stage_vue_map.num_slots;
> > +              slot++) {
> > +            int varying = prev_stage_vue_map.slot_to_varying[slot];
> > +            if (varying != BRW_VARYING_SLOT_COUNT &&
>
> It wasn't immediately obvious to me why varying would be
> BRW_VARYING_SLOT_COUNT.  But, on further inspection, I see this is the
> value that gets assigned to empty slots.
>
> Up to you whether you want to add a small comment.
>

Good point.  I've added this comment:

            /* Note that varying == BRW_VARYING_SLOT_COUNT when a slot is
             * unused.
             */



>
> > +                (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &
> > +                 BITFIELD64_BIT(varying))) {
> > +               c->prog_data.urb_setup[varying] = slot - first_slot;
> > +               urb_next = MAX2(urb_next, slot + 1);
> > +            }
> > +         }
> > +         urb_next = prev_stage_vue_map.num_slots - first_slot;
>
> Huh?  It looks like you're setting urb_next in the loop, then clobbering
> it immediately after the loop.  This should probably be fixed.
>

Oops, I meant to delete the first reference to urb_next entirely, and I
guess I forgot.  Fixed, thanks.


>
> >        }
> >     } else {
> >        /* FINISHME: The sf doesn't map VS->FS inputs for us very well. */
> > @@ -3149,7 +3178,8 @@ brw_fs_precompile(struct gl_context *ctx, struct
> gl_shader_program *prog)
> >        key.iz_lookup |= IZ_DEPTH_WRITE_ENABLE_BIT;
> >     }
> >
> > -   if (brw->gen < 6)
> > +   if (brw->gen < 6 || _mesa_bitcount_64(fp->Base.InputsRead &
> > +                                         BRW_FS_VARYING_INPUT_MASK) >
> 16)
>
> Could this be simplified to:
>
>    if (brw->gen < 6 || c->prog_data.num_varying_inputs > 16)
>
> Or are these values different?
>

Unfortunately not.  This logic is happening during setup of the program
key, which happens before compilation, so c->prog_data hasn't been
calculated yet.  The same applies to your question below.


>
> >        key.input_slots_valid = fp->Base.InputsRead | VARYING_BIT_POS;
> >
> >     key.clamp_fragment_color = ctx->API == API_OPENGL_COMPAT;
> > diff --git a/src/mesa/drivers/dri/i965/brw_wm.c
> b/src/mesa/drivers/dri/i965/brw_wm.c
> > index 3df2b7d..3e59880 100644
> > --- a/src/mesa/drivers/dri/i965/brw_wm.c
> > +++ b/src/mesa/drivers/dri/i965/brw_wm.c
> > @@ -466,7 +466,8 @@ static void brw_wm_populate_key( struct brw_context
> *brw,
> >        (ctx->Multisample.SampleAlphaToCoverage ||
> ctx->Color.AlphaEnabled);
> >
> >     /* BRW_NEW_VUE_MAP_GEOM_OUT */
> > -   if (brw->gen < 6)
> > +   if (brw->gen < 6 || _mesa_bitcount_64(fp->program.Base.InputsRead &
> > +                                         BRW_FS_VARYING_INPUT_MASK) >
> 16)
>
> Same question here.
>
> >        key->input_slots_valid = brw->vue_map_geom_out.slots_valid;
> >
> >     /* The unique fragment program ID */
> >
>
> Other than those few comments, this is still:
> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20130916/4b983117/attachment-0001.html>


More information about the mesa-dev mailing list