[Mesa-dev] [PATCH 11/15] i965/fs: When >64 input components, order them to match prev pipeline stage.

Kenneth Graunke kenneth at whitecape.org
Sun Sep 15 21:16:40 PDT 2013


On 09/03/2013 04:18 PM, Paul Berry wrote:
> Since the SF/SBE stage is only capable of performing arbitrary
> reorderings of 16 varying slots, we can't arrange the fragment shader
> inputs in an arbitrary order if there are more than 16 input varying
> slots in use.  We need to make sure that slots 16-31 match the
> corresponding outputs of the previous pipeline stage.
> 
> The easiest way to accomplish this is to just make all varying slots
> match up with the previous pipeline stage.
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 42 ++++++++++++++++++++++++++++++------
>  src/mesa/drivers/dri/i965/brw_wm.c   |  3 ++-
>  2 files changed, 38 insertions(+), 7 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 7950d5f6..8d73a0f 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -1237,11 +1237,40 @@ fs_visitor::calculate_urb_setup()
>     int urb_next = 0;
>     /* Figure out where each of the incoming setup attributes lands. */
>     if (brw->gen >= 6) {
> -      for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) {
> -	 if (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &
> -             BITFIELD64_BIT(i)) {
> -	    c->prog_data.urb_setup[i] = urb_next++;
> -	 }
> +      if (_mesa_bitcount_64(fp->Base.InputsRead &
> +                            BRW_FS_VARYING_INPUT_MASK) <= 16) {
> +         /* The SF/SBE pipeline stage can do arbitrary rearrangement of the
> +          * first 16 varying inputs, so we can put them wherever we want.
> +          * Just put them in order.
> +          */

It might be nice to have a comment saying why this is useful (as opposed
to always following the previous stage's ordering).  As I understand it,
this is useful when the VS outputs a bunch of varyings but the FS only
uses a couple of them---we can pack them into the first few slots,
saving space.

> +         for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) {
> +            if (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &
> +                BITFIELD64_BIT(i)) {
> +               c->prog_data.urb_setup[i] = urb_next++;
> +            }
> +         }
> +      } else {
> +         /* We have enough input varyings that the SF/SBE pipeline stage can't
> +          * arbitrarily rearrange them to suit our whim; we have to put them
> +          * in an order that matches the output of the previous pipeline stage
> +          * (geometry or vertex shader).
> +          */
> +         struct brw_vue_map prev_stage_vue_map;
> +         brw_compute_vue_map(brw, &prev_stage_vue_map,
> +                             c->key.input_slots_valid);
> +         int first_slot = 2 * BRW_SF_URB_ENTRY_READ_OFFSET;
> +         assert(prev_stage_vue_map.num_slots <= first_slot + 32);
> +         for (int slot = first_slot; slot < prev_stage_vue_map.num_slots;
> +              slot++) {
> +            int varying = prev_stage_vue_map.slot_to_varying[slot];
> +            if (varying != BRW_VARYING_SLOT_COUNT &&

It wasn't immediately obvious to me why varying would be
BRW_VARYING_SLOT_COUNT.  But, on further inspection, I see this is the
value that gets assigned to empty slots.

Up to you whether you want to add a small comment.

> +                (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &
> +                 BITFIELD64_BIT(varying))) {
> +               c->prog_data.urb_setup[varying] = slot - first_slot;
> +               urb_next = MAX2(urb_next, slot + 1);
> +            }
> +         }
> +         urb_next = prev_stage_vue_map.num_slots - first_slot;

Huh?  It looks like you're setting urb_next in the loop, then clobbering
it immediately after the loop.  This should probably be fixed.

>        }
>     } else {
>        /* FINISHME: The sf doesn't map VS->FS inputs for us very well. */
> @@ -3149,7 +3178,8 @@ brw_fs_precompile(struct gl_context *ctx, struct gl_shader_program *prog)
>        key.iz_lookup |= IZ_DEPTH_WRITE_ENABLE_BIT;
>     }
>  
> -   if (brw->gen < 6)
> +   if (brw->gen < 6 || _mesa_bitcount_64(fp->Base.InputsRead &
> +                                         BRW_FS_VARYING_INPUT_MASK) > 16)

Could this be simplified to:

   if (brw->gen < 6 || c->prog_data.num_varying_inputs > 16)

Or are these values different?

>        key.input_slots_valid = fp->Base.InputsRead | VARYING_BIT_POS;
>  
>     key.clamp_fragment_color = ctx->API == API_OPENGL_COMPAT;
> diff --git a/src/mesa/drivers/dri/i965/brw_wm.c b/src/mesa/drivers/dri/i965/brw_wm.c
> index 3df2b7d..3e59880 100644
> --- a/src/mesa/drivers/dri/i965/brw_wm.c
> +++ b/src/mesa/drivers/dri/i965/brw_wm.c
> @@ -466,7 +466,8 @@ static void brw_wm_populate_key( struct brw_context *brw,
>        (ctx->Multisample.SampleAlphaToCoverage || ctx->Color.AlphaEnabled);
>  
>     /* BRW_NEW_VUE_MAP_GEOM_OUT */
> -   if (brw->gen < 6)
> +   if (brw->gen < 6 || _mesa_bitcount_64(fp->program.Base.InputsRead &
> +                                         BRW_FS_VARYING_INPUT_MASK) > 16)

Same question here.

>        key->input_slots_valid = brw->vue_map_geom_out.slots_valid;
>  
>     /* The unique fragment program ID */
> 

Other than those few comments, this is still:
Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>


More information about the mesa-dev mailing list