<div dir="ltr">On 15 September 2013 21:16, Kenneth Graunke <span dir="ltr"><<a href="mailto:kenneth@whitecape.org" target="_blank">kenneth@whitecape.org</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><div class="h5">On 09/03/2013 04:18 PM, Paul Berry wrote:<br>
> Since the SF/SBE stage is only capable of performing arbitrary<br>
> reorderings of 16 varying slots, we can't arrange the fragment shader<br>
> inputs in an arbitrary order if there are more than 16 input varying<br>
> slots in use. We need to make sure that slots 16-31 match the<br>
> corresponding outputs of the previous pipeline stage.<br>
><br>
> The easiest way to accomplish this is to just make all varying slots<br>
> match up with the previous pipeline stage.<br>
> ---<br>
> src/mesa/drivers/dri/i965/brw_fs.cpp | 42 ++++++++++++++++++++++++++++++------<br>
> src/mesa/drivers/dri/i965/brw_wm.c | 3 ++-<br>
> 2 files changed, 38 insertions(+), 7 deletions(-)<br>
><br>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp<br>
> index 7950d5f6..8d73a0f 100644<br>
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp<br>
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp<br>
> @@ -1237,11 +1237,40 @@ fs_visitor::calculate_urb_setup()<br>
> int urb_next = 0;<br>
> /* Figure out where each of the incoming setup attributes lands. */<br>
> if (brw->gen >= 6) {<br>
> - for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) {<br>
> - if (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &<br>
> - BITFIELD64_BIT(i)) {<br>
> - c->prog_data.urb_setup[i] = urb_next++;<br>
> - }<br>
> + if (_mesa_bitcount_64(fp->Base.InputsRead &<br>
> + BRW_FS_VARYING_INPUT_MASK) <= 16) {<br>
> + /* The SF/SBE pipeline stage can do arbitrary rearrangement of the<br>
> + * first 16 varying inputs, so we can put them wherever we want.<br>
> + * Just put them in order.<br>
> + */<br>
<br>
</div></div>It might be nice to have a comment saying why this is useful (as opposed<br>
to always following the previous stage's ordering). As I understand it,<br>
this is useful when the VS outputs a bunch of varyings but the FS only<br>
uses a couple of them---we can pack them into the first few slots,<br>
saving space.<br></blockquote><div><br></div><div>Also, it allows us to avoid recompiling the fragment shader when the vertex (or geometry) shader changes. I've added this comment to clarify:<br><br> * This is useful because it means that (a) inputs not used by the<br>
* fragment shader won't take up valuable register space, and (b) we<br> * won't have to recompile the fragment shader if it gets paired with<br> * a different vertex (or geometry) shader.<br>
<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div class="im"><br>
> + for (unsigned int i = 0; i < VARYING_SLOT_MAX; i++) {<br>
> + if (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &<br>
> + BITFIELD64_BIT(i)) {<br>
> + c->prog_data.urb_setup[i] = urb_next++;<br>
> + }<br>
> + }<br>
> + } else {<br>
> + /* We have enough input varyings that the SF/SBE pipeline stage can't<br>
> + * arbitrarily rearrange them to suit our whim; we have to put them<br>
> + * in an order that matches the output of the previous pipeline stage<br>
> + * (geometry or vertex shader).<br>
> + */<br>
> + struct brw_vue_map prev_stage_vue_map;<br>
> + brw_compute_vue_map(brw, &prev_stage_vue_map,<br>
> + c->key.input_slots_valid);<br>
> + int first_slot = 2 * BRW_SF_URB_ENTRY_READ_OFFSET;<br>
> + assert(prev_stage_vue_map.num_slots <= first_slot + 32);<br>
> + for (int slot = first_slot; slot < prev_stage_vue_map.num_slots;<br>
> + slot++) {<br>
> + int varying = prev_stage_vue_map.slot_to_varying[slot];<br>
> + if (varying != BRW_VARYING_SLOT_COUNT &&<br>
<br>
</div>It wasn't immediately obvious to me why varying would be<br>
BRW_VARYING_SLOT_COUNT. But, on further inspection, I see this is the<br>
value that gets assigned to empty slots.<br>
<br>
Up to you whether you want to add a small comment.<br></blockquote><div><br></div><div>Good point. I've added this comment:<br><br> /* Note that varying == BRW_VARYING_SLOT_COUNT when a slot is<br> * unused.<br>
*/<br><br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div class="im"><br>
> + (fp->Base.InputsRead & BRW_FS_VARYING_INPUT_MASK &<br>
> + BITFIELD64_BIT(varying))) {<br>
> + c->prog_data.urb_setup[varying] = slot - first_slot;<br>
> + urb_next = MAX2(urb_next, slot + 1);<br>
> + }<br>
> + }<br>
> + urb_next = prev_stage_vue_map.num_slots - first_slot;<br>
<br>
</div>Huh? It looks like you're setting urb_next in the loop, then clobbering<br>
it immediately after the loop. This should probably be fixed.<br></blockquote><div><br></div><div>Oops, I meant to delete the first reference to urb_next entirely, and I guess I forgot. Fixed, thanks.<br></div><div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div class="im"><br>
> }<br>
> } else {<br>
> /* FINISHME: The sf doesn't map VS->FS inputs for us very well. */<br>
> @@ -3149,7 +3178,8 @@ brw_fs_precompile(struct gl_context *ctx, struct gl_shader_program *prog)<br>
> key.iz_lookup |= IZ_DEPTH_WRITE_ENABLE_BIT;<br>
> }<br>
><br>
> - if (brw->gen < 6)<br>
> + if (brw->gen < 6 || _mesa_bitcount_64(fp->Base.InputsRead &<br>
> + BRW_FS_VARYING_INPUT_MASK) > 16)<br>
<br>
</div>Could this be simplified to:<br>
<br>
if (brw->gen < 6 || c->prog_data.num_varying_inputs > 16)<br>
<br>
Or are these values different?<br></blockquote><div><br></div><div>Unfortunately not. This logic is happening during setup of the program key, which happens before compilation, so c->prog_data hasn't been calculated yet. The same applies to your question below.<br>
</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div class="im"><br>
> key.input_slots_valid = fp->Base.InputsRead | VARYING_BIT_POS;<br>
><br>
> key.clamp_fragment_color = ctx->API == API_OPENGL_COMPAT;<br>
> diff --git a/src/mesa/drivers/dri/i965/brw_wm.c b/src/mesa/drivers/dri/i965/brw_wm.c<br>
> index 3df2b7d..3e59880 100644<br>
> --- a/src/mesa/drivers/dri/i965/brw_wm.c<br>
> +++ b/src/mesa/drivers/dri/i965/brw_wm.c<br>
> @@ -466,7 +466,8 @@ static void brw_wm_populate_key( struct brw_context *brw,<br>
> (ctx->Multisample.SampleAlphaToCoverage || ctx->Color.AlphaEnabled);<br>
><br>
> /* BRW_NEW_VUE_MAP_GEOM_OUT */<br>
> - if (brw->gen < 6)<br>
> + if (brw->gen < 6 || _mesa_bitcount_64(fp->program.Base.InputsRead &<br>
> + BRW_FS_VARYING_INPUT_MASK) > 16)<br>
<br>
</div>Same question here.<br>
<div class="im"><br>
> key->input_slots_valid = brw->vue_map_geom_out.slots_valid;<br>
><br>
> /* The unique fragment program ID */<br>
><br>
<br>
</div>Other than those few comments, this is still:<br>
Reviewed-by: Kenneth Graunke <<a href="mailto:kenneth@whitecape.org">kenneth@whitecape.org</a>><br>
</blockquote></div><br></div></div>