On 23 December 2011 13:02, Kenneth Graunke <span dir="ltr"><<a href="mailto:kenneth@whitecape.org">kenneth@whitecape.org</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On 12/23/2011 09:37 AM, Paul Berry wrote:<br>
> When rendering triangle strips, vertices come down the pipeline in the<br>
> order specified, even though this causes alternate triangles to have<br>
> reversed winding order. For example, if the vertices are ABCDE, then<br>
> the GS is invoked on triangles ABC, BCD, and CDE, even though this<br>
> means that triangle BCD is in the reverse of the normal winding order.<br>
> The hardware automatically flags the triangles with reversed winding<br>
> order as _3DPRIM_TRISTRIP_REVERSE, so that face culling and two-sided<br>
> coloring can be adjusted to account for the reversed order.<br>
><br>
> In order to ensure that winding order is correct when streaming<br>
> vertices out to a transform feedback buffer, we need to alter the<br>
> ordering of BCD to BDC when the first provoking vertex convention is<br>
> in use, and to CBD when the last provoking vertex convention is in<br>
> use.<br>
><br>
> To do this, we precompute an array of indices indicating where each<br>
> vertex will be placed in the transform feedback buffer; normally this<br>
> is SVBI[0] + (0, 1, 2), indicating that vertex order should be<br>
> preserved. When the primitive type is _3DPRIM_TRISTRIP_REVERSE, we<br>
> change this order to either SVBI[0] + (0, 2, 1) or SVBI[0] + (1, 0,<br>
> 2), depending on the provoking vertex convention.<br>
><br>
> Fixes piglit tests "EXT_transform_feedback/tessellation<br>
> triangle_strip" on Gen6.<br>
<br>
</div>It's really too bad we can't use the 3DSTATE_GS "Reorder Enable" bit.<br>
It's supposed to take care of this, but it looks like it only does PV<br>
first mode on Sandybridge. Turning it on makes triangle_strip wireframe<br>
fail, and nothing else pass.<br></blockquote><div><br>Yeah, I concur. That bit is *so* close to what we need, but not quite close enough.<br> </div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
I had a bit of trouble following this patch, but it looks okay to me.<br></blockquote><div><br>Ok. I won't push it until this evening, so let me know if you can think of a way to improve the readability (I'm at a loss for how to improve it).<br>
</div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
Reviewed-by: Kenneth Graunke <<a href="mailto:kenneth@whitecape.org">kenneth@whitecape.org</a>><br>
<div class="HOEnZb"><div class="h5"><br>
> ---<br>
> src/mesa/drivers/dri/i965/brw_gs.h | 6 ++<br>
> src/mesa/drivers/dri/i965/brw_gs_emit.c | 84 ++++++++++++++++++++++++-------<br>
> 2 files changed, 72 insertions(+), 18 deletions(-)<br>
><br>
> diff --git a/src/mesa/drivers/dri/i965/brw_gs.h b/src/mesa/drivers/dri/i965/brw_gs.h<br>
> index 7bf2248..2ab8b72 100644<br>
> --- a/src/mesa/drivers/dri/i965/brw_gs.h<br>
> +++ b/src/mesa/drivers/dri/i965/brw_gs.h<br>
> @@ -83,6 +83,12 @@ struct brw_gs_compile {<br>
> struct brw_reg vertex[MAX_GS_VERTS];<br>
> struct brw_reg header;<br>
> struct brw_reg temp;<br>
> +<br>
> + /**<br>
> + * Register holding destination indices for streamed buffer writes.<br>
> + * Only used for SOL programs.<br>
> + */<br>
> + struct brw_reg destination_indices;<br>
> } reg;<br>
><br>
> /* Number of registers used to store vertex data */<br>
> diff --git a/src/mesa/drivers/dri/i965/brw_gs_emit.c b/src/mesa/drivers/dri/i965/brw_gs_emit.c<br>
> index 1f96a16..607ee75 100644<br>
> --- a/src/mesa/drivers/dri/i965/brw_gs_emit.c<br>
> +++ b/src/mesa/drivers/dri/i965/brw_gs_emit.c<br>
> @@ -45,13 +45,16 @@<br>
> /**<br>
> * Allocate registers for GS.<br>
> *<br>
> - * If svbi_payload_enable is true, then the thread will be spawned with the<br>
> - * "SVBI Payload Enable" bit set, so GRF 1 needs to be set aside to hold the<br>
> - * streamed vertex buffer indices.<br>
> + * If sol_program is true, then:<br>
> + *<br>
> + * - The thread will be spawned with the "SVBI Payload Enable" bit set, so GRF<br>
> + * 1 needs to be set aside to hold the streamed vertex buffer indices.<br>
> + *<br>
> + * - The thread will need to use the destination_indices register.<br>
> */<br>
> static void brw_gs_alloc_regs( struct brw_gs_compile *c,<br>
> GLuint nr_verts,<br>
> - bool svbi_payload_enable )<br>
> + bool sol_program )<br>
> {<br>
> GLuint i = 0,j;<br>
><br>
> @@ -60,7 +63,7 @@ static void brw_gs_alloc_regs( struct brw_gs_compile *c,<br>
> c->reg.R0 = retype(brw_vec8_grf(i, 0), BRW_REGISTER_TYPE_UD); i++;<br>
><br>
> /* Streamed vertex buffer indices */<br>
> - if (svbi_payload_enable)<br>
> + if (sol_program)<br>
> c->reg.SVBI = retype(brw_vec8_grf(i++, 0), BRW_REGISTER_TYPE_UD);<br>
><br>
> /* Payload vertices plus space for more generated vertices:<br>
> @@ -73,6 +76,11 @@ static void brw_gs_alloc_regs( struct brw_gs_compile *c,<br>
> c->reg.header = retype(brw_vec8_grf(i++, 0), BRW_REGISTER_TYPE_UD);<br>
> c->reg.temp = retype(brw_vec8_grf(i++, 0), BRW_REGISTER_TYPE_UD);<br>
><br>
> + if (sol_program) {<br>
> + c->reg.destination_indices =<br>
> + retype(brw_vec4_grf(i++, 0), BRW_REGISTER_TYPE_UD);<br>
> + }<br>
> +<br>
> c->prog_data.urb_read_length = c->nr_regs;<br>
> c->prog_data.total_grf = i;<br>
> }<br>
> @@ -351,16 +359,17 @@ gen6_sol_program(struct brw_gs_compile *c, struct brw_gs_prog_key *key,<br>
><br>
> if (key->num_transform_feedback_bindings > 0) {<br>
> unsigned vertex, binding;<br>
> + struct brw_reg destination_indices_uw =<br>
> + vec8(retype(c->reg.destination_indices, BRW_REGISTER_TYPE_UW));<br>
> +<br>
> /* Note: since we use the binding table to keep track of buffer offsets<br>
> * and stride, the GS doesn't need to keep track of a separate pointer<br>
> * into each buffer; it uses a single pointer which increments by 1 for<br>
> * each vertex. So we use SVBI0 for this pointer, regardless of whether<br>
> * transform feedback is in interleaved or separate attribs mode.<br>
> + *<br>
> + * Make sure that the buffers have enough room for all the vertices.<br>
> */<br>
> - brw_MOV(p, get_element_ud(c->reg.header, 5),<br>
> - get_element_ud(c->reg.SVBI, 0));<br>
> -<br>
> - /* Make sure that the buffers have enough room for all the vertices. */<br>
> brw_ADD(p, get_element_ud(c->reg.temp, 0),<br>
> get_element_ud(c->reg.SVBI, 0), brw_imm_ud(num_verts));<br>
> brw_CMP(p, vec1(brw_null_reg()), BRW_CONDITIONAL_LE,<br>
> @@ -368,10 +377,58 @@ gen6_sol_program(struct brw_gs_compile *c, struct brw_gs_prog_key *key,<br>
> get_element_ud(c->reg.SVBI, 4));<br>
> brw_IF(p, BRW_EXECUTE_1);<br>
><br>
> + /* Compute the destination indices to write to. Usually we use SVBI[0]<br>
> + * + (0, 1, 2). However, for odd-numbered triangles in tristrips, the<br>
> + * vertices come down the pipeline in reversed winding order, so we need<br>
> + * to flip the order when writing to the transform feedback buffer. To<br>
> + * ensure that flatshading accuracy is preserved, we need to write them<br>
> + * in order SVBI[0] + (0, 2, 1) if we're using the first provoking<br>
> + * vertex convention, and in order SVBI[0] + (1, 0, 2) if we're using<br>
> + * the last provoking vertex convention.<br>
> + *<br>
> + * Note: since brw_imm_v can only be used in instructions in<br>
> + * packed-word execution mode, and SVBI is a double-word, we need to<br>
> + * first move the appropriate immediate constant ((0, 1, 2), (0, 2, 1),<br>
> + * or (1, 0, 2)) to the destination_indices register, and then add SVBI<br>
> + * using a separate instruction. Also, since the immediate constant is<br>
> + * expressed as packed words, and we need to load double-words into<br>
> + * destination_indices, we need to intersperse zeros to fill the upper<br>
> + * halves of each double-word.<br>
> + */<br>
> + brw_MOV(p, destination_indices_uw,<br>
> + brw_imm_v(0x00020100)); /* (0, 1, 2) */<br>
> + if (num_verts == 3) {<br>
> + /* Get primitive type into temp register. */<br>
> + brw_AND(p, get_element_ud(c->reg.temp, 0),<br>
> + get_element_ud(c->reg.R0, 2), brw_imm_ud(0x1f));<br>
> +<br>
> + /* Test if primitive type is TRISTRIP_REVERSE. We need to do this as<br>
> + * an 8-wide comparison so that the conditional MOV that follows<br>
> + * moves all 8 words correctly.<br>
> + */<br>
> + brw_CMP(p, vec8(brw_null_reg()), BRW_CONDITIONAL_EQ,<br>
> + get_element_ud(c->reg.temp, 0),<br>
> + brw_imm_ud(_3DPRIM_TRISTRIP_REVERSE));<br>
> +<br>
> + /* If so, then overwrite destination_indices_uw with the appropriate<br>
> + * reordering.<br>
> + */<br>
> + brw_MOV(p, destination_indices_uw,<br>
> + brw_imm_v(key->pv_first ? 0x00010200 /* (0, 2, 1) */<br>
> + : 0x00020001)); /* (1, 0, 2) */<br>
> + brw_set_predicate_control(p, BRW_PREDICATE_NONE);<br>
> + }<br>
> + brw_ADD(p, c->reg.destination_indices,<br>
> + c->reg.destination_indices, get_element_ud(c->reg.SVBI, 0));<br>
> +<br>
> /* For each vertex, generate code to output each varying using the<br>
> * appropriate binding table entry.<br>
> */<br>
> for (vertex = 0; vertex < num_verts; ++vertex) {<br>
> + /* Set up the correct destination index for this vertex */<br>
> + brw_MOV(p, get_element_ud(c->reg.header, 5),<br>
> + get_element_ud(c->reg.destination_indices, vertex));<br>
> +<br>
> for (binding = 0; binding < key->num_transform_feedback_bindings;<br>
> ++binding) {<br>
> unsigned char vert_result =<br>
> @@ -398,15 +455,6 @@ gen6_sol_program(struct brw_gs_compile *c, struct brw_gs_prog_key *key,<br>
> SURF_INDEX_SOL_BINDING(binding), /* binding_table_index */<br>
> final_write); /* send_commit_msg */<br>
> }<br>
> -<br>
> - /* If there are more vertices to output, increment the pointer so<br>
> - * that we will start outputting to the next location in the<br>
> - * transform feedback buffers.<br>
> - */<br>
> - if (vertex != num_verts - 1) {<br>
> - brw_ADD(p, get_element_ud(c->reg.header, 5),<br>
> - get_element_ud(c->reg.header, 5), brw_imm_ud(1));<br>
> - }<br>
> }<br>
> brw_ENDIF(p);<br>
><br>
<br>
</div></div></blockquote></div><br>