[Mesa-dev] [PATCH 9/9] i965/gs: implement EndPrimitive() functionality in the visitor.

Tue Sep 10 22:12:47 PDT 2013

On 09/10/2013 09:56 PM, Paul Berry wrote:
> On 10 September 2013 12:44, Kenneth Graunke wrote:
[snip]
>     Maybe shorten this to:
>        urb_write_flags |= BRW_URB_WRITE_USE_CHANNEL_MASKS;
> 
>     > +   if (c->control_data_header_size_bits > 128)
>     > +      urb_write_flags = urb_write_flags |
>     BRW_URB_WRITE_PER_SLOT_OFFSET;
> 
>     Ditto.
> 
> 
> Sadly, commit cfe39ea (i965: Allow C++ type safety in the use of enum
> brw_urb_write_flags.) prevents this.  It was one of the disadvantages I
> listed when I sent the patch out for review
> (http://lists.freedesktop.org/archives/mesa-dev/2013-August/043775.html). 
> Chad and Francisco had some ideas for ways to avoid those disadvantages,
> but their ideas didn't seem to garner much support.

Oh, right...not a big deal either way.

[snip]
>     I had a bit of trouble with this derivation, but Paul helped me out
>     on IRC.
> 
>     Although it's not mentioned here, bits_per_vertex is either 1 or 2.
> 
>     The first case is easy:
>     (vertex_count * bits_per_vertex) % 32 == 0
>     vertex_count % 32 == 0
>     vertex_count & 31 == 0 (common trick)
> 
>     For the second case, we're writing 2 bits per vertex, which means that
>     every 16 vertices, we need to emit something.  So we can actually treat
>     this as (vertex_count % 16 == 0), which is equivalent to (vertex_count &
>     15 == 0).
> 
> 
> Not that it really matters, since as you point out bits_per_vertex is
> always either 1 or 2, but technically the derivation works for any value
> of bits_per_vertex that's a power of 2.  I've expanded the comment to
> include a fuller derivation:
> 
>          /* Only emit control data bits if we've finished accumulating a
> batch
>           * of 32 bits.  This is the case when:
>           *
>           *     (vertex_count * bits_per_vertex) % 32 == 0
>           *
>           * (in other words, when the last 5 bits of vertex_count *
>           * bits_per_vertex are 0).  Assuming bits_per_vertex == 2^n for
> some
>           * integer n (which is always the case, since bits_per_vertex is
>           * always 1 or 2), this is equivalent to requiring that the
> last 5-n
>           * bits of vertex_count are 0:
>           *
>           *     vertex_count & (2^(5-n) - 1) == 0
>           *
>           * 2^(5-n) == 2^5 / 2^n == 32 / bits_per_vertex, so this is
>           * equivalent to:
>           *
>           *     vertex_count & (32 / bits_per_vertex - 1) == 0
>           */

Ah, that makes a lot of sense.  Thanks for the extra comments.

[snip]
>     > +   /* control_data_bits |= 1 << ((vertex_count - 1) % 32) */
>     > +   src_reg one(this, glsl_type::uint_type);
>     > +   emit(MOV(dst_reg(one), 1u));
>     > +   src_reg prev_count(this, glsl_type::uint_type);
>     > +   emit(ADD(dst_reg(prev_count), this->vertex_count, 0xffffffffu));
> 
>     This threw me a bit...wasn't obviously equivalent to the formula in the
>     comment above.
> 
>     But after some investigation,
>     (vertex_count - 1) % 32 == vertex_count + 0xffffffffu
>     for vertex_count in [1..31].  They're different for vertex_count == 0,
>     but 1 << either-value is equivalent, so the final expression checks out.
> 
>     Clever.
> 
> Actually, that alone wouldn't be good enough, because vertex_count can
> get larger than 31 (it counts all vertices output by the geometry
> shader, not just those that we are currently accumulating cut bits
> for).  The real reason they are equivalent is in a detail that's buried
> in the documentation of the SHL instruction: it ignores all but the
> bottom 5 bits of its second source argument, so it's as if there's an
> implicit "% 32" on that argument.  I've added a comment above the SHL
> instruction to make that clearer:
> 
>    /* Note: we're relying on the fact that the GEN SHL instruction only pays
>     * attention to the lower 5 bits of its second source argument, so on
> this
>     * architecture, 1 << (vertex_count - 1) is equivalent to 1 <<
>     * ((vertex_count - 1) % 32).
>     */
>    emit(SHL(dst_reg(mask), one, prev_count));

Ohh.  Ew.  Yeah, it works out nicely in this case, though, so as long as
it's commented it's probably fine.

[snip]
> Thanks so much for the detailed review, Ken.  Assuming I don't hear
> anything to the contrary, I'll plan on pushing the series tomorrow.

Sounds good to me!

--Ken