[Mesa-dev] [PATCH 9/9] i965/gs: implement EndPrimitive() functionality in the visitor.

Paul Berry stereotype441 at gmail.com
Tue Sep 10 21:56:27 PDT 2013


On 10 September 2013 12:44, Kenneth Graunke <kenneth at whitecape.org> wrote:

> On 09/09/2013 08:20 AM, Paul Berry wrote:
> > +void
> > +vec4_gs_visitor::emit_control_data_bits()
> > +{
> > +   assert(c->control_data_bits_per_vertex != 0);
> > +
> > +   /* Since the URB_WRITE_OWORD message operates with 128-bit (vec4
> sized)
> > +    * granularity, we need to use two tricks to ensure that the batch
> of 32
> > +    * control data bits is written to the appropriate DWORD in the URB.
>  To
> > +    * select which vec4 we are writing to, we use the "slot {0,1}
> offset"
> > +    * fields of the message header.  To select which DWORD in the vec4
> we are
> > +    * writing to, we use the channel mask fields of the message header.
>  To
> > +    * avoid punalizing geometry shaders that emit a small number of
> vertices
>
> PUNY GEOMETRY SHADERS!  I think you meant "penalized" :)
>

Fixed, thanks :)


>
> > +    * with extra bookkeeping, we only do each of these tricks when
> > +    * c->prog_data.control_data_header_size_bits is large enough to
> make it
> > +    * necessary.
> > +    *
> > +    * Note: this means that if we're outputting just a single DWORD of
> control
> > +    * data bits, we'll actually replicate it four times since we won't
> do any
> > +    * channel masking.  But that's not a problem since in this case the
> > +    * hardware only pays attention to the first DWORD.
> > +    */
> > +   enum brw_urb_write_flags urb_write_flags = BRW_URB_WRITE_OWORD;
> > +   if (c->control_data_header_size_bits > 32)
> > +      urb_write_flags = urb_write_flags |
> BRW_URB_WRITE_USE_CHANNEL_MASKS;
>
> Maybe shorten this to:
>    urb_write_flags |= BRW_URB_WRITE_USE_CHANNEL_MASKS;
>
> > +   if (c->control_data_header_size_bits > 128)
> > +      urb_write_flags = urb_write_flags | BRW_URB_WRITE_PER_SLOT_OFFSET;
>
> Ditto.
>

Sadly, commit cfe39ea (i965: Allow C++ type safety in the use of enum
brw_urb_write_flags.) prevents this.  It was one of the disadvantages I
listed when I sent the patch out for review (
http://lists.freedesktop.org/archives/mesa-dev/2013-August/043775.html).
Chad and Francisco had some ideas for ways to avoid those disadvantages,
but their ideas didn't seem to garner much support.


>
> >  void
> >  vec4_gs_visitor::visit(ir_emit_vertex *)
> >  {
> > @@ -238,6 +383,45 @@ vec4_gs_visitor::visit(ir_emit_vertex *)
> >              src_reg(num_output_vertices), BRW_CONDITIONAL_L));
> >     emit(IF(BRW_PREDICATE_NORMAL));
> >     {
> > +      /* If we're outputting 32 control data bits or less, then we can
> wait
> > +       * until the shader is over to output them all.  Otherwise we
> need to
> > +       * output them as we go.  Now is the time to do it, since we're
> about to
> > +       * output the vertex_count'th vertex, so it's guaranteed that the
> > +       * control data bits associated with the (vertex_count - 1)th
> vertex is
>
> "are" correct?  (control data bits is plural...)
>

Fixed, thanks.


>
> > +       * correct.
> > +       */
> > +      if (c->control_data_header_size_bits > 32) {
> > +         this->current_annotation = "emit vertex: emit control data
> bits";
> > +         /* Only emit control data bits if we've finished accumulating
> a batch
> > +          * of 32 bits.  This is the case when:
> > +          *
> > +          *     (vertex_count * bits_per_vertex) % 32 == 0
> > +          *
> > +          * Since bits_per_vertex is a power of two, that's equivalent
> to:
> > +          *
> > +          *     vertex_count & (32 / bits_per_vertex - 1) == 0
> > +          */
>
> I had a bit of trouble with this derivation, but Paul helped me out on IRC.
>
> Although it's not mentioned here, bits_per_vertex is either 1 or 2.
>
> The first case is easy:
> (vertex_count * bits_per_vertex) % 32 == 0
> vertex_count % 32 == 0
> vertex_count & 31 == 0 (common trick)
>
> For the second case, we're writing 2 bits per vertex, which means that
> every 16 vertices, we need to emit something.  So we can actually treat
> this as (vertex_count % 16 == 0), which is equivalent to (vertex_count &
> 15 == 0).
>

Not that it really matters, since as you point out bits_per_vertex is
always either 1 or 2, but technically the derivation works for any value of
bits_per_vertex that's a power of 2.  I've expanded the comment to include
a fuller derivation:

         /* Only emit control data bits if we've finished accumulating a
batch
          * of 32 bits.  This is the case when:
          *
          *     (vertex_count * bits_per_vertex) % 32 == 0
          *
          * (in other words, when the last 5 bits of vertex_count *
          * bits_per_vertex are 0).  Assuming bits_per_vertex == 2^n for
some
          * integer n (which is always the case, since bits_per_vertex is
          * always 1 or 2), this is equivalent to requiring that the last
5-n
          * bits of vertex_count are 0:
          *
          *     vertex_count & (2^(5-n) - 1) == 0
          *
          * 2^(5-n) == 2^5 / 2^n == 32 / bits_per_vertex, so this is
          * equivalent to:
          *
          *     vertex_count & (32 / bits_per_vertex - 1) == 0
          */


> > @@ -253,7 +437,47 @@ vec4_gs_visitor::visit(ir_emit_vertex *)
> >  void
> >  vec4_gs_visitor::visit(ir_end_primitive *)
> >  {
> > -   assert(!"Not implemented yet");
> > +   /* We can only do EndPrimitive() functionality when the control data
> > +    * consists of cut bits.  Fortunately, the only time it isn't is
> when the
> > +    * output type is points, in which case EndPrimitive() is a no-op.
> > +    */
> > +   if (c->prog_data.control_data_format !=
> > +       GEN7_GS_CONTROL_DATA_FORMAT_GSCTL_CUT) {
> > +      return;
> > +   }
> > +
> > +   /* Cut bits use one bit per vertex. */
> > +   assert(c->control_data_bits_per_vertex == 1);
> > +
> > +   /* Cut bit n should be set to 1 if EndPrimitive() was called after
> emitting
> > +    * vertex n, 0 otherwise.  So all we need to do here is mark bit
> > +    * (vertex_count - 1) % 32 in the cut_bits register to indicate that
> > +    * EndPrimitive() was called after emitting vertex (vertex_count -
> 1);
> > +    * vec4_gs_visitor::emit_control_data_bits() will take care of the
> rest.
> > +    *
> > +    * Note that if EndPrimitve() is called before emitting any
> vertices, this
> > +    * will cause us to set bit 31 of the control_data_bits register to
> 1.
> > +    * That's fine because:
> > +    *
> > +    * - If max_vertices < 32, then vertex number 31 (zero-based) will
> never be
> > +    *   output, so the hardware will ignore cut bit 31.
> > +    *
> > +    * - If max_vertices == 32, then vertex number 31 is guaranteed to
> be the
> > +    *   last vertex, so setting cut bit 31 has no effect (since the
> primitive
> > +    *   is automatically ended when the GS terminates).
> > +    *
> > +    * - If max_vertices > 32, then the ir_emit_vertex visitor will
> reset the
> > +    *   control_data_bits register to 0 when the first vertex is
> emitted.
> > +    */
> > +
> > +   /* control_data_bits |= 1 << ((vertex_count - 1) % 32) */
> > +   src_reg one(this, glsl_type::uint_type);
> > +   emit(MOV(dst_reg(one), 1u));
> > +   src_reg prev_count(this, glsl_type::uint_type);
> > +   emit(ADD(dst_reg(prev_count), this->vertex_count, 0xffffffffu));
>
> This threw me a bit...wasn't obviously equivalent to the formula in the
> comment above.
>
> But after some investigation,
> (vertex_count - 1) % 32 == vertex_count + 0xffffffffu
> for vertex_count in [1..31].  They're different for vertex_count == 0,
> but 1 << either-value is equivalent, so the final expression checks out.
>
> Clever.
>

Actually, that alone wouldn't be good enough, because vertex_count can get
larger than 31 (it counts all vertices output by the geometry shader, not
just those that we are currently accumulating cut bits for).  The real
reason they are equivalent is in a detail that's buried in the
documentation of the SHL instruction: it ignores all but the bottom 5 bits
of its second source argument, so it's as if there's an implicit "% 32" on
that argument.  I've added a comment above the SHL instruction to make that
clearer:

   /* Note: we're relying on the fact that the GEN SHL instruction only pays
    * attention to the lower 5 bits of its second source argument, so on
this
    * architecture, 1 << (vertex_count - 1) is equivalent to 1 <<
    * ((vertex_count - 1) % 32).
    */
   emit(SHL(dst_reg(mask), one, prev_count));



>
> > +   src_reg mask(this, glsl_type::uint_type);
> > +   emit(SHL(dst_reg(mask), one, prev_count));
> > +   emit(OR(dst_reg(this->control_data_bits), this->control_data_bits,
> mask));
> >  }
> >
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
> b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
> > index 1193e28..90dd1de 100644
> > --- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.h
> > @@ -96,8 +96,10 @@ protected:
> >
> >  private:
> >     int setup_varying_inputs(int payload_reg, int *attribute_map);
> > +   void emit_control_data_bits();
> >
> >     src_reg vertex_count;
> > +   src_reg control_data_bits;
> >     const struct brw_gs_compile * const c;
> >  };
> >
> >
>
> After looking at this a bunch, I think your approach makes sense, and I
> haven't been able to come up with anything simpler.  Nice work.
>
> Other than my few questions and comments, this series is:
> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
>

Thanks so much for the detailed review, Ken.  Assuming I don't hear
anything to the contrary, I'll plan on pushing the series tomorrow.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20130910/f6489243/attachment.html>


More information about the mesa-dev mailing list