[Mesa-dev] [PATCH v2 12/11] i965: Add scalar geometry shader support.

Wed Oct 28 03:08:01 PDT 2015

On Tuesday, October 27, 2015 04:40:19 PM Kristian Høgsberg wrote:
> On Mon, Oct 12, 2015 at 02:55:32PM -0700, Kenneth Graunke wrote:
> > Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs.cpp              | 174 ++++++++++
> >  src/mesa/drivers/dri/i965/brw_fs.h                |  16 +-
> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp          | 378 ++++++++++++++++++++++
> >  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp      |  49 ++-
> >  src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp |  21 ++
> >  5 files changed, 628 insertions(+), 10 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > index dde8c45..778237a 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> > @@ -43,6 +43,7 @@
> >  #include "brw_wm.h"
> >  #include "brw_fs.h"
> >  #include "brw_cs.h"
> > +#include "brw_vec4_gs_visitor.h"
> >  #include "brw_cfg.h"
> >  #include "brw_dead_control_flow.h"
> >  #include "main/uniforms.h"
> > @@ -1347,6 +1348,47 @@ fs_visitor::emit_discard_jump()
> >  }
> >  
> >  void
> > +fs_visitor::emit_gs_thread_end()
> > +{
> > +   assert(stage == MESA_SHADER_GEOMETRY);
> > +
> > +   if (gs_compile->control_data_header_size_bits > 0) {
> > +      emit_gs_control_data_bits(this->final_gs_vertex_count);
> > +   }
> > +
> > +   const fs_builder abld = bld.annotate("thread end");
> > +   fs_inst *inst;
> > +
> > +   if (gs_compile->prog_data.static_vertex_count != -1) {
> > +      foreach_in_list_reverse(fs_inst, prev, &this->instructions) {
> > +         if (prev->opcode == SHADER_OPCODE_URB_WRITE_SIMD8 ||
> > +             prev->opcode == SHADER_OPCODE_URB_WRITE_SIMD8_MASKED ||
> > +             prev->opcode == SHADER_OPCODE_URB_WRITE_SIMD8_PER_SLOT ||
> > +             prev->opcode == SHADER_OPCODE_URB_WRITE_SIMD8_MASKED_PER_SLOT) {
> > +            prev->eot = true;
> > +            return;
> > +         } else if (prev->is_control_flow() || prev->has_side_effects()) {
> > +            break;
> > +         }
> > +      }
> > +      fs_reg hdr = abld.vgrf(BRW_REGISTER_TYPE_UD, 1);
> > +      abld.MOV(hdr, fs_reg(retype(brw_vec8_grf(1, 0), BRW_REGISTER_TYPE_UD)));
> > +      inst = abld.emit(SHADER_OPCODE_URB_WRITE_SIMD8, reg_undef, hdr);
> > +      inst->mlen = 1;
> > +   } else {
> > +      fs_reg payload = abld.vgrf(BRW_REGISTER_TYPE_UD, 2);
> > +      fs_reg *sources = ralloc_array(mem_ctx, fs_reg, 2);
> > +      sources[0] = fs_reg(retype(brw_vec8_grf(1, 0), BRW_REGISTER_TYPE_UD));
> > +      sources[1] = this->final_gs_vertex_count;
> > +      abld.LOAD_PAYLOAD(payload, sources, 2, 2);
> > +      inst = abld.emit(SHADER_OPCODE_URB_WRITE_SIMD8, reg_undef, payload);
> > +      inst->mlen = 2;
> > +   }
> > +   inst->eot = true;
> > +   inst->offset = 0;
> > +}
> > +
> > +void
> >  fs_visitor::assign_curb_setup()
> >  {
> >     if (dispatch_width == 8) {
> > @@ -1550,6 +1592,53 @@ fs_visitor::assign_vs_urb_setup()
> >     }
> >  }
> >  
> > +void
> > +fs_visitor::assign_gs_urb_setup()
> > +{
> > +   assert(stage == MESA_SHADER_GEOMETRY);
> > +
> > +   const gl_geometry_program *gp = &gs_compile->gp->program;
> > +   brw_vue_prog_data *vue_prog_data = (brw_vue_prog_data *) prog_data;
> > +
> > +   first_non_payload_grf +=
> > +      8 * vue_prog_data->urb_read_length * gp->VerticesIn;
> 
> Where does the 8 * come from here?

Vertex data is read from the URB in 256-bit (HWord) units - two vec4
slots at a time.  The fact that this happens to be the size of a
register is quite confusing, but not entirely relevant :)

Data read from the URB is expanded into the SIMD8 data layout, where
a vec4 takes up 4 registers.  So when we read a 256-bit URB block,
those two vec4s end up taking up 8 registers.

See vec4_gs_visitor::setup_varying_inputs(), which multiplies by 2
since in SIMD4x2 mode a vec4 takes up a whole register, so 2 vec4s
take up 2 registers.
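
As a rough sketch of that arithmetic (not code from the patch; every name below is hypothetical and only the multiplication mirrors the explanation above), the register count for the pushed GS inputs could be written as:

```cpp
#include <cassert>

// All names here are hypothetical; only the arithmetic mirrors the
// explanation above.

// One unit of urb_read_length is a 256-bit block: two 128-bit vec4 slots.
static const unsigned VEC4S_PER_URB_BLOCK = 2;

// SIMD8 expands each vec4 into 4 registers (x, y, z, w for 8 primitives).
static const unsigned SIMD8_REGS_PER_VEC4 = 4;

// SIMD4x2 keeps each vec4 in 1 register (shared by 2 primitives).
static const unsigned SIMD4X2_REGS_PER_VEC4 = 1;

unsigned
gs_input_regs(unsigned regs_per_vec4, unsigned urb_read_length,
              unsigned vertices_in)
{
   // GS input primitives have between 1 (points) and 6
   // (triangles with adjacency) vertices.
   assert(vertices_in >= 1 && vertices_in <= 6);
   return VEC4S_PER_URB_BLOCK * regs_per_vec4 * urb_read_length * vertices_in;
}
```

For triangles (3 input vertices) and a urb_read_length of 2, gs_input_regs(SIMD8_REGS_PER_VEC4, 2, 3) comes to 48 registers, against 12 for the SIMD4x2 case.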

Another way of explaining it: a SIMD4x2 GS processes 2 completely
unrelated primitives, one in each half of a register.  Data comes from
two VUEs.

VUE entry for primitive 0:  [<A0.x A0.y A0.z A0.w>, <B0.x B0.y B0.z B0.w>, ...]
VUE entry for primitive 1:  [<A1.x A1.y A1.z A1.w>, <B1.x B1.y B1.z B1.w>, ...]

One URB access reads 256 bits from each VUE, and spreads that across 2 registers:

r10 = A1.x A1.y A1.z A1.w | A0.x A0.y A0.z A0.w
r11 = B1.x B1.y B1.z B1.w | B0.x B0.y B0.z B0.w
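
A minimal model of that SIMD4x2 layout, purely for illustration (the types and the function name are made up, and a register is modelled as 8 floats with channel 0 being the rightmost entry in the diagram above):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec4 = std::array<float, 4>;   // x, y, z, w
using Reg  = std::array<float, 8>;   // one 256-bit register, 8 dwords
using Vue  = std::vector<Vec4>;      // one VUE, as a list of vec4 slots

// Hypothetical helper: place the same vec4 slot from two VUEs side by side.
// Channels 0-3 hold primitive 0, channels 4-7 hold primitive 1.
Reg
pack_simd4x2(const Vue &vue0, const Vue &vue1, std::size_t slot)
{
   assert(slot < vue0.size() && slot < vue1.size());

   Reg r{};
   for (std::size_t comp = 0; comp < 4; comp++) {
      r[comp]     = vue0[slot][comp];
      r[4 + comp] = vue1[slot][comp];
   }
   return r;
}
```

Slot 0 (A) and slot 1 (B) of a 256-bit block each produce one such register, which is where the factor of 2 comes from.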

Similarly, SIMD8 shaders process 8 independent primitives per thread.
The data comes from 8 different VUEs:

VUE entry for primitive 0:  [<A0.x A0.y A0.z A0.w>, <B0.x B0.y B0.z B0.w>, ...]
VUE entry for primitive 1:  [<A1.x A1.y A1.z A1.w>, <B1.x B1.y B1.z B1.w>, ...]
VUE entry for primitive 2:  [<A2.x A2.y A2.z A2.w>, <B2.x B2.y B2.z B2.w>, ...]
VUE entry for primitive 3:  [<A3.x A3.y A3.z A3.w>, <B3.x B3.y B3.z B3.w>, ...]
VUE entry for primitive 4:  [<A4.x A4.y A4.z A4.w>, <B4.x B4.y B4.z B4.w>, ...]
VUE entry for primitive 5:  [<A5.x A5.y A5.z A5.w>, <B5.x B5.y B5.z B5.w>, ...]
VUE entry for primitive 6:  [<A6.x A6.y A6.z A6.w>, <B6.x B6.y B6.z B6.w>, ...]
VUE entry for primitive 7:  [<A7.x A7.y A7.z A7.w>, <B7.x B7.y B7.z B7.w>, ...]

One URB access reads 256 bits from each VUE, and spreads that across 8 registers:

r10 = A7.x A6.x A5.x A4.x A3.x A2.x A1.x A0.x
r11 = A7.y A6.y A5.y A4.y A3.y A2.y A1.y A0.y
r12 = A7.z A6.z A5.z A4.z A3.z A2.z A1.z A0.z
r13 = A7.w A6.w A5.w A4.w A3.w A2.w A1.w A0.w
r14 = B7.x B6.x B5.x B4.x B3.x B2.x B1.x B0.x
r15 = B7.y B6.y B5.y B4.y B3.y B2.y B1.y B0.y
r16 = B7.z B6.z B5.z B4.z B3.z B2.z B1.z B0.z
r17 = B7.w B6.w B5.w B4.w B3.w B2.w B1.w B0.w
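
To make the SIMD8 layout concrete, here is a small sketch of the same transposition in plain C++.  It only models the data movement; the types and the function name are hypothetical, and channel 0 of a register corresponds to the rightmost entry in the diagram above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec4 = std::array<float, 4>;   // x, y, z, w
using Reg  = std::array<float, 8>;   // one SIMD8 register, one channel per primitive
using Vue  = std::vector<Vec4>;      // one VUE, as a list of vec4 slots

// Hypothetical helper: expand num_slots vec4 slots from 8 VUEs into the
// SIMD8 layout, so that
//    regs[4 * slot + comp][prim] == vues[prim][slot][comp]
std::vector<Reg>
expand_simd8(const std::array<Vue, 8> &vues, std::size_t num_slots)
{
   std::vector<Reg> regs(4 * num_slots);

   for (std::size_t prim = 0; prim < 8; prim++) {
      assert(vues[prim].size() >= num_slots);
      for (std::size_t slot = 0; slot < num_slots; slot++) {
         for (std::size_t comp = 0; comp < 4; comp++)
            regs[4 * slot + comp][prim] = vues[prim][slot][comp];
      }
   }
   return regs;
}
```

With num_slots = 2 (one 256-bit block, slots A and B) this returns 8 registers, matching r10 through r17 above.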

I hope that helps!  This was a good question.