[Mesa-dev] [PATCH 4/4] i965: Implement SIMD16 texturing on Gen4.

Kenneth Graunke kenneth at whitecape.org
Sun Apr 5 20:11:52 PDT 2015


On Saturday, April 04, 2015 04:28:49 PM Jordan Justen wrote:
> On 2015-04-04 01:23:28, Kenneth Graunke wrote:
> > This allows SIMD16 mode to work for a lot more programs.  Texturing is
> > also more efficient in SIMD16 mode than SIMD8.  Several messages don't
> > actually exist in SIMD8 mode, so we did SIMD16 messages and threw away
> > half of the data.  Now we compute real data in both halves.
> > 
> > Also, unlike the SIMD8 "sample" message, the SIMD16 one doesn't require
> > all three coordinate components to exist, so we can shorten the message
> > lengths, cutting register usage a bit.
> > 
> > I chose to implement the visitor functionality in a separate function,
> > since mixing true SIMD16 with SIMD8 code that uses SIMD16 fallbacks
> > seemed like a mess.  The new code bails on a few cases where we'd
> > have to do two SIMD8 messages - we just fall back to SIMD8 for now.
> > 
> > Improves performance in "Shadowrun: Dragonfall - Director's Cut" by
> > about 20% on GM45 (measured with LIBGL_SHOW_FPS=1 while standing around
> > in the first mission).
> > 
> > Signed-off-by: Kenneth Graunke <kenneth at whitecape.org>
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs.h             |  4 ++
> >  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 28 ++++++++---
> >  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   | 68 +++++++++++++++++++++++++-
> >  3 files changed, 90 insertions(+), 10 deletions(-)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
> > index 278a8ee..cfdbf55 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs.h
> > +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> > @@ -271,6 +271,10 @@ public:
> >                                fs_reg shadow_comp,
> >                                fs_reg lod, fs_reg lod2, int grad_components,
> >                                uint32_t sampler);
> > +   fs_inst *emit_texture_gen4_simd16(ir_texture_opcode op, fs_reg dst,
> > +                                     fs_reg coordinate, int vector_elements,
> > +                                     fs_reg shadow_c, fs_reg lod,
> > +                                     uint32_t sampler);
> >     fs_inst *emit_texture_gen5(ir_texture_opcode op, fs_reg dst,
> >                                fs_reg coordinate, int coord_components,
> >                                fs_reg shadow_comp,
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > index 40e51aa..2743297 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp
> > @@ -622,16 +622,26 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src
> >          /* Note that G45 and older determines shadow compare and dispatch width
> >           * from message length for most messages.
> >           */
> > -        assert(dispatch_width == 8);
> > -        msg_type = BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE;
> > -        if (inst->shadow_compare) {
> > -           assert(inst->mlen == 6);
> > -        } else {
> > -           assert(inst->mlen <= 4);
> > -        }
> > +         if (dispatch_width == 8) {
> > +            msg_type = BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE;
> > +            if (inst->shadow_compare) {
> > +               assert(inst->mlen == 6);
> > +            } else {
> > +               assert(inst->mlen <= 4);
> > +            }
> > +         } else {
> > +            if (inst->shadow_compare) {
> > +               msg_type = BRW_SAMPLER_MESSAGE_SIMD16_SAMPLE_COMPARE;
> > +               assert(inst->mlen == 9);
> > +            } else {
> > +               msg_type = BRW_SAMPLER_MESSAGE_SIMD16_SAMPLE;
> > +               assert(inst->mlen <= 7 && inst->mlen % 2 == 1);
> > +            }
> > +         }
> >          break;
> >        case FS_OPCODE_TXB:
> >          if (inst->shadow_compare) {
> > +            assert(dispatch_width == 8);
> >             assert(inst->mlen == 6);
> >             msg_type = BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE_BIAS_COMPARE;
> >          } else {
> > @@ -642,6 +652,7 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src
> >          break;
> >        case SHADER_OPCODE_TXL:
> >          if (inst->shadow_compare) {
> > +            assert(dispatch_width == 8);
> >             assert(inst->mlen == 6);
> >             msg_type = BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE_LOD_COMPARE;
> >          } else {
> > @@ -652,11 +663,12 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src
> >          break;
> >        case SHADER_OPCODE_TXD:
> >          /* There is no sample_d_c message; comparisons are done manually */
> > +         assert(dispatch_width == 8);
> >          assert(inst->mlen == 7 || inst->mlen == 10);
> >          msg_type = BRW_SAMPLER_MESSAGE_SIMD8_SAMPLE_GRADIENTS;
> >          break;
> >        case SHADER_OPCODE_TXF:
> > -        assert(inst->mlen == 9);
> > +         assert(inst->mlen <= 9 && inst->mlen % 2 == 1);
> >          msg_type = BRW_SAMPLER_MESSAGE_SIMD16_LD;
> >          simd_mode = BRW_SAMPLER_SIMD_MODE_SIMD16;
> >          break;
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > index 8c0ec33..25c424a 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> > @@ -1435,8 +1435,6 @@ fs_visitor::emit_texture_gen4(ir_texture_opcode op, fs_reg dst,
> >     bool simd16 = false;
> >     fs_reg orig_dst;
> >  
> > -   no16("SIMD16 texturing on Gen4 not supported yet.");
> > -
> >     /* g0 header. */
> >     mlen = 1;
> >  
> > @@ -1588,6 +1586,69 @@ fs_visitor::emit_texture_gen4(ir_texture_opcode op, fs_reg dst,
> >     return inst;
> >  }
> >  
> > +fs_inst *
> > +fs_visitor::emit_texture_gen4_simd16(ir_texture_opcode op, fs_reg dst,
> > +                                     fs_reg coordinate, int vector_elements,
> > +                                     fs_reg shadow_c, fs_reg lod,
> > +                                     uint32_t sampler)
> > +{
> > +   fs_reg message(MRF, 2, BRW_REGISTER_TYPE_F, dispatch_width);
> > +   bool has_lod = op == ir_txl || op == ir_txb;
> > +
> > +   if (has_lod && shadow_c.file != BAD_FILE)
> > +      no16("TXB and TXL with shadow comparison unsupported in SIMD16.");
> > +
> > +   if (op == ir_txd)
> > +      no16("textureGrad unsupported in SIMD16.");
> > +
> > +   /* Copy the coordinates. */
> > +   for (int i = 0; i < vector_elements; i++) {
> > +      emit(MOV(retype(offset(message, i), coordinate.type), coordinate));
> > +      coordinate = offset(coordinate, 1);
> > +   }
> > +
> > +   fs_reg msg_end = offset(message, vector_elements);
> > +
> > +   /* Messages other than sample and ld require all three components */
> > +   if (has_lod || shadow_c.file != BAD_FILE) {
> > +      for (int i = vector_elements; i < 3; i++) {
> > +         emit(MOV(offset(message, i), fs_reg(0.0f)));
> > +      }
> > +   }
> > +
> > +   if (has_lod) {
> > +      fs_reg msg_lod = retype(offset(message, 3), op == ir_txf ?
> > +                              BRW_REGISTER_TYPE_UD : BRW_REGISTER_TYPE_F);
> 
> From above: has_lod = op == ir_txl || op == ir_txb, so the
> op == ir_txf check here should always be false, right?
> 
> Should has_lod also check for ir_txf?

Good catch, thanks!  I added ir_txf to the has_lod case.
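To make the fix concrete, here is a minimal sketch of the corrected predicate. It uses hypothetical stand-in enums (`tex_op`, `reg_type`) instead of Mesa's real `ir_texture_opcode` and register types, and it mirrors only the checks visible in the quoted `emit_texture_gen4_simd16()`. With `ir_txf` counted as carrying an LOD, the `op == ir_txf` test on the LOD slot type becomes reachable, and that slot is typed UD. The shadowed TXB/TXL and TXD cases still bail out to SIMD8.

```cpp
#include <cassert>

// Hypothetical stand-ins for ir_texture_opcode and the register types used
// in the patch; these are NOT Mesa's real definitions.
enum class tex_op { tex, txb, txl, txd, txf };
enum class reg_type { F, UD };

// Corrected predicate: texelFetch (txf) also carries an LOD parameter.
static bool has_lod(tex_op op)
{
   return op == tex_op::txl || op == tex_op::txb || op == tex_op::txf;
}

// Type of the LOD slot: txf takes an integer LOD, txb/txl a float one.
static reg_type lod_type(tex_op op)
{
   return op == tex_op::txf ? reg_type::UD : reg_type::F;
}

// Mirrors the no16() bail-outs in the patch: returns false for the
// cases that fall back to SIMD8.
static bool simd16_supported(tex_op op, bool shadow)
{
   if (has_lod(op) && shadow)
      return false;   // TXB and TXL with shadow comparison
   if (op == tex_op::txd)
      return false;   // textureGrad
   return true;
}
```

For example, `lod_type(tex_op::txf)` now yields `reg_type::UD`, and `simd16_supported(tex_op::txb, true)` is false, matching the SIMD8 fallback.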

Technically, if lod == 0, we can probably skip setting has_lod to avoid
having to fill out the entire <u, v, r, lod> message.  But there's some
errata saying unnecessary values 'must be zero', which probably means
that you just can't program them to non-zero values; leaving them off
should be OK.  But I'm not sure I care to find out.
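A rough model shows what skipping the LOD slot would save. It assumes, as the `mlen` assertions in the quoted `generate_tex()` change suggest (odd values up to 7 for plain sample, 9 for sample_c and ld), one header register followed by parameters that each occupy two registers in SIMD16 (16 channels of 32 bits). `simd16_mlen` is a hypothetical helper for illustration, not Mesa code.

```cpp
#include <cassert>

// Hypothetical model of the Gen4 SIMD16 sampler payload length in
// registers: one g0 header register, then two registers per parameter.
static int simd16_mlen(int coord_components, bool has_lod, bool has_shadow)
{
   int params = coord_components;

   // Messages other than sample and ld need all of <u, v, r> present,
   // so missing components are zero-filled.
   if (has_lod || has_shadow)
      params = 3;

   if (has_lod)
      params += 1;   // lod slot follows r
   if (has_shadow)
      params += 1;   // shadow reference follows lod (or r)

   return 1 + 2 * params;
}
```

Under this model, a 2D texelFetch that keeps the LOD slot needs `simd16_mlen(2, true, false)` = 9 registers, while dropping it for lod == 0 would give `simd16_mlen(2, false, false)` = 5.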

texelFetch is a GLSL 1.30 feature, which isn't supported on Gen4 - I
suppose that's why I didn't see the bug.  Still worth fixing; probably
not worth optimizing just yet :)

> Otherwise,
> Series Reviewed-by: Jordan Justen <jordan.l.justen at intel.com>

Thank you!