[Mesa-dev] [PATCH 06/12] i965: Add optimization pass to let us use the replicate data message

Kenneth Graunke kenneth at whitecape.org
Tue Aug 12 09:33:45 PDT 2014


On Monday, August 11, 2014 05:29:36 PM Kristian Høgsberg wrote:
> The data port has a SIMD16 'replicate data' message, which lets us write
> the same color for all 16 pixels by sending the four floats in the
> lower half of a register instead of sending 4 times 16 identical
> component values in 8 registers.
> 
> The message comes with a lot of restrictions and could be made generally
> useful by recognizing when those restrictions are satisfied.  For now,
> this lets us enable the optimization when we know it's safe, but we don't
> enable it by default.  The optimization works for simple color clear shaders
> only, but does recognize and support multiple render targets.
> 
> Signed-off-by: Kristian Høgsberg <krh at bitplanet.net>
> ---
>  src/mesa/drivers/dri/i965/brw_context.h         |  1 +
>  src/mesa/drivers/dri/i965/brw_defines.h         |  1 +
>  src/mesa/drivers/dri/i965/brw_fs.cpp            | 56 +++++++++++++++++++++++++
>  src/mesa/drivers/dri/i965/brw_fs.h              |  4 ++
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp  |  5 ++-
>  src/mesa/drivers/dri/i965/gen8_fs_generator.cpp |  5 ++-
>  6 files changed, 70 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
> index 7de9b64..6ab7713 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.h
> +++ b/src/mesa/drivers/dri/i965/brw_context.h
> @@ -1033,6 +1033,7 @@ struct brw_context
>     bool has_negative_rhw_bug;
>     bool has_pln;
>     bool no_simd8;
> +   bool use_rep_send;
>  
>     /**
>      * Some versions of Gen hardware don't do centroid interpolation correctly
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
> index a519629..194d35f 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -850,6 +850,7 @@ enum opcode {
>      */
>     FS_OPCODE_FB_WRITE = 128,
>     FS_OPCODE_BLORP_FB_WRITE,
> +   FS_OPCODE_REP_FB_WRITE,
>     SHADER_OPCODE_RCP,
>     SHADER_OPCODE_RSQ,
>     SHADER_OPCODE_SQRT,
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 061c32d..640e222 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -2215,6 +2215,59 @@ fs_visitor::compute_to_mrf()
>     return progress;
>  }
>  
> +bool
> +fs_visitor::try_rep_send()
> +{
> +   int i, count, step = dispatch_width / 8;
> +   fs_inst *start = NULL;
> +
> +   count = 0;
> +   foreach_in_list_safe(fs_inst, inst, &this->instructions) {
> +      if (count == 0)
> +         start = inst;
> +      if (inst->opcode == BRW_OPCODE_MOV &&
> +	  inst->dst.file == MRF &&
> +          inst->dst.reg == start->dst.reg + step * count &&
> +          inst->src[0].file == HW_REG &&
> +          inst->src[0].reg_offset == start->src[0].reg_offset + count) {
> +         if (count == 0)
> +            start = inst;
> +         count++;
> +      }
> +
> +      if (inst->opcode == FS_OPCODE_FB_WRITE &&
> +          count == 4 &&
> +          (inst->base_mrf == start->dst.reg ||
> +           (inst->base_mrf + 2 == start->dst.reg && inst->header_present))) {
> +         fs_inst *mov = MOV(start->dst, start->src[0]);
> +
> +         mov->dst.fixed_hw_reg =
> +            brw_vec4_reg(BRW_MESSAGE_REGISTER_FILE,
> +                         start->dst.reg, 0);
> +         mov->dst.file = HW_REG;
> +         mov->dst.type = mov->dst.fixed_hw_reg.type;
> +
> +         mov->src[0].fixed_hw_reg =
> +            brw_vec4_grf(mov->src[0].fixed_hw_reg.nr, 0);
> +         mov->src[0].file = HW_REG;
> +         mov->src[0].type = mov->src[0].fixed_hw_reg.type;
> +         mov->force_writemask_all = true;
> +         mov->dst.type = BRW_REGISTER_TYPE_F;
> +
> +         start->insert_before(mov);
> +
> +         for (i = 0; i < 4; i++)
> +            mov->next->remove();
> +
> +         inst->opcode = FS_OPCODE_REP_FB_WRITE;
> +         inst->mlen -= 4 * step - 1;
> +         count = 0;
> +      }
> +   }
> +
> +   return true;
> +}
> +

I really prefer my version of this function:

http://cgit.freedesktop.org/~kwg/mesa/commit/?h=repdata-clears-v3&id=b9bbf54b065fd5eab58367badc96cf997a521e7a

It seems more robust: it guards against predication and the various other cases where the replicate data message can't be used.  It also handles immediate values, and it includes comments.
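The kind of guarding described above can be sketched on a toy IR. This is a hypothetical, simplified model: the `Inst` struct, the `Op`/`File` enums and this `try_rep_send` signature are invented for illustration and are neither Mesa's `fs_inst` API nor the code on the linked branch. It only accepts four unpredicated MOVs of consecutive uniform components into the color MRFs, sitting directly before an unpredicated FB write, and it rejects everything else (including immediates, which this sketch does not try to handle).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified IR for illustration; not Mesa's fs_inst.
enum class Op { Mov, Vec4Mov, FbWrite, RepFbWrite, Other };
enum class File { None, Mrf, Uniform, Imm };

struct Inst {
  Op op = Op::Other;
  File dst_file = File::None;
  int dst_reg = 0;          // destination MRF number (MOVs)
  File src_file = File::None;
  int src_offset = 0;       // uniform component index (MOVs)
  bool predicated = false;
  int base_mrf = 0;         // first payload MRF (FB writes)
  bool header_present = false;
  int mlen = 0;             // message length in registers (FB writes)
};

// Rewrites "4 MOVs of consecutive uniform components into the color MRFs,
// immediately followed by an FB write" into one vec4 MOV plus a
// replicated-data write.  Returns true if anything changed.
bool try_rep_send(std::vector<Inst> &insts, int dispatch_width) {
  const int step = dispatch_width / 8;  // registers per color component
  bool progress = false;

  for (std::size_t j = 0; j < insts.size(); j++) {
    Inst &write = insts[j];
    // Need an unpredicated FB write with room for four MOVs before it.
    if (j < 4 || write.op != Op::FbWrite || write.predicated)
      continue;

    // Color payload follows the two header registers, if present.
    const int color_mrf = write.base_mrf + (write.header_present ? 2 : 0);
    const std::size_t first = j - 4;
    bool ok = true;
    for (int k = 0; k < 4 && ok; k++) {
      const Inst &mov = insts[first + k];
      ok = mov.op == Op::Mov && !mov.predicated &&
           mov.dst_file == File::Mrf &&
           mov.dst_reg == color_mrf + step * k &&
           mov.src_file == File::Uniform &&
           mov.src_offset == insts[first].src_offset + k;
    }
    if (!ok)
      continue;

    // Update the write before erasing, since erase invalidates `write`.
    write.op = Op::RepFbWrite;
    write.mlen -= 4 * step - 1;  // 4*step color registers become 1

    // Keep the first MOV as a single vec4 MOV and drop the other three.
    insts[first].op = Op::Vec4Mov;
    auto it = insts.begin() + static_cast<std::ptrdiff_t>(first);
    insts.erase(it + 1, it + 4);

    j = first + 1;  // the rewritten write; the loop increment skips past it
    progress = true;
  }
  return progress;
}
```

For a SIMD16 write with a header, the payload shrinks from 2 + 4×2 = 10 registers to 2 + 1 = 3, matching the `inst->mlen -= 4 * step - 1` adjustment in the patch. Requiring the four MOVs to sit directly before the write is a deliberate simplification: it guarantees that the instructions being erased are exactly the ones that were matched.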

One clever thing you've done here is to run the pass after UNIFORM values have been converted to HW_REGs, so you can use brw_vec4_reg directly rather than having to introduce a silly MOV_441 opcode like I did.  That's pretty nice.

Branch with everything:
http://cgit.freedesktop.org/~kwg/mesa/log/?h=repdata-clears-v3