[Mesa-dev] [PATCH] i965/fs: Strip trailing constant zeroes in sample messages

Kenneth Graunke kenneth at whitecape.org
Fri Apr 24 10:27:24 PDT 2015


On Friday, April 24, 2015 08:02:58 AM Neil Roberts wrote:
> If a send message is emitted with a message length that is less than
> required for the message then the remaining parameters default to
> zero. We can take advantage of this to save a register when a shader
> passes constant zeroes as the final coordinates to the sample
> function.
> 
> I think this might be useful for GLES applications that are using 2D
> textures to simulate 1D textures.
> 
> On Skylake it will be useful for shaders that do
> texelFetch(tex,something,0) which I think is fairly common. This helps
> more on Skylake because there the order of the instruction operands
> is u,v,lod,r, which suits 2D textures, whereas before it was
> u,lod,v,r, which only suits 1D textures.
> 
> On Haswell:
> total instructions in shared programs: 8538662 -> 8537377 (-0.02%)
> instructions in affected programs:     193546 -> 192261 (-0.66%)
> helped:                                1032
> 
> On Skylake:
> total instructions in shared programs: 10336216 -> 10332976 (-0.03%)
> instructions in affected programs:     243118 -> 239878 (-1.33%)
> helped:                                1066
> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 43 ++++++++++++++++++++++++++++++++++++
>  src/mesa/drivers/dri/i965/brw_fs.h   |  1 +
>  2 files changed, 44 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 61ee056..87a15b3 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -2536,6 +2536,48 @@ fs_visitor::opt_algebraic()
>  }
>  
>  /**
> + * Optimize sample messages that have constant zero values for the trailing
> + * texture coordinates. We can just reduce the message length for these
> + * instructions instead of reserving a register for it. Trailing parameters
> + * that aren't sent default to zero anyway. This will cause the dead code
> + * eliminator to remove the MOV instruction that would otherwise be emitted to
> + * set up the zero value.
> + */
> +bool
> +fs_visitor::opt_zero_samples()
> +{
> +   bool progress = false;
> +
> +   foreach_block_and_inst(block, fs_inst, inst, cfg) {
> +      if ((inst->opcode == SHADER_OPCODE_TEX ||
> +           inst->opcode == SHADER_OPCODE_TXF) &&
> +          !inst->shadow_compare) {

I like this idea!

We definitely need to skip this optimization on Gen4, since the Gen4/G45
sampler infers the texturing opcode based on the message length.  But
for Gen5+, it should be no problem.

Matt mentioned that we have to emit zero in some cases due to hardware
bugs.  IIRC, we used to skip some parameters in the middle - i.e. if the
message took "u, v, r, lod"...and we were using a 2D texture...we'd omit
'r', since it shouldn't matter.  But it did matter - and had to be
zeroed.  I think skipping ones at the end and reducing mlen should be
fine.

Why not do this for all texture messages, though?  Or for that matter, all
messages?  inst->is_tex() or inst->mlen > 0 might make sense.
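
To make the constraints above concrete, here is a minimal standalone sketch. It is not the Mesa code: `ToyParam`, `ToyInst`, and `strip_trailing_zeroes` are hypothetical stand-ins for `fs_inst` and the pass, and `regs_per_param` models `dispatch_width / 8`. The sketch only trims constant zeroes from the end of the payload, never touches a zero in the middle, always keeps at least one parameter, and does nothing on Gen4.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; not Mesa's fs_reg/fs_inst.
struct ToyParam {
   bool is_const;   // true if the value is a compile-time constant
   float value;     // the constant value (ignored when !is_const)
};

struct ToyInst {
   bool header_present;            // one header register precedes the params
   std::vector<ToyParam> params;   // message parameters, in payload order
   unsigned mlen;                  // message length in registers
};

// Shrink mlen while the last parameter still sent is a constant zero.
// Returns true if the message length changed.
bool strip_trailing_zeroes(ToyInst &inst, int gen, unsigned regs_per_param)
{
   // The Gen4/G45 sampler infers the opcode from the message length.
   if (gen < 5)
      return false;

   bool progress = false;
   const unsigned header = inst.header_present ? 1 : 0;

   // Stop once only one parameter (plus any header) is left.
   while (inst.mlen > header + regs_per_param) {
      const std::size_t last = (inst.mlen - header) / regs_per_param - 1;

      if (last >= inst.params.size())
         break;

      const ToyParam &p = inst.params[last];
      if (!p.is_const || p.value != 0.0f)
         break;   // a non-zero (or non-constant) tail ends the trimming

      inst.mlen -= regs_per_param;
      progress = true;
   }

   return progress;
}
```

Because the loop only ever inspects the current last parameter, a zero sitting between two live parameters is always still sent, which matches the hardware-bug concern above.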

> +         fs_inst *load_payload = (fs_inst *) inst->prev;
> +
> +         if (load_payload->is_head_sentinel() ||
> +             load_payload->opcode != SHADER_OPCODE_LOAD_PAYLOAD)
> +            continue;
> +
> +         /* We don't want to remove the message header. Removing all of the
> +          * parameters is avoided because it seems to cause a GPU hang but I
> +          * can't find any documentation indicating that this is expected.
> +          */
> +         while (inst->mlen > inst->header_present + dispatch_width / 8 &&
> +                load_payload->src[(inst->mlen - inst->header_present) /
> +                                  (dispatch_width / 8) - 1].is_zero()) {
> +            inst->mlen -= dispatch_width / 8;
> +            progress = true;
> +         }

Another idea...you could just create a new LOAD_PAYLOAD for what you
want, and leave the old one in place just in case it's used (with the
assumption that it's probably not, and dead code elimination will make
it go away).  Just a suggestion.
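
A rough sketch of that alternative, using a toy IR rather than Mesa's: `ToyPayload`, `ToySend`, `split_off_trimmed_payload`, and `payload_is_unused` are all hypothetical names. The send gets a fresh, shorter payload, the original payload is left alone, and a separate check models what dead code elimination would later decide.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy IR for illustration; not Mesa's LOAD_PAYLOAD or send.
struct ToyPayload {
   std::vector<int> srcs;   // source values; 0 models a constant zero
};

struct ToySend {
   std::size_t payload;     // index of the payload this send reads
};

// Point `send` at a new payload without the trailing zero sources. The old
// payload is not modified, in case another instruction still reads it.
// Returns true if a new payload was created.
bool split_off_trimmed_payload(std::vector<ToyPayload> &payloads, ToySend &send)
{
   const std::vector<int> &old_srcs = payloads[send.payload].srcs;

   // Always keep at least one source.
   std::size_t keep = old_srcs.size();
   while (keep > 1 && old_srcs[keep - 1] == 0)
      keep--;

   if (keep == old_srcs.size())
      return false;

   // Copy the kept sources before push_back, which may reallocate and
   // invalidate the old_srcs reference.
   ToyPayload trimmed;
   trimmed.srcs.assign(old_srcs.begin(), old_srcs.begin() + keep);
   payloads.push_back(trimmed);
   send.payload = payloads.size() - 1;
   return true;
}

// Models the dead-code check: a payload that no send reads can be removed.
bool payload_is_unused(std::size_t index, const std::vector<ToySend> &sends)
{
   for (const ToySend &s : sends)
      if (s.payload == index)
         return false;
   return true;
}
```

The trade-off in this model is one extra instruction in the common case, which is removed only if nothing else reads the original payload.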

> +      }
> +   }
> +
> +   if (progress)
> +      invalidate_live_intervals();
> +
> +   return progress;
> +}
> +
> +/**
>   * Optimize sample messages which are followed by the final RT write.
>   *
>   * CHV, and GEN9+ can mark a texturing SEND instruction with EOT to have its
> @@ -3824,6 +3866,7 @@ fs_visitor::optimize()
>  
>     pass_num = 0;
>  
> +   OPT(opt_zero_samples);
>     OPT(opt_sampler_eot);
>  
>     if (OPT(lower_load_payload)) {
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.h b/src/mesa/drivers/dri/i965/brw_fs.h
> index 4e17d44..6200deb 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.h
> +++ b/src/mesa/drivers/dri/i965/brw_fs.h
> @@ -324,6 +324,7 @@ public:
>     bool opt_peephole_predicated_break();
>     bool opt_saturate_propagation();
>     bool opt_cmod_propagation();
> +   bool opt_zero_samples();
>     void emit_bool_to_cond_code(ir_rvalue *condition);
>     void emit_bool_to_cond_code_of_reg(ir_expression *expr, fs_reg op[3]);
>     void emit_if_gen6(ir_if *ir);
> 