[Mesa-dev] [PATCH] i965/fs: Strip trailing constant zeroes in sample messages

Matt Turner mattst88 at gmail.com
Fri Apr 24 08:30:42 PDT 2015


On Fri, Apr 24, 2015 at 8:02 AM, Neil Roberts <neil at linux.intel.com> wrote:
> If a send message is emitted with a message length that is less than
> required for the message then the remaining parameters default to
> zero. We can take advantage of this to save a register when a shader
> passes constant zeroes as the final coordinates to the sample
> function.
>
> I think this might be useful for GLES applications that are using 2D
> textures to simulate 1D textures.
>
> On Skylake it will be useful for shaders that do
> texelFetch(tex,something,0) which I think is fairly common. This helps
> more on Skylake because in that case the order of the instruction
> operands is u,v,lod,r which is good for 2D textures whereas before
> they were u,lod,v,r which is only good for 1D textures.
>
> On Haswell:
> total instructions in shared programs: 8538662 -> 8537377 (-0.02%)
> instructions in affected programs:     193546 -> 192261 (-0.66%)
> helped:                                1032
>
> On Skylake:
> total instructions in shared programs: 10336216 -> 10332976 (-0.03%)
> instructions in affected programs:     243118 -> 239878 (-1.33%)
> helped:                                1066

Neat! I never thought to try this.

I have some vague memory that there are times when we purposefully
emit a MOV 0 because of supposed hardware bugs, but looking at
brw_fs_visitor.cpp, that seems to only be the case on Gen4. So I
think this is safe.

> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 43 ++++++++++++++++++++++++++++++++++++
>  src/mesa/drivers/dri/i965/brw_fs.h   |  1 +
>  2 files changed, 44 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 61ee056..87a15b3 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -2536,6 +2536,48 @@ fs_visitor::opt_algebraic()
>  }
>
>  /**
> + * Optimize sample messages that have constant zero values for the trailing
> + * texture coordinates. We can just reduce the message length for these
> + * instructions instead of reserving a register for it. Trailing parameters
> + * that aren't sent default to zero anyway. This will cause the dead code
> + * eliminator to remove the MOV instruction that would otherwise be emitted to
> + * set up the zero value.
> + */
> +bool
> +fs_visitor::opt_zero_samples()
> +{
> +   bool progress = false;
> +
> +   foreach_block_and_inst(block, fs_inst, inst, cfg) {
> +      if ((inst->opcode == SHADER_OPCODE_TEX ||
> +           inst->opcode == SHADER_OPCODE_TXF) &&
> +          !inst->shadow_compare) {
> +         fs_inst *load_payload = (fs_inst *) inst->prev;
> +
> +         if (load_payload->is_head_sentinel() ||
> +             load_payload->opcode != SHADER_OPCODE_LOAD_PAYLOAD)
> +            continue;

We can't guarantee that the load_payload isn't used by another texture
later in the program, and since you need to change the texture
operation's mlen, I think you need to check that the load_payload
isn't used after this texture operation.

To do that, (1) add an ip variable and initialize it to -1, (2) add
ip++ as the first statement in the foreach_block_and_inst loop, (3)
add some code to this check, similar to that in
brw_fs_saturate_propagation.cpp, using this->live_intervals.
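
A rough sketch of those three steps, using stand-in types rather than the
real fs_inst and live-interval classes. Inst, live_end, payload_dead_after
and trimmable_tex_ips are made-up names for illustration only, and the
"last use" array is an assumption about what the live-interval data would
provide:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; these are NOT the real Mesa types.
enum Opcode { OP_LOAD_PAYLOAD, OP_TEX, OP_OTHER };

struct Inst {
   Opcode opcode;
   int payload_var;   // payload variable written or read, -1 if none
};

// live_end[var] is assumed to hold the ip of the last instruction that
// reads var, in the spirit of a live-interval "end" array.
static bool
payload_dead_after(const std::vector<int> &live_end, int var, int ip)
{
   return live_end[var] <= ip;
}

// Returns the ips of texture instructions whose payload could be trimmed
// without affecting any later reader of the same payload.
static std::vector<int>
trimmable_tex_ips(const std::vector<Inst> &insts,
                  const std::vector<int> &live_end)
{
   std::vector<int> result;
   int ip = -1;                                   // step (1)

   for (std::size_t i = 0; i < insts.size(); i++) {
      ip++;                                       // step (2)

      const Inst &inst = insts[i];
      if (inst.opcode != OP_TEX)
         continue;

      // Same shape as the patch: the payload must be set up by the
      // immediately preceding LOAD_PAYLOAD.
      if (i == 0 || insts[i - 1].opcode != OP_LOAD_PAYLOAD ||
          insts[i - 1].payload_var != inst.payload_var)
         continue;

      // step (3): skip if the payload is still read after this instruction.
      if (!payload_dead_after(live_end, inst.payload_var, ip))
         continue;

      result.push_back(ip);
   }

   return result;
}
```

In the real pass the test in step (3) would sit next to the existing
LOAD_PAYLOAD check and just `continue` when the payload outlives the
texture instruction.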

> +
> +         /* We don't want to remove the message header. Removing all of the
> +          * parameters is avoided because it seems to cause a GPU hang but I
> +          * can't find any documentation indicating that this is expected.
> +          */
> +         while (inst->mlen > inst->header_present + dispatch_width / 8 &&
> +                load_payload->src[(inst->mlen - inst->header_present) /
> +                                  (dispatch_width / 8) - 1].is_zero()) {
> +            inst->mlen -= dispatch_width / 8;
> +            progress = true;
> +         }
> +      }
> +   }
> +
> +   if (progress)
> +      invalidate_live_intervals();
> +
> +   return progress;
> +}
> +
> +/**
>   * Optimize sample messages which are followed by the final RT write.
>   *
>   * CHV, and GEN9+ can mark a texturing SEND instruction with EOT to have its
> @@ -3824,6 +3866,7 @@ fs_visitor::optimize()
>
>     pass_num = 0;
>
> +   OPT(opt_zero_samples);

I think you're probably right that this can be done after the
optimization loop. I guess it's possible that we might trim a texture
payload down and it'll then be the same as an existing payload and we
can then CSE them. I'd be interested to see if putting it inside the
optimization loop improves anything.
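
To make the CSE case concrete, here is a toy model of the trimming loop
from the patch. Param and trimmed_mlen are hypothetical names, and the
payload is simplified to a list of parameters that are either known to be
constant zero or not; it only mirrors the mlen arithmetic, not the real
IR:

```cpp
#include <vector>

// Hypothetical model of one payload parameter; not a Mesa type.
struct Param {
   bool is_zero;   // true if the parameter is a known constant zero
};

// Mirrors the loop in the patch: drop trailing zero parameters, but never
// the header and never the last remaining parameter.
static int
trimmed_mlen(const std::vector<Param> &params, int header_present,
             int dispatch_width)
{
   const int regs_per_param = dispatch_width / 8;
   int mlen = header_present + (int) params.size() * regs_per_param;

   while (mlen > header_present + regs_per_param &&
          params[(mlen - header_present) / regs_per_param - 1].is_zero) {
      mlen -= regs_per_param;
   }

   return mlen;
}
```

With this model a SIMD8 payload of (u, v, 0) and a payload of (u, v) both
end up with the same message length, which is the situation where a later
CSE run might be able to merge them.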

>     OPT(opt_sampler_eot);

