[Mesa-dev] [PATCH 21/22] i965/vec4: Allow CSE on subset VF constant loads

Tue Mar 6 00:53:33 UTC 2018

On Friday, February 23, 2018 3:56:06 PM PST Ian Romanick wrote:
> From: Ian Romanick <ian.d.romanick at intel.com>
> 
> No changes on other platforms.
> 
> Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown)
> total instructions in shared programs: 13059891 -> 13059884 (<.01%)
> instructions in affected programs: 431 -> 424 (-1.62%)
> helped: 7
> HURT: 0
> helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
> helped stats (rel) min: 1.19% max: 5.26% x̄: 2.05% x̃: 1.49%
> 95% mean confidence interval for instructions value: -1.00 -1.00
> 95% mean confidence interval for instructions %-change: -3.39% -0.71%
> Instructions are helped.
> 
> total cycles in shared programs: 409260032 -> 409260018 (<.01%)
> cycles in affected programs: 4228 -> 4214 (-0.33%)
> helped: 7
> HURT: 0
> helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
> helped stats (rel) min: 0.28% max: 2.04% x̄: 0.54% x̃: 0.28%
> 95% mean confidence interval for cycles value: -2.00 -2.00
> 95% mean confidence interval for cycles %-change: -1.15% 0.07%
> 
> Inconclusive result (%-change mean confidence interval includes 0).
> 
> Signed-off-by: Ian Romanick <ian.d.romanick at intel.com>
> ---
>  src/intel/compiler/brw_vec4_cse.cpp | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/src/intel/compiler/brw_vec4_cse.cpp b/src/intel/compiler/brw_vec4_cse.cpp
> index aa2f127..3302939 100644
> --- a/src/intel/compiler/brw_vec4_cse.cpp
> +++ b/src/intel/compiler/brw_vec4_cse.cpp
> @@ -104,6 +104,27 @@ operands_match(const vec4_instruction *a, const vec4_instruction *b)
>        return xs[0].equals(ys[0]) &&
>               ((xs[1].equals(ys[1]) && xs[2].equals(ys[2])) ||
>                (xs[2].equals(ys[1]) && xs[1].equals(ys[2])));
> +   } else if (a->opcode == BRW_OPCODE_MOV &&
> +              xs[0].file == IMM &&
> +              xs[0].type == BRW_REGISTER_TYPE_VF) {
> +      src_reg tmp_x = xs[0];
> +      src_reg tmp_y = ys[0];
> +
> +      /* Smash out the values that are not part of the writemask.  Otherwise
> +       * the equals and negative_equals operators will fail due to mismatches
> +       * in unused components.
> +       */
> +      static const uint32_t mask[] = {
> +         0x00000000, 0x000000ff, 0x0000ff00, 0x0000ffff,
> +         0x00ff0000, 0x00ff00ff, 0x00ffff00, 0x00ffffff,
> +         0xff000000, 0xff0000ff, 0xff00ff00, 0xff00ffff,
> +         0xffff0000, 0xffff00ff, 0xffffff00, 0xffffffff
> +      };

This looks a bit crazy at first glance...it isn't though, it's just a
mapping from WRITEMASK_* to the proper byte masks.  But I think it might
be clearer to write it with an explicit formula:

    unsigned ab_writemask = a->dst.writemask & b->dst.writemask;
    uint32_t mask = ((ab_writemask & WRITEMASK_X) ? 0x000000ff : 0) |
                    ((ab_writemask & WRITEMASK_Y) ? 0x0000ff00 : 0) |
                    ((ab_writemask & WRITEMASK_Z) ? 0x00ff0000 : 0) |
                    ((ab_writemask & WRITEMASK_W) ? 0xff000000 : 0);
    tmp_x.ud &= mask;
    tmp_y.ud &= mask;

This makes it pretty obvious what's going on.

Either way (though I prefer mine),
Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

> +
> +      tmp_x.ud &= mask[a->dst.writemask & b->dst.writemask];
> +      tmp_y.ud &= mask[a->dst.writemask & b->dst.writemask];
> +
> +      return tmp_x.equals(tmp_y);
>     } else if (!a->is_commutative()) {
>        return xs[0].equals(ys[0]) && xs[1].equals(ys[1]) && xs[2].equals(ys[2]);
>     } else {
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part.
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20180305/cbbab9fa/attachment.sig>