[Mesa-dev] [PATCH v5] i965: add opportunistic behaviour to opt_vector_float()

Matt Turner mattst88 at gmail.com
Fri Mar 4 21:22:43 UTC 2016


On Wed, Mar 2, 2016 at 4:21 AM, Juan A. Suarez Romero
<jasuarez at igalia.com> wrote:
> opt_vector_float() transforms several scalar MOV operations to a single
> vectorial MOV.
>
> This is done when those MOV covers all the components of the destination
> register. So something like:
>
> mov vgrf3.0.xy:D, 0D
> mov vgrf3.0.w:D, 1065353216D
> mov vgrf3.0.z:D, 0D
>
> is transformed in:
>
> mov vgrf3.0:F, [0F, 0F, 0F, 1F]
>
> But there are cases where not all the components are written. For
> example, in:
>
> mov vgrf2.0.x:D, 1073741824D
> mov vgrf3.0.xy:D, 0D
> mov vgrf3.0.w:D, 1065353216D
> mov vgrf4.0.xy:D, 1065353216D
> mov vgrf4.0.w:D, 0D
> mov vgrf6.0:UD, u4.xyzw:UD
>
> Nor vgrf3 nor vgrf4 .z components are written, so the optimization is
> not applied.
>
> But it could be applied anyway with the components covered, using a
> writemask to select the ones written. So we could transform it in:
>
> mov vgrf2.0.x:D, 1073741824D
> mov vgrf3.0.xyw:F, [0F, 0F, 0F, 1F]
> mov vgrf4.0.xyw:F, [1F, 1F, 0F, 0F]
> mov vgrf6.0:UD, u4.xyzw:UD
>
> This commit does precisely that: opportunistically apply
> opt_vector_float() when possible.
>
> The improvement obtained regarding current upstream
> (11.2-branchpoint-139-ge8fd60e) is:
>
> total instructions in shared programs: 7385826 -> 7378040 (-0.11%)
> instructions in affected programs: 427528 -> 419742 (-1.82%)
> helped: 3980
> HURT: 0
>
> total cycles in shared programs: 98989662 -> 98974744 (-0.02%)
> cycles in affected programs: 1748966 -> 1734048 (-0.85%)
> helped: 2149
> HURT: 30
>
> total loops in shared programs: 1979 -> 1979 (0.00%)
> loops in affected programs: 0 -> 0
> helped: 0
> HURT: 0

I get even better results with my copy of shader-db :)

total instructions in shared programs: 7124660 -> 7114784 (-0.14%)
instructions in affected programs: 443078 -> 433202 (-2.23%)
helped: 4998
HURT: 0

total cycles in shared programs: 64757760 -> 64728016 (-0.05%)
cycles in affected programs: 1401686 -> 1371942 (-2.12%)
helped: 3243
HURT: 38

I'll update the stats, fix some whitespace issues noted below and push
this patch with my Reviewed-by.

Thanks Juan!

> LOST:   0
> GAINED: 0
>
> v2: change vectorize_mov() signature (Matt).
> v3: take in account predicates (Juan).
>
> Signed-off-by: Juan A. Suarez Romero <jasuarez at igalia.com>
> ---
>  src/mesa/drivers/dri/i965/brw_vec4.cpp | 62 ++++++++++++++++++++++------------
>  src/mesa/drivers/dri/i965/brw_vec4.h   |  4 +++
>  2 files changed, 44 insertions(+), 22 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index 3618c72..15e71b4 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -321,6 +321,29 @@ src_reg::equals(const src_reg &r) const
>  }
>
>  bool
> +vec4_visitor::vectorize_mov(bblock_t *block, vec4_instruction *inst, uint8_t imm[4],

I linewrapped this.

> +                            vec4_instruction *imm_inst[4], int inst_count,
> +                            unsigned writemask)
> +{
> +   if (inst_count < 2) {
> +      return false;
> +   }

I've removed the braces here.

> +
> +   unsigned vf;
> +   memcpy(&vf, imm, sizeof(vf));
> +   vec4_instruction *mov = MOV(imm_inst[0]->dst, brw_imm_vf(vf));
> +   mov->dst.type = BRW_REGISTER_TYPE_F;
> +   mov->dst.writemask = writemask;
> +   inst->insert_before(block, mov);
> +
> +   for (int i = 0; i < inst_count; i++) {
> +      imm_inst[i]->remove(block);
> +   }
> +
> +   return true;}

} was likely meant to be on its own line.

> +
> +
> +bool
>  vec4_visitor::opt_vector_float()
>  {
>     bool progress = false;
> @@ -328,27 +351,37 @@ vec4_visitor::opt_vector_float()
>     int last_reg = -1, last_reg_offset = -1;
>     enum brw_reg_file last_reg_file = BAD_FILE;
>
> -   int remaining_channels = 0;
> -   uint8_t imm[4];
> +   uint8_t imm[4] = { 0 };
>     int inst_count = 0;
>     vec4_instruction *imm_inst[4];
> +   unsigned writemask = 0;
>
>     foreach_block_and_inst_safe(block, vec4_instruction, inst, cfg) {
>        if (last_reg != inst->dst.nr ||
>            last_reg_offset != inst->dst.reg_offset ||
>            last_reg_file != inst->dst.file) {
> +
> +         progress |= vectorize_mov(block, inst, imm, imm_inst, inst_count, writemask);

I removed the blank line before this and linewrapped the arguments.

> +
> +         inst_count = 0;
> +         writemask = 0;
>           last_reg = inst->dst.nr;
>           last_reg_offset = inst->dst.reg_offset;
>           last_reg_file = inst->dst.file;
> -         remaining_channels = WRITEMASK_XYZW;
> -
> -         inst_count = 0;
> +         for (int i = 0; i < 4; i++) {
> +            imm[i] = 0;
> +         }
>        }
>
>        if (inst->opcode != BRW_OPCODE_MOV ||
>            inst->dst.writemask == WRITEMASK_XYZW ||
> -          inst->src[0].file != IMM)
> +          inst->src[0].file != IMM ||
> +          inst->predicate != BRW_PREDICATE_NONE) {
> +         progress |= vectorize_mov(block, inst, imm, imm_inst, inst_count, writemask);

I linewrapped this.

> +         inst_count = 0;
> +         last_reg = -1;
>           continue;
> +      }
>
>        int vf = brw_float_to_vf(inst->src[0].f);
>        if (vf == -1)
> @@ -363,23 +396,8 @@ vec4_visitor::opt_vector_float()
>        if ((inst->dst.writemask & WRITEMASK_W) != 0)
>           imm[3] = vf;
>
> +      writemask |= inst->dst.writemask;
>        imm_inst[inst_count++] = inst;
> -
> -      remaining_channels &= ~inst->dst.writemask;
> -      if (remaining_channels == 0) {
> -         unsigned vf;
> -         memcpy(&vf, imm, sizeof(vf));
> -         vec4_instruction *mov = MOV(inst->dst, brw_imm_vf(vf));
> -         mov->dst.type = BRW_REGISTER_TYPE_F;
> -         mov->dst.writemask = WRITEMASK_XYZW;
> -         inst->insert_after(block, mov);
> -         last_reg = -1;
> -
> -         for (int i = 0; i < inst_count; i++) {
> -            imm_inst[i]->remove(block);
> -         }
> -         progress = true;
> -      }
>     }
>
>     if (progress)
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.h b/src/mesa/drivers/dri/i965/brw_vec4.h
> index 633f13c..9a9be3a 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.h
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.h
> @@ -369,6 +369,10 @@ protected:
>     virtual void gs_end_primitive();
>
>  private:
> +   bool vectorize_mov(bblock_t *block, vec4_instruction *inst, uint8_t imm[4],

I linewrapped this -- it wasn't a problem, but it now matches the
other instance that I linewrapped.

> +                      vec4_instruction *imm_inst[4], int inst_count,
> +                      unsigned writemask);
> +


More information about the mesa-dev mailing list