[Mesa-dev] [PATCH 4/5] i965/vec4: Don't emit scratch reads for a spilled register we have just written

Fri Jul 24 06:18:38 PDT 2015

Iago Toral Quiroga <itoral at igalia.com> writes:

> When we have code such as this:
>
> mov vgrf1.0.x:F, vgrf2.xxxx:F
> mov vgrf3.0.x:F, vgrf1.xxxx:F
> ...
> mov vgrf3.0.x:F, vgrf1.xxxx:F
>
> And vgrf1 is chosen for spilling, we can emit this:
>
> mov vgrf1.0.x:F, vgrf2.xxxx:F
> gen4_scratch_write hw_reg0:F, vgrf1.xxxx:D, 22D
> mov vgrf3.0.x:F, vgrf1.xxxx:F
> ...
> gen4_scratch_read vgrf4.0.x:F, 22D
> mov vgrf3.0.x:F, vgrf4.xxxx:F
>
> Instead of this:
>
> mov vgrf1.0.x:F, vgrf2.xxxx:F
> gen4_scratch_write hw_reg0:F, vgrf1.xxxx:D, 22D
> gen4_scratch_read vgrf4.0.x:F, 22D
> mov vgrf3.0.x:F, vgrf4.xxxx:F
> ...
> gen4_scratch_read vgrf5.0.x:F, 22D
> mov vgrf3.0.x:F, vgrf5.xxxx:F
>
> And save one scratch read while still preserving the benefits of
> spilling the register.

This sounds reasonable to me in principle.  I guess that there is in
general a trade-off between the number of spills/fills you omit and the
number of interference edges you eliminate.  It may also be worth
checking whether you can extend the same principle to cache the value of
the variable in a GRF until the next instruction regardless of whether
it was written or read (e.g. so you don't unspill the same register in
two adjacent instructions).

In either case it seems like the overall cost of spilling a register
would be decreased in cases where this heuristic can be applied, would
it make sense to update the cost metric accordingly?

One more comment inline.

> ---
>  .../drivers/dri/i965/brw_vec4_reg_allocate.cpp     | 39 +++++++++++++++++++++-
>  1 file changed, 38 insertions(+), 1 deletion(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> index 80ab813..5fed2f9 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> @@ -334,6 +334,18 @@ vec4_visitor::choose_spill_reg(struct ra_graph *g)
>     return ra_get_best_spill_node(g);
>  }
>  
> +static bool
> +writemask_matches_swizzle(unsigned writemask, unsigned swizzle)
> +{
> +   for (int i = 0; i < 4; i++) {
> +      unsigned channel = 1 << BRW_GET_SWZ(swizzle, i);
> +      if (!(writemask & channel))
> +         return false;
> +   }
> +
> +   return true;
> +}
> +
>  void
>  vec4_visitor::spill_reg(int spill_reg_nr)
>  {
> @@ -341,11 +353,33 @@ vec4_visitor::spill_reg(int spill_reg_nr)
>     unsigned int spill_offset = last_scratch++;
>  
>     /* Generate spill/unspill instructions for the objects being spilled. */
> +   vec4_instruction *spill_write_inst = NULL;
>     foreach_block_and_inst(block, vec4_instruction, inst, cfg) {
> +      /* We don't spill registers used for scratch */
> +      if (inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ ||
> +          inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)
> +         continue;
> +
>        int scratch_reg = -1;
>        for (unsigned int i = 0; i < 3; i++) {
>           if (inst->src[i].file == GRF && inst->src[i].reg == spill_reg_nr) {
> -            if (scratch_reg == -1) {
> +            /* If we are reading the spilled register right after writing
> +             * to it we can skip the scratch read and use directly the
> +             * register we used as source for the scratch write. For this
> +             * to work we must check that:
> +             *
> +             * 1) The write is inconditional, that is, it is not predicated or
> +                  it is a SEL.
> +             * 2) All the channels that we read have been written in that
> +             *    last write instruction.
> +             */
> +            if (spill_write_inst &&
> +                (!spill_write_inst->predicate ||
> +                 spill_write_inst->opcode == BRW_OPCODE_SEL) &&
> +                writemask_matches_swizzle(spill_write_inst->dst.writemask,
> +                                          inst->src[i].swizzle)) {

brw_mask_for_swizzle() returns the mask of components accessed by a
swizzle, you could just AND it with ~spill_write_inst->dst.writemask to
find out whether it's contained in the destination of the previous
instruction.

> +               scratch_reg = spill_write_inst->dst.reg;
> +            } else if (scratch_reg == -1) {
>                 scratch_reg = alloc.allocate(1);
>                 src_reg temp = inst->src[i];
>                 temp.reg = scratch_reg;
> @@ -358,6 +392,9 @@ vec4_visitor::spill_reg(int spill_reg_nr)
>  
>        if (inst->dst.file == GRF && inst->dst.reg == spill_reg_nr) {
>           emit_scratch_write(block, inst, spill_offset);
> +         spill_write_inst = inst;
> +      } else {
> +         spill_write_inst = NULL;
>        }
>     }
>  
> -- 
> 1.9.1
>
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150724/60f2ff96/attachment.sig>