[Mesa-dev] [PATCH v2 3/6] i965/vec4: Don't emit scratch reads for a spilled register we have just written

Tue Aug 4 06:44:24 PDT 2015

Iago Toral <itoral at igalia.com> writes:

> On Thu, 2015-07-30 at 17:08 +0300, Francisco Jerez wrote:
>> Iago Toral Quiroga <itoral at igalia.com> writes:
>> 
>> > When we have code such as this:
>> >
>> > mov vgrf1.0.x:F, vgrf2.xxxx:F
>> > mov vgrf3.0.x:F, vgrf1.xxxx:F
>> > ...
>> > mov vgrf3.0.x:F, vgrf1.xxxx:F
>> >
>> > And vgrf1 is chosen for spilling, we can emit this:
>> >
>> > mov vgrf1.0.x:F, vgrf2.xxxx:F
>> > gen4_scratch_write hw_reg0:F, vgrf1.xxxx:D, 22D
>> > mov vgrf3.0.x:F, vgrf1.xxxx:F
>> > ...
>> > gen4_scratch_read vgrf4.0.x:F, 22D
>> > mov vgrf3.0.x:F, vgrf4.xxxx:F
>> >
>> > Instead of this:
>> >
>> > mov vgrf1.0.x:F, vgrf2.xxxx:F
>> > gen4_scratch_write hw_reg0:F, vgrf1.xxxx:D, 22D
>> > gen4_scratch_read vgrf4.0.x:F, 22D
>> > mov vgrf3.0.x:F, vgrf4.xxxx:F
>> > ...
>> > gen4_scratch_read vgrf5.0.x:F, 22D
>> > mov vgrf3.0.x:F, vgrf5.xxxx:F
>> >
>> > And save one scratch read while still preserving the benefits of
>> > spilling the register.
>> >
>> > In general, we avoid emitting scratch reads for as long as the next instruction
>> > keeps reading the spilled register. This should not harm the benefit of
>> > spilling the register because gains for register allocation only come when we
>> > have chunks of program code where the register is alive but not really used
>> > (because these are the points where we could effectively use that register for
>> > another purpose if we spilled it), so as long as consecutive instructions use
>> > that register we can avoid the scratch reads without losing anything.
>> > ---
>> >  .../drivers/dri/i965/brw_vec4_reg_allocate.cpp     | 37 +++++++++++++++++++++-
>> >  1 file changed, 36 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
>> > index cff5406..fd56dae 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
>> > @@ -340,11 +340,43 @@ vec4_visitor::spill_reg(int spill_reg_nr)
>> >     unsigned int spill_offset = last_scratch++;
>> >  
>> >     /* Generate spill/unspill instructions for the objects being spilled. */
>> > +   vec4_instruction *spill_write_inst = NULL;
>> >     foreach_block_and_inst(block, vec4_instruction, inst, cfg) {
>> > +      /* We don't spill registers used for scratch */
>> > +      if (inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_READ ||
>> > +          inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE)
>> > +         continue;
>> > +
>> >        int scratch_reg = -1;
>> > +      bool spill_reg_was_read = false;
>> >        for (unsigned int i = 0; i < 3; i++) {
>> >           if (inst->src[i].file == GRF && inst->src[i].reg == spill_reg_nr) {
>> > -            if (scratch_reg == -1) {
>> > +            if (!spill_reg_was_read) {
>> > +               spill_reg_was_read = (!inst->predicate ||
>> > +                                     inst->opcode == BRW_OPCODE_SEL);
>> > +            }
>> > +
>> > +            /* If we are reading the spilled register right after writing
>> > +             * to it we can skip the scratch read and use directly the
>> > +             * register we used as source for the scratch write. For this
>> > +             * to work we must check that:
>> > +             *
>> > +             * 1) The write is inconditional, that is, it is not predicated or
>> > +             *    it is a SEL.
>> > +             * 2) All the channels that we read have been written in that
>> > +             *    last write instruction.
>> > +             *
>> > +             * We keep doing this for as long as the next instruction
>> > +             * keeps reading the spilled register and break as soon as we
>> > +             * find an instruction that doesn't.
>> > +             */
>> > +            if (spill_write_inst &&
>> > +                (!spill_write_inst->predicate ||
>> > +                 spill_write_inst->opcode == BRW_OPCODE_SEL) &&
>> > +                ((brw_mask_for_swizzle(inst->src[i].swizzle) &
>> > +                 ~spill_write_inst->dst.writemask) == 0)) {
>> > +               scratch_reg = spill_write_inst->dst.reg;
>> > +            } else if (scratch_reg == -1) {
>> 
>> One suggestion: You could factor out the rather complex caching logic
>> into a separate function (e.g. 'bool can_reuse_scratch_for_source(const
>> vec4_instruction *, unsigned i, unsigned scratch_reg)').  The function
>> would simply compare scratch_reg with the sources of the current
>> instruction (up to src) and the sources and destination of the previous
>> non-scratch_read/write instruction.  If there's a match it would check
>> that the regioning is compatible with the i-th source and return true in
>> that case.  This would have several benefits:
>
> I am not sure I get the part about checking regioning with the i-th
> source.  inst->src[i] does not reference scratch_reg (at least not if we
> got here from spill_reg) so it cannot overlap with anything that
> references scratch_reg... I am going to send a v3 with the
> implementation of can_use_scratch_for_source in a few minutes, so if I
> am just missing the point I guess it is probably be easier if you just
> reply to that patch.
>
I think you did get the point, what I meant was to check all sources of
the current instruction less than i, which seems to be exactly what
you've implemented in v3.

>>  - It would keep ::spill_reg() simple and the caching heuristic factored
>>    out.  The only thing spill_reg() would still need to take care of is
>>    keep track of the last spilling temporary (e.g. as you're doing it
>>    now with the scratch_reg variable) and pass it to
>>    can_reuse_scratch_for_source(), but it would no longer be necessary
>>    to reset it to -1 when reuse is not possible.
>> 
>>  - It would allow you to use a single implementation of the caching
>>    policy to implement both spill_reg() and evaluate_spill_costs(), what
>>    would make sure that things don't go out of sync in the likely case
>>    that we want to change the caching heuristic in the future.
>> 
>> >                 scratch_reg = alloc.allocate(1);
>> >                 src_reg temp = inst->src[i];
>> >                 temp.reg = scratch_reg;
>> > @@ -357,6 +389,9 @@ vec4_visitor::spill_reg(int spill_reg_nr)
>> >  
>> >        if (inst->dst.file == GRF && inst->dst.reg == spill_reg_nr) {
>> >           emit_scratch_write(block, inst, spill_offset);
>> > +         spill_write_inst = inst;
>> > +      } else if (spill_write_inst && !spill_reg_was_read) {
>> > +         spill_write_inst = NULL;
>> >        }
>> >     }
>> >  
>> > -- 
>> > 1.9.1
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150804/1ca96bbd/attachment-0001.sig>