[Mesa-dev] [PATCH v2 5/6] i965/vec4: Adjust spilling cost for consecutive instructions

Thu Jul 30 23:35:00 PDT 2015

On Thu, 2015-07-30 at 16:33 +0300, Francisco Jerez wrote:
> Francisco Jerez <currojerez at riseup.net> writes:
> 
> > Iago Toral Quiroga <itoral at igalia.com> writes:
> >
> >> Previous patches made it so that we do not need to unspill the same vgrf
> >> with every instruction as long as these instructions come right after
> >> the register was spilled or unspilled. This means that actually spilling
> >> the register is now cheaper in these scenarios, so adjust the spilling
> >> cost calculation accordingly.
> >> ---
> >>  .../drivers/dri/i965/brw_vec4_reg_allocate.cpp     | 61 +++++++++++++++++++---
> >>  1 file changed, 55 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> >> index 9a69fac..652a68c 100644
> >> --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> >> +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
> >> @@ -269,6 +269,21 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, bool *no_spill)
> >>  {
> >>     float loop_scale = 1.0;
> >>  
> >> +   /* If a spilled register is written (or unspilled) and then immediately read,
> >> +    * we will avoid to emit scratch reads for that register for as long as
> >> +    * consecutive instructions keep reading that register. This makes spilling
> >> +    * the register cheaper, so we need to account for this here. Since any
> >> +    * instruction can only have at most 3 src operands we can only have up to
> >> +    * 3 cached registers (i.e. registers that we do not need to unspill if they
> >> +    * are read in a consecutive instruction) at any given time.
> >> +    *
> >> +    * We keep two lists, cached[0] has the registers that are cached before
> >> +    * the current instruction, and cached[1] has the registers that are
> >> +    * cached after the current instruction. We copy cached[1] into cached[0]
> >> +    * after processing each instruction.
> >> +    */
> >> +   int cached[2][3] = { -1, -1, -1,   -1, -1, -1 };
> >> +
> >
> > Wouldn't it be easier to drop the array and just compare the sources of
> > each instruction with the previous one?
> >
> >>     for (unsigned i = 0; i < this->alloc.count; i++) {
> >>        spill_costs[i] = 0.0;
> >>        no_spill[i] = alloc.sizes[i] != 1;
> >> @@ -279,18 +294,52 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, bool *no_spill)
> >>      * loops run 10 times.
> >>      */
> >>     foreach_block_and_inst(block, vec4_instruction, inst, cfg) {
> >> +      memset(cached[1], -1, sizeof(cached[1]));
> >> +
> >>        for (unsigned int i = 0; i < 3; i++) {
> >> -	 if (inst->src[i].file == GRF) {
> >> -	    spill_costs[inst->src[i].reg] += loop_scale;
> >> -	    assert(!inst->src[i].reladdr);
> >> -	 }
> >> +         if (inst->src[i].file == GRF) {
> >> +            /* Don't increase spilling cost if src[i] isn't going
> >> +             * to be unspilled
> >> +             */
> >> +            int slot;
> >> +            for (slot = 0; slot < 3; slot++) {
> >> +               if (inst->src[i].reg == cached[0][slot])
> >
> > I guess this should take the reg_offset and writemask of the cached
> > register into account.  I suggest you use backend_reg::in_range() to
> > check if there is overlap.
> >
> 
> It also doesn't look like this handles the case in which the same
> register is unspilled several times for the same instruction correctly,
> as you've implemented it in PATCH 1.

True, I'll fix that and have a go at your other suggestions as well.

Iago

> >> +                  break;
> >> +            }
> >> +            if (slot == 3) {
> >> +               /* Not cached, we will unspill src[i] */
> >> +               spill_costs[inst->src[i].reg] += loop_scale;
> >> +               assert(!inst->src[i].reladdr);
> >> +            } else {
> >> +               /* It is cached. Since it is read in this instruction it
> >> +                * stays cached for the next.
> >> +                */
> >> +               cached[1][slot] = inst->src[i].reg;
> >> +            }
> >> +         }
> >>        }
> >>  
> >>        if (inst->dst.file == GRF) {
> >> -	 spill_costs[inst->dst.reg] += loop_scale;
> >> -	 assert(!inst->dst.reladdr);
> >> +         int slot;
> >> +         /* Mark this reg as cached because if the next instruction reads it
> >> +          * we won't unspill it.
> >> +          */
> >> +         for (slot = 0; slot < 3; slot++) {
> >> +            if (cached[1][slot] == -1) {
> >> +               cached[1][slot] = inst->dst.reg;
> >> +               break;
> >> +            }
> >> +         }
> >> +         assert(slot < 3);
> >> +         spill_costs[inst->dst.reg] += loop_scale;
> >> +         assert(!inst->dst.reladdr);
> >>        }
> >>  
> >> +      /* Cached registers after this instruction are the cached registers
> >> +       * for the next instruction
> >> +       */
> >> +      memcpy(cached[0], cached[1], sizeof(cached[0]));
> >> +
> >>        switch (inst->opcode) {
> >>  
> >>        case BRW_OPCODE_DO:
> >> -- 
> >> 1.9.1