[Mesa-dev] [PATCH v2 5/6] i965/vec4: Adjust spilling cost for consecutive instructions

Thu Jul 30 06:39:59 PDT 2015

Francisco Jerez <currojerez at riseup.net> writes:

> Iago Toral Quiroga <itoral at igalia.com> writes:
>
>> Previous patches made it so that we do not need to unspill the same vgrf
>> with every instruction as long as these instructions come right after
>> the register was spilled or unspilled. This means that actually spilling
>> the register is now cheaper in these scenarios, so adjust the spilling
>> cost calculation accordingly.
>> ---
>>  .../drivers/dri/i965/brw_vec4_reg_allocate.cpp     | 61 +++++++++++++++++++---
>>  1 file changed, 55 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
>> index 9a69fac..652a68c 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
>> @@ -269,6 +269,21 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, bool *no_spill)
>>  {
>>     float loop_scale = 1.0;
>>  
>> +   /* If a spilled register is written (or unspilled) and then immediately read,
>> +    * we will avoid to emit scratch reads for that register for as long as
>> +    * consecutive instructions keep reading that register. This makes spilling
>> +    * the register cheaper, so we need to account for this here. Since any
>> +    * instruction can only have at most 3 src operands we can only have up to
>> +    * 3 cached registers (i.e. registers that we do not need to unspill if they
>> +    * are read in a consecutive instruction) at any given time.
>> +    *
>> +    * We keep two lists, cached[0] has the registers that are cached before
>> +    * the current instruction, and cached[1] has the registers that are
>> +    * cached after the current instruction. We copy cached[1] into cached[0]
>> +    * after processing each instruction.
>> +    */
>> +   int cached[2][3] = { -1, -1, -1,   -1, -1, -1 };
>> +
>
> Wouldn't it be easier to drop the array and just compare the sources of
> each instruction with the previous one?
>
>>     for (unsigned i = 0; i < this->alloc.count; i++) {
>>        spill_costs[i] = 0.0;
>>        no_spill[i] = alloc.sizes[i] != 1;
>> @@ -279,18 +294,52 @@ vec4_visitor::evaluate_spill_costs(float *spill_costs, bool *no_spill)
>>      * loops run 10 times.
>>      */
>>     foreach_block_and_inst(block, vec4_instruction, inst, cfg) {
>> +      memset(cached[1], -1, sizeof(cached[1]));
>> +
>>        for (unsigned int i = 0; i < 3; i++) {
>> -	 if (inst->src[i].file == GRF) {
>> -	    spill_costs[inst->src[i].reg] += loop_scale;
>> -	    assert(!inst->src[i].reladdr);
>> -	 }
>> +         if (inst->src[i].file == GRF) {
>> +            /* Don't increase spilling cost if src[i] isn't going
>> +             * to be unspilled
>> +             */
>> +            int slot;
>> +            for (slot = 0; slot < 3; slot++) {
>> +               if (inst->src[i].reg == cached[0][slot])
>
> I guess this should take the reg_offset and writemask of the cached
> register into account.  I suggest you use backend_reg::in_range() to
> check if there is overlap.
>
Er, it's of course not necessary to check reg_offset because it's
guaranteed to be zero, but checking writemask is AFAICT -- Using
backend_reg::in_range() may improve readability either way.

>> +                  break;
>> +            }
>> +            if (slot == 3) {
>> +               /* Not cached, we will unspill src[i] */
>> +               spill_costs[inst->src[i].reg] += loop_scale;
>> +               assert(!inst->src[i].reladdr);
>> +            } else {
>> +               /* It is cached. Since it is read in this instruction it
>> +                * stays cached for the next.
>> +                */
>> +               cached[1][slot] = inst->src[i].reg;
>> +            }
>> +         }
>>        }
>>  
>>        if (inst->dst.file == GRF) {
>> -	 spill_costs[inst->dst.reg] += loop_scale;
>> -	 assert(!inst->dst.reladdr);
>> +         int slot;
>> +         /* Mark this reg as cached because if the next instruction reads it
>> +          * we won't unspill it.
>> +          */
>> +         for (slot = 0; slot < 3; slot++) {
>> +            if (cached[1][slot] == -1) {
>> +               cached[1][slot] = inst->dst.reg;
>> +               break;
>> +            }
>> +         }
>> +         assert(slot < 3);
>> +         spill_costs[inst->dst.reg] += loop_scale;
>> +         assert(!inst->dst.reladdr);
>>        }
>>  
>> +      /* Cached registers after this instruction are the cached registers
>> +       * for the next instruction
>> +       */
>> +      memcpy(cached[0], cached[1], sizeof(cached[0]));
>> +
>>        switch (inst->opcode) {
>>  
>>        case BRW_OPCODE_DO:
>> -- 
>> 1.9.1
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 212 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20150730/f86aa776/attachment-0001.sig>