[Mesa-dev] [PATCH 09/16] i965/fs: Calculate interference better in register_coalesce.

Matt Turner mattst88 at gmail.com
Fri Jan 10 17:59:27 PST 2014


On Fri, Jan 10, 2014 at 5:19 PM, Jordan Justen <jljusten at gmail.com> wrote:
> On Thu, Dec 19, 2013 at 1:40 PM, Matt Turner <mattst88 at gmail.com> wrote:
>> Previously we simply considered two registers whose live ranges
>> overlapped to interfere. Cases such as
>>
>>    set A     ------
>>    ...             |
>>    mov B, A  --    |
>>    ...         | B | A
>>    use B     --    |
>>    ...             |
>>    use A     ------
>>
>> would be considered to interfere, even though B is an unmodified copy of
>> A whose live range fit wholly inside that of A.
>>
>> If no writes to A or B occur between the mov B, A and the use of B then
>> we can safely coalesce them.
>>
>> Instead of removing MOV instructions, we make them NOPs and remove them
>> at once after the main pass is finished in order to avoid recomputing
>> live intervals (which are needed to perform the previous step).
>>
>> total instructions in shared programs: 1543768 -> 1513077 (-1.99%)
>> instructions in affected programs:     951563 -> 920872 (-3.23%)
>> GAINED:                                46
>> LOST:                                  22
>> ---
>>  src/mesa/drivers/dri/i965/brw_fs.cpp | 69 ++++++++++++++++++++++++++++++++----
>>  1 file changed, 62 insertions(+), 7 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> index e4ac0a5..ad56b87 100644
>> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
>> @@ -2273,7 +2273,7 @@ fs_visitor::register_coalesce()
>>     int last_use[MAX_SAMPLER_MESSAGE_SIZE];
>>     int next_ip = 0;
>>
>> -   foreach_list_safe(node, &this->instructions) {
>> +   foreach_list(node, &this->instructions) {
>>        fs_inst *inst = (fs_inst *)node;
>>
>>        int ip = next_ip;
>> @@ -2299,8 +2299,39 @@ fs_visitor::register_coalesce()
>>        int var_to = live_intervals->var_from_reg(&inst->dst);
>>
>>        if (live_intervals->vars_interfere(var_from, var_to) &&
>> -          !inst->dst.equals(inst->src[0]))
>> -         continue;
>> +          !inst->dst.equals(inst->src[0])) {
>> +
>> +         if (live_intervals->end[var_to] > live_intervals->end[var_from])
>> +            continue;
>> +
>> +         bool overwritten = false;
>> +         int scan_ip = -1;
>> +
>> +         foreach_list(n, &this->instructions) {
>> +           fs_inst *scan_inst = (fs_inst *)n;
>> +            scan_ip++;
>> +
>> +            if (scan_inst->is_control_flow()) {
>> +               overwritten = true;
>> +               break;
>> +            }
>> +
>> +            if (scan_ip <= live_intervals->start[var_to])
>> +               continue;
>> +
>> +            if (scan_ip > live_intervals->end[var_to])
>> +               break;
>> +
>> +            if (scan_inst->dst.equals(inst->dst) ||
>> +                scan_inst->dst.equals(inst->src[0])) {
>> +               overwritten = true;
>> +               break;
>> +            }
>> +         }
>> +
>> +         if (overwritten)
>> +            continue;
>> +      }
>>
>>        if (reg_from != inst->src[0].reg) {
>>           reg_from = inst->src[0].reg;
>> @@ -2342,9 +2373,18 @@ fs_visitor::register_coalesce()
>>        if (live_channels_remaining)
>>           continue;
>>
>> +      bool removed = false;
>>        for (int i = 0; i < src_size; i++) {
>> -         if (mov[i])
>> -            mov[i]->remove();
>> +         if (mov[i]) {
>> +            removed = true;
>> +
>> +            mov[i]->opcode = BRW_OPCODE_NOP;
>> +            mov[i]->conditional_mod = BRW_CONDITIONAL_NONE;
>
> Is conditional_mod == BRW_CONDITIONAL_NONE something we should look
> for before considering coalescing a mov?

I don't actually know what a mov with a conditional mod would do, and
we don't emit a single one in all of shader-db, but it seems like we
should be considering predication.

>> +            mov[i]->dst = reg_undef;
>> +            mov[i]->src[0] = reg_undef;
>> +            mov[i]->src[1] = reg_undef;
>> +            mov[i]->src[2] = reg_undef;
>> +         }
>>        }
>>
>>        foreach_list(node, &this->instructions) {
>> @@ -2366,11 +2406,26 @@ fs_visitor::register_coalesce()
>>                       scan_inst->src[j].reg_offset = reg_to_offset[i];
>>                    }
>>                 }
>> -
>> -               progress = true;
>>              }
>>           }
>>        }
>> +
>> +      if (removed) {
>> +         live_intervals->start[var_to] = MIN2(live_intervals->start[var_to],
>> +                                              live_intervals->start[var_from]);
>> +         live_intervals->end[var_to] = MAX2(live_intervals->end[var_to],
>> +                                            live_intervals->end[var_from]);
>> +         reg_from = -1;
>
> Do you think these are what lead to the lost programs?

I don't think so.

> I haven't been able to determine why they are needed starting with this patch...

Consider

add vgrf3:F, vgrf1:F, vgrf2:F
mov vgrf4:F, vgrf3:F
mul vgrf5:F, vgrf5:F, vgrf4:F

register coalescing turns this into

add vgrf4:F, vgrf1:F, vgrf2:F
mul vgrf5:F, vgrf5:F, vgrf4:F

and now our live intervals are wrong, and calculating live intervals
is expensive. Instead of recalculating live intervals after each
successful iteration, we just fix var_to's start and end to keep them
valid.

To make sure that none of the other instructions are affected, we
temporarily replace the MOV with a NOP and then after the pass is over
remove all of the NOPs when it's safe to invalidate the live
intervals.


More information about the mesa-dev mailing list