[Mesa-dev] [PATCH 3/3] intel/cfg: Represent divergent control flow paths caused by non-uniform loop execution.

Thu Dec 7 20:44:21 UTC 2017

On December 7, 2017 11:23:09 Jason Ekstrand <jason at jlekstrand.net> wrote:

> Ugh, I just remembered we still haven't landed anything for this...
>
> On Fri, Oct 27, 2017 at 5:05 PM, Francisco Jerez <currojerez at riseup.net>
> wrote:
>
>> This addresses a long-standing back-end compiler bug that could lead
>> to cross-channel data corruption in loops executed non-uniformly.  In
>> some cases live variables extending through a loop divergence point
>> (e.g. a non-uniform break) into a convergence point (e.g. the end of
>> the loop) wouldn't be considered live along all physical control flow
>> paths the SIMD thread could possibly have taken in between due to some
>> channels remaining in the loop for additional iterations.
>>
>> This patch fixes the problem by extending the CFG with physical edges
>> that don't exist in the idealized non-vectorized program, but
>> represent valid control flow paths the SIMD EU may take due to the
>> divergence of logical threads.  This makes sense because the i965 IR
>> is explicitly SIMD, and it's not uncommon for instructions to have an
>> influence on neighboring channels (e.g. a force_writemask_all header
>> setup), so the behavior of the SIMD thread as a whole needs to be
>> considered.
>>
>
> I've read all three patches and are convinced that the first two are
> correct and that this one does solve the problem.  That said, I'm still not
> sure what I think of this approach.  It does solve the problem but it's
> correctness relies on even more subtle interactions between the way we
> build the CFG and liveness analysis than we had before.  I don't like
> subtle interactions.  That said, I'm a bit on the fence as to how much
> longer I want to go around in circles trying to fix this...

After some time on a whiteboard, I've come to the conclusion that this is a 
reasonable mental model.  In particular it has two nice properties:

 1) Every instruction in the loop is reachable from every other instruction 
in the loop.

 2) Once you enter the loop, you cannot leave without going through a BREAK.

Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>

>> No changes in shader-db.
>> ---
>>  src/intel/compiler/brw_cfg.cpp | 75 ++++++++++++++++++++++++++++++
>> ++++++++----
>>  1 file changed, 69 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/intel/compiler/brw_cfg.cpp b/src/intel/compiler/brw_cfg.c
>> pp
>> index fad12eec588..600b428a492 100644
>> --- a/src/intel/compiler/brw_cfg.cpp
>> +++ b/src/intel/compiler/brw_cfg.cpp
>> @@ -98,6 +98,7 @@ ends_block(const backend_instruction *inst)
>>            op == BRW_OPCODE_ELSE ||
>>            op == BRW_OPCODE_CONTINUE ||
>>            op == BRW_OPCODE_BREAK ||
>> +          op == BRW_OPCODE_DO ||
>>            op == BRW_OPCODE_WHILE;
>>  }
>>
>> @@ -268,13 +269,57 @@ cfg_t::cfg_t(exec_list *instructions)
>>           }
>>
>>           cur->instructions.push_tail(inst);
>> +
>> +         /* Represent divergent execution of the loop as a pair of
>> alternative
>> +          * edges coming out of the DO instruction: For any physical
>> iteration
>> +          * of the loop a given logical thread can either start off
>> enabled
>> +          * (which is represented as the "next" successor), or disabled
>> (if it
>> +          * has reached a non-uniform exit of the loop during a previous
>> +          * iteration, which is represented as the "cur_while" successor).
>> +          *
>> +          * The disabled edge will be taken by the logical thread anytime
>> we
>> +          * arrive at the DO instruction through a back-edge coming from a
>> +          * conditional exit of the loop where divergent control flow
>> started.
>> +          *
>> +          * This guarantees that there is a control-flow path from any
>> +          * divergence point of the loop into the convergence point
>> +          * (immediately past the WHILE instruction) such that it
>> overlaps the
>> +          * whole IP region of divergent control flow (potentially the
>> whole
>> +          * loop) *and* doesn't imply the execution of any instructions
>> part
>> +          * of the loop (since the corresponding execution mask bit will
>> be
>> +          * disabled for a diverging thread).
>> +          *
>> +          * This way we make sure that any variables that are live
>> throughout
>> +          * the region of divergence for an inactive logical thread are
>> also
>> +          * considered to interfere with any other variables assigned by
>> +          * active logical threads within the same physical region of the
>> +          * program, since otherwise we would risk cross-channel data
>> +          * corruption.
>> +          */
>> +         next = new_block();
>> +         cur->add_successor(mem_ctx, next);
>> +         cur->add_successor(mem_ctx, cur_while);
>> +         set_next_block(&cur, next, ip);
>>          break;
>>
>>        case BRW_OPCODE_CONTINUE:
>>           cur->instructions.push_tail(inst);
>>
>> +         /* A conditional CONTINUE may start a region of divergent control
>> +          * flow until the start of the next loop iteration (*not* until
>> the
>> +          * end of the loop which is why the successor is not the
>> top-level
>> +          * divergence point at cur_do).  The live interval of any
>> variable
>> +          * extending through a CONTINUE edge is guaranteed to overlap the
>> +          * whole region of divergent execution, because any variable
>> live-out
>> +          * at the CONTINUE instruction will also be live-in at the top
>> of the
>> +          * loop, and therefore also live-out at the bottom-most point of
>> the
>> +          * loop which is reachable from the top (since a control flow
>> path
>> +          * exists from a definition of the variable through this CONTINUE
>> +          * instruction, the top of the loop, the (reachable) bottom of
>> the
>> +          * loop, the top of the loop again, into a use of the variable).
>> +          */
>>           assert(cur_do != NULL);
>> -        cur->add_successor(mem_ctx, cur_do);
>> +         cur->add_successor(mem_ctx, cur_do->next());
>>
>>          next = new_block();
>>          if (inst->predicate)
>> @@ -286,8 +331,18 @@ cfg_t::cfg_t(exec_list *instructions)
>>        case BRW_OPCODE_BREAK:
>>           cur->instructions.push_tail(inst);
>>
>> -         assert(cur_while != NULL);
>> -        cur->add_successor(mem_ctx, cur_while);
>> +         /* A conditional BREAK instruction may start a region of
>> divergent
>> +          * control flow until the end of the loop if the condition is
>> +          * non-uniform, in which case the loop will execute additional
>> +          * iterations with the present channel disabled.  We model this
>> as a
>> +          * control flow path from the divergence point to the convergence
>> +          * point that overlaps the whole IP range of the loop and skips
>> over
>> +          * the execution of any other instructions part of the loop.
>> +          *
>> +          * See the DO case for additional explanation.
>> +          */
>> +         assert(cur_do != NULL);
>> +         cur->add_successor(mem_ctx, cur_do);
>>
>>          next = new_block();
>>          if (inst->predicate)
>> @@ -300,10 +355,18 @@ cfg_t::cfg_t(exec_list *instructions)
>>           cur->instructions.push_tail(inst);
>>
>>           assert(cur_do != NULL && cur_while != NULL);
>> -        cur->add_successor(mem_ctx, cur_do);
>>
>> -         if (inst->predicate)
>> -            cur->add_successor(mem_ctx, cur_while);
>> +         /* A conditional WHILE instruction may start a region of
>> divergent
>> +          * control flow until the end of the loop, just like the BREAK
>> +          * instruction.  See the BREAK case for more details.  OTOH an
>> +          * unconditional WHILE instruction is non-divergent (just like an
>> +          * unconditional CONTINUE), and will necessarily lead to the
>> +          * execution of an additional iteration of the loop for all
>> enabled
>> +          * channels, so we may skip over the divergence point at the top
>> of
>> +          * the loop to keep the CFG as unambiguous as possible.
>> +          */
>> +         cur->add_successor(mem_ctx, inst->predicate ? cur_do :
>> +                                     cur_do->next());
>>
>>          set_next_block(&cur, cur_while, ip);
>>
>> --
>> 2.14.2
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/mesa-dev/attachments/20171207/6da92945/attachment.html>