[Mesa-dev] [PATCH RFC 09/11] glsl: add pass to convert GLSL IR to SSA form

Wed Feb 5 11:29:47 PST 2014

On Thu, Jan 30, 2014 at 10:30 PM, Paul Berry <stereotype441 at gmail.com>wrote:

> On 22 January 2014 09:16, Connor Abbott <cwabbott0 at gmail.com> wrote:
>
>> opt_to_ssa will convert temporaries and local variables to SSA form,
>> although for now it can't handle array and record dereferences.
>> ---
>>  src/glsl/Makefile.sources  |    1 +
>>  src/glsl/ir_optimization.h |    2 +
>>  src/glsl/opt_to_ssa.cpp    | 1155
>> ++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 1158 insertions(+)
>>  create mode 100644 src/glsl/opt_to_ssa.cpp
>>
>
> General comment: this is a long and tricky algorithm.  In addition to the
> overview information you provided at the top of opt_to_ssa.cpp (which is a
> great help, BTW), it would be really nice to have a summary that describes
> the phases of the algorithm and how they relate to each other.  I'd
> recommend moving the class definitions to a header file; this would allow
> convert_to_ssa_function() to be moved to the top of opt_to_ssa.cpp, and
> then you could annotate that function with comments explaining what each
> phase of the algorithm does.
>

Sure, yeah that makes sense.

>
> You might also consider creating separate cpp/h files (and separate
> patches) for:
>
> - ir_ssa_state_visitor (along with the associated class
> ir_ssa_variable_state)
> - ir_parameter_visitor
> - ir_phi_insertion_visitor (along with the associated class
> ir_control_flow_entry and derived classes)
> - ir_rewrite_visitor (along with the associated classes
> ir_rewrite_forward_visitor and ir_rewrite_backward_visitor)
>
> (kind of like you already did with ir_dead_branches_visitor and
> ir_loop_jumps_visitor).  That would make future reviews go a lot more
> smoothly, because it would avoid having a single large patch, and the way
> the various phases of the algorithm fit together would be clear from the
> sequence of commit messages.
>

So the reason I kept these passes together in the same file (as opposed to
ir_dead_branches_visitor and ir_loop_jumps_visitor) is that they won't
really be useful outside of this specific pass. ir_dead_branches will come
in handy any time we need to calculate the dominance tree, and
ir_loop_jumps is needed for anything that wants to insert a phi node into a
loop, but all the other passes are only going to be useful in this specific
context; they fit together in a specific way in order to make the algorithm
work. If you still think it's a good idea to put them in separate files,
then I'm ok with that, but that's the reason things are what they are now.
Perhaps I can still keep them in the same file, but split up this patch so
that they are introduced one-by-one?

>
>
> FWIW here are the notes I made about the phases of your algorithm.  Feel
> free to use my words in your comments if they're helpful:
>
> 1. Run ir_dead_branches_visitor: determine control flow out of if's.
> 2. Run ir_loop_jumps_visitor: find set of breaks and continues from each
> loop.
> 3. Run ir_ssa_state_visitor: create ir_ssa_variable_state object for each
> SSA-able var in the original prog, count number of times each SSA-able var
> is assigned to.
> 4. Run ir_parameter_visitor: convert out and inout parameters of calls to
> a form that is easier to convert to SSA.
> 5. Run ir_phi_insertion_visitor: insert so-called "trivial" phi functions
> (phi functions of the form V = phi(V, V, ...)).  Note that since phi nodes
> count as assignments, this visitor needs to update the counts produced by
> ir_ssa_state_visitor.
> 6. Call ssv.allocate_state_arrays(): allocate a big enough stack of
> ir_variable *'s inside each ir_ssa_variable_state to hold one pointer for
> each assignment to the variable (including assignments by phi nodes).
> 7. Run ir_rewrite_visitor: replace each SSA-able variable V with an SSA
> variable Vi such that no Vi is assigned more than once.  This includes
> rewriting the trivial phi functions so that they are no longer trivial.
> 8. Call ssv.remove_decls(): remove the declarations of the old variables V
> now that they have been replaced by Vi's.
>
>
> A few other things I stumbled on that could really use clarification
> include:
>
> - It took me a lot of reading to understand why ir_rewrite_visitor visits
> nodes in both forward and reverse order.  I now understand that what's
> happening is that we're maintaining a stack of SSA temporaries
> corresponding to each original SSA-able variable; when we visit the code in
> the forward direction we push temporaries onto the stack; when we visit the
> code in reverse, we pop them off again.  The order of visiting flow control
> has been carefully chosen so that at any time, the SSA temporary that's at
> the top of the stack is the one holding the value of the variable that is
> "current" from the point of view of the visitor's current location.  It
> would be helpful to have this information in a comment, perhaps with a few
> pseudocode examples to show the order in which nodes are visited.
>

 I didn't the stuff you're describing because most of it was explained in
the original paper, but I suppose we should at least have a summary for
people too lazy to read it.

btw, ir_rewrite_visitor is based on the pseudocode in figure 12 in the
paper I referenced, except rather than traverse the dominance tree
explicitly, we use the approximation I described in the overview at the
beginning - that's where the "order of visiting control flow" comes from.

>
> - The paper mentions that there is an option of producing either a pruned
> SSA form or a non-pruned SSA form (the difference is that in non-pruned SSA
> form, a phi function is inserted wherever two control paths making
> different assignments to a variable converge; in pruned SSA form phi nodes
> are pruned out if they would never be referenced).  It looks like you chose
> to produce non-pruned SSA.  Can you comment on your reasoning for doing so?
>

Honestly, I don't really know much about the modifications necessary for
that. IIRC, however, dead-code elimination will remove the phi nodes that
we would avoid by producing pruned SSA anyways, so I don't think it's worth
the extra complexity for an already complex enough algorithm.

>
> - Why don't we try to convert arrays and structures to SSA?  The paper
> suggests that A[i] = x could be replaced by A = Update(A, i, x), but it
> looks like you've chosen not to do that.  Is that because there's some
> subtle reason why it doesn't work for GLSL, or is it just because you
> haven't had a chance to make this improvement?
>

First of all, it was just a reason of time (have to introduce the update
operation, lower it, have a way for hopefully getting rid of it after we
get out of SSA, etc.). Also, I'm worried that optimizations done in SSA
form may make it impossible to remove all the update operations once we get
out of SSA, meaning that we would have to generate code to copy the entire
array, which would suck; I haven't given that too much thought though, so I
may be wrong. Also, it seems like most compilers (LLVM, GCC, R600-SB, etc.)
don't bother with this, and don't convert arrays to SSA anyways. Hopefully,
since pretty much every backend expects temporary structures to be
flattened, we won't have to worry about those either.

>
>
> Thanks so much, Connor.  You've obviously done a lot of work on this and I
> hope that the volume of my comments doesn't dishearten you; I'm really
> excited to see SSA support added to Mesa, and I think with a little
> clean-up this ought to be ready to merge in pretty soon!
>
> More comments below:
>
>
>>
>> + *
>> + * - The instruction before the loop dominates the instructions inside
>> the loop
>> + * as well as the instructions after the loop. Here is where the
>> approximation
>> + * lies: really, since the loop is guarenteed to execute at least once,
>> the
>>
>
> "guaranteed"
>
>
>> + * instructions after the loop can potentially be dominated by an
>> instruction
>> + * inside the loop. Computing that instruction, though, would be
>> complicated,
>> + * and in the end it doesn't hurt much if we ignore that detail. In the
>> end, we
>> + * may have some phi nodes where all the sources are the same, but these
>> can
>> + * easily be optimized away.
>>
>
> On my first several readings, I misunderstood this comment to mean "we
> convert to SSA form assuming that each loop might execute zero times".  Now
> that I've read through the full algorithm, I understand that this
> approximation only applies during the ir_phi_insertion_visitor part of the
> algorithm.  The rest of the algorithm doesn't make this approximation, and
> that's why the only consequence of the approximation is that we may have
> some unnecessary phi nodes.  It would be nice to add some text to clarify
> this.
>

Actually, this approximation is used in ir_rewrite_visitor as well; in the
original algorithm, we do a depth-first traversal of the dominance tree,
but as I explained above, we're basically using our approximation to visit
statements in the right order without having to explicitly construct the
dominance tree.

>
> Also, note that the unnecessary phi nodes will not always have all sources
> the same.  Consider this pseudocode:
>
> int x;
> x = 1;
> loop {
>    if (...) {
>       x = x + 1;
>       break;
>    } else if (...) {
>       x = x + 2;
>       break;
>    }
> }
> foo(x);
>
> If I'm not mistaken, your algorithm will convert this to the following SSA:
>
> int x_1 = 1;
> loop {
>    int x_2 = phi(x_1, x_2); // Unnecessary phi node
>    if (...) {
>       int x_3 = x_2 + 1;
>       break;
>    } else if (...) {
>       int x_4 = x_2 + 2;
>       break;
>    }
> }
> int x_4 = phi(x_3, x_4);
> foo(x_4);
>
>
> So, removing the unnecessary phi nodes in a later optimization pass won't
> be quite as trivial as looking for phi nodes where all sources are the
> same; we'll also have to detect phi nodes where all sources are equivalent.
>
> Having said that, I still think your approximation is reasonable.
>

Ah, thanks for pointing that out.

>
>>
>> +class ir_ssa_variable_state
>> +{
>> +public:
>> +   ir_ssa_variable_state(ir_variable *var, ir_ssa_state_visitor *v,
>> +                        ir_variable *undefined_var);
>> +   ~ir_ssa_variable_state();
>> +
>> +   ir_variable *var; /* the original variable. */
>>
>
> Let's use Javadoc format for this comment and the ones below (i.e. "/**<
> the original variable. */")
>
>
>> +
>> +   void stack_push(ir_variable *new_var);
>> +
>> +   void stack_pop();
>> +
>> +   ir_variable *cur_var(bool);
>>
>
> It would be really helpful if this said
>
>    ir_variable *cur_var(bool use_undefined_var);
>
> so that it's not necessary to cross reference to the class definition to
> see what the bool argument is for.
>
>
>> +
>> +   ir_variable *new_var();
>> +
>> +   ir_variable **stack; /* The stack of replacements for the variable. */
>> +   int num_replaced;
>> +   int num_defs; /* the number of times the variable is assigned */
>> +   int stack_idx; /* the current index into the stack */
>> +
>> +   ir_ssa_state_visitor *v;
>> +
>> +   ir_variable *undefined_var; /** < used for when var is read before
>> written. */
>> +};
>> +
>> +
>> +ir_variable *
>> +ir_ssa_variable_state::cur_var(bool use_undefined_var)
>>
>
> It would be nice to have some comment text above this function explaining
> when and why it makes sense to pass true for use_undefined_var.  My
> understanding (and please correct me if I'm wrong) is that we set
> use_undefined_var = false when we're populating a phi node (because phi
> nodes understand null to mean "variable is uninitialized along this
> execution path"), but we set use_undefined_var = true when we're populating
> an ir_dereference_variable (because ir_dereference_variable doesn't handle
> having a null var field).
>

That's correct.

>
>
>> +
>> +void
>> +ir_ssa_variable_state::stack_push(ir_variable *new_var)
>> +{
>> +   this->stack_idx++;
>> +   this->num_replaced++;
>> +   assert(this->num_replaced <= this->num_defs);
>>
>
> Can we also add the assertion:
>
>    assert(this->stack_idx < this->num_defs);
>
> so that we'll be confident that the code below doesn't overflow the
> stack?  I realize that you've arranged things so that stack_push() is never
> called more than num_defs times, but that's not obvious without
> understanding the whole algorithm.
>
>
>> +   this->stack[this->stack_idx] = new_var;
>> +   _mesa_hash_table_insert(this->v->new_to_old,
>> _mesa_hash_pointer(new_var),
>> +                          new_var, this->var);
>> +}
>> +
>> +
>> +ir_variable *
>> +ir_ssa_variable_state::new_var()
>> +{
>> +   void *mem_ctx = ralloc_parent(this->var);
>> +   char *new_name = ralloc_asprintf(mem_ctx, "%s_%i", this->var->name,
>> +                                   this->num_replaced);
>>
>
> Clearly the intention here is for this to generate a unique name each time
> it is called.  But to verify that it does that, you need to notice that
> this->num_replaced is incremented by stack_push(), and that stack_push() is
> always called by this function.
>
> How about if instead we moved the lines:
>
>    this->num_replaced++;
>
>    assert(this->num_replaced <= this->num_defs);
>
> from stack_push() to new_var()?
>
>
>> +   ir_variable *new_var = new(mem_ctx) ir_variable(this->var->type,
>> new_name,
>> +                                                  ir_var_temporary_ssa);
>> +   this->stack_push(new_var);
>> +   return new_var;
>> +}
>> +
>> +ir_ssa_variable_state::ir_ssa_variable_state(ir_variable* var,
>> +                                            ir_ssa_state_visitor *v,
>> +                                            ir_variable *undefined_var)
>> +   : var(var), v(v), undefined_var(undefined_var)
>> +{
>> +   this->stack = NULL;
>> +   this->num_replaced = 0;
>> +   this->num_defs = 0;
>> +   this->stack_idx = -1;
>> +}
>>
>
> It seems odd to initialize some variables in the initializer list and
> others in the body of the constructor.  Can we intialize them all in the
> initializer list?
>
>
>> +
>> +
>> +ir_visitor_status
>> +ir_ssa_state_visitor::visit(ir_variable *var)
>> +{
>> +   if (var->data.mode == ir_var_auto || var->data.mode ==
>> ir_var_temporary) {
>> +      void *mem_ctx = ralloc_parent(var);
>> +      ir_assignment *assign = ssa_assign("undefined",
>> +                                        ir_constant::zero(mem_ctx,
>> var->type));
>>
>
> Previous to this patch, if a GLSL program that read from a variable before
> writing to it, it would get an undefined result (it would read garbage from
> a register that was previously used for another purpose, possibly even by a
> previous shader invocation).  It looks like your SSA transformation changes
> that, so that all uninitialized variables now read as zero.
>
> There are a few reasons why a shader might legitimately contain a read
> from an uninitialized variable.  For example, consider a vertex shader that
> only populates one of its outputs a certain uniform boolean is true, linked
> with a fragment shader that only examines the corresponding input when the
> same boolean is true.  Currently, Mesa will compile the vertex shader in
> such a way that when the uniform is false, the vertex shader output
> contains garbage data.  With your SSA conversion, the vertex shader will
> need an extra instruction to initialize the output to zero when the uniform
> is false.  I'm not sure if this will ever make a big enough difference for
> us to care; if it does, there are some tricks we could play to avoid the
> unnecessary initialization (for example, we could add a new type of ir
> node, ir_undefined, which evaluates to garbage).  It would be nice to add a
> comment somewhere explaining this situation so that if we later decide it's
> worth improving, we'll have an idea where to start.
>

Sure... the whole undefined_var thing is really a hack to keep things from
going haywire when we hit these sorts of situations; we really should have
a defined way to say "this instruction reads from an undefined value"
similar to what we do for phi nodes. I think we can then optimize away
expressions that have undefined values in them by just making their result
undefined (unlike for phi nodes); for now, this will do though.

>
>
>> +      ir_variable *undefined_var =
>> assign->lhs->as_dereference_variable()->var;
>> +      var->insert_after(assign);
>> +      ir_ssa_variable_state *entry = new ir_ssa_variable_state(var, this,
>> +
>>  undefined_var);
>> +      _mesa_hash_table_insert(this->ht, _mesa_hash_pointer(var), var,
>> entry);
>> +   }
>> +
>> +   return visit_continue;
>> +}
>> +
>> +namespace {
>> +
>> +/*
>> + * Rewrites out and inout parameters of functions to use a separate
>> temporary.
>> + * For example if we have:
>> + *
>> + * void foo(out vec4 arg1, inout vec4 arg2);
>> + *
>> + * and it gets called like:
>> + *
>> + * foo(bar, baz);
>> + *
>> + * Then assuming bar and baz are local variables to be transformed into
>> SSA, it
>> + * will be rewritten as
>> + *
>> + * vec4 tmp1, tmp2 = baz;
>> + * foo(tmp1, tmp2);
>> + * bar = tmp1;
>> + * baz = tmp2;
>> + *
>> + * This captures the correct semantics of the original while still
>> allowing us
>> + * to convert bar and baz to SSA variables; in effect, this limits the
>> + * "non-SSA-ness" to those four statements, hopefully allowing more
>> + * optimizations to occur than if we simply prevented bar and baz from
>> being
>> + * transformed into SSA form.
>> + */
>>
>
> How well does this transformation handle abominations like the following?
>
>    int i = 0;
>    int a[2] = { 0, 0 };
>    foo(out i, out a[i]);
>
> (Note: I'm not sure Mesa, or any other implementation, gets the semantics
> of this sort of thing right in all circumstances, so if your algorithm
> doesn't either, that's not necessarily a big deal.  We should probably just
> make a note of it with a "FIXME" comment so that if there's trouble in the
> future we'll know what we need to work on).
>

Section 6.1.1 "Function Calling Conventions" of the GLSL 4.30 spec says
"The order in which output parameters are copied back to the caller is
undefined," so I believe what that code does is undefined; GLSL 1.10 has
the same language, so I assume it's always been that way.

(P.S. if you write that in your shader, you're evil and I hate you.)

>
>
>> +class ir_phi_insertion_visitor : public ir_hierarchical_visitor
>> +{
>> +public:
>> +   ir_phi_insertion_visitor(ir_ssa_state_visitor *ssv,
>> +                           ir_dead_branches_visitor *dbv,
>> +                           ir_loop_jumps_visitor *ljv)
>> +      : ssv(ssv), dbv(dbv), ljv(ljv)
>> +   {
>> +   }
>> +
>> +   virtual ir_visitor_status visit_enter(ir_if *);
>> +   virtual ir_visitor_status visit_enter(ir_loop *);
>> +   virtual ir_visitor_status visit(ir_dereference_variable *);
>> +
>> +private:
>> +   void add_phi(ir_if *ir, ir_variable *var);
>> +   void add_phi(ir_loop *loop, ir_variable *var);
>> +
>> +   ir_ssa_state_visitor *ssv;
>> +   ir_dead_branches_visitor *dbv;
>> +   ir_loop_jumps_visitor *ljv;
>> +
>> +   exec_list cf_stack;
>>
>
> This field needs a comment above it saying what type of object
> (ir_control_flow_entry-derived objects I believe) is contained in the list.
>
>
>> +};
>> +
>> +}; /* private namespace */
>> +
>> +ir_visitor_status
>> +ir_phi_insertion_visitor::visit_enter(ir_if *ir)
>> +{
>> +   //before doing anything, visit the condition, since it's really part
>> of
>> +   //the block before this
>> +   ir->condition->accept(this);
>> +
>> +   ir_if_entry entry(ir);
>> +
>> +   this->cf_stack.push_head(&entry);
>> +   entry.in_then = true;
>> +   visit_list_elements(this, &ir->then_instructions);
>> +   entry.in_then = false;
>> +   visit_list_elements(this, &ir->else_instructions);
>> +   this->cf_stack.pop_head();
>>
>
> Since the ir_if_entry is located on the stack, all hell will break loose
> if the calls to visit_list_elements() don't pop the same number of stack
> entries as they push.  Can we add an assertion just before the call to
> pop_head() to double-check that the thing we're about to pop is the same
> node that we pushed?
>
> Alternatively, since we only ever need to traverse the stack in one
> direction, we could just use a singly linked list instead of an exec_list
> (i.e. each ir_control_flow_entry contains an "ir_control_flow_entry *next"
> pointer, and then here in this function we can do this:
>
> ir_if_entry entry(this->cf_head, ir);
> this->cf_head = &entry; /* Push onto CF stack */
> entry.in_then = true;
> visit_list_elements(this, &ir->then_instructions);
> entry.in_then = false;
> visit_list_elements(this, &ir->else_instructions);
> this->cs_head = entry.next; /* pop CF stack */
>
> Similar comments apply to ir_phi_insertion_visitor::visit_enter(ir_loop
> *ir).
>
>
>> +
>> +   return visit_continue_with_parent;
>> +}
>> +
>> +
>> +ir_visitor_status
>> +ir_phi_insertion_visitor::visit(ir_dereference_variable *ir)
>> +{
>> +   if (!this->in_assignee || this->cf_stack.is_empty()
>> +       || !this->ssv->get_state(ir->var))
>> +      return visit_continue;
>> +
>> +   exec_node *cur_node = this->cf_stack.head;
>> +
>> +   ir_control_flow_entry *cf_entry = (ir_control_flow_entry *) cur_node;
>> +   ir_if_entry *if_entry = cf_entry->as_if_entry();
>> +   if (if_entry) {
>> +      ir_dead_branches *db = this->dbv->get_dead_branches(if_entry->ir);
>> +      if ((db->then_dead && if_entry->in_then) ||
>> +         (db->else_dead && !if_entry->in_then)) {
>>
>
> The intent might be clearer as:
>
>    if (if_entry->in_then ? db->then_dead : db->else_dead)
>
>
>> +        if ((db->then_dead_return && if_entry->in_then) ||
>> +            (db->else_dead_return && !if_entry->in_then)) {
>>
>
> Similar comment here.
>
>
>> +           //The branch we're on leads to a return or discard, so the
>> +           //assignment can't lead to any join nodes
>>
>
> Style nit-pick: Mesa uses C-style comments.
>
>
>> +           return visit_continue;
>> +        }
>> +
>> +        /*
>> +         * The branch we're on leads to a break or continue.
>> +         * We may need a phi node at the beginning, end, or both,
>> depending
>> +         * on if we exit through a continue, break, or both,
>> repsectively.
>> +         * We use an approximation here, and simply add a phi node to the
>> +         * beginning and end. The worst thing that can happen is that we
>> wind
>> +         * up with a phi node where all the sources are the same, which
>> can
>> +         * be easily optimized away in a later pass.
>> +         */
>>
>
> As with the approximation you mention near the top of the file, on my
> first several readings, I failed to understand which parts of the code make
> this approximation and which don't.  I now see that the reason this
> approximation is safe is because at this point we are only determining
> where the phi nodes need to be inserted.  Later, when
> ir_phi_insertion_visitor::add_phi() actually inserts the phi nodes, it will
> do the right thing (it will only add continue sources to the
> ir_phi_loop_begin and it will only add break sources to the
> ir_phi_loop_end).  As a result, the worst that could happen is that
> pseudocode like this:
>
> int x;
> x = 1;
> loop {
>    if (...) {
>       x = x + 1;
>       break;
>    } else {
>       x = x + 2;
>    }
> }
> foo(x);
>
> would get transformed to this:
>
> int x_1 = 1;
> loop {
>    int x_2 = phi(x_1, x_4);
>    if (...) {
>       int x_3 = x_2 + 1;
>       break;
>    } else {
>       int x_4 = x_2 + 2;
>    }
>  }
> int x_5 = phi(x_3); // unnecessary phi node
>
> It would be helpful to include an example like this in the comment to
> clarify.
>
>
>> +
>> +        do {
>> +           cur_node = cur_node->next;
>> +           cf_entry = (ir_control_flow_entry *) cur_node;
>> +        } while (!cf_entry->as_loop_entry());
>>
>
> At first glance it looks like this might walk off the beginning of the
> list since we don't check cur_node->is_head_sentinel().  It would be nice
> to add a comment explaining that this can't happen, because the only way we
> can reach this code is if one of the branches in the if ended in a break or
> continue; hence we are in a loop.
>
>
>> +      }
>> +   }
>> +
>> +   /*
>> +    * walk the stack of control flow elements, placing phi nodes as
>> +    * necessary
>> +    */
>> +
>> +   for (; cur_node->next != NULL; cur_node = cur_node->next) {
>>
>
> I don't think a simple loop suffices here.  Consider the pseudocode:
>
> loop {
>    if (...) {
>       loop {
>          if (...) {
>             x = 1;
>             break;
>          }
>          // A
>       }
>       // B
>       break;
>    }
>    // C
> }
> // D
>
> Let's say we're visiting the dereference for the "x" in "x = 1".  We
> should add phi nodes at locations B and D, but not at A or C.
>
> The "if (if_entry)" block above will take care of making sure we don't add
> a phi node at A, but then we'll get to this loop and we'll add phi nodes at
> locations B, C, and D.
>
> I believe similar problems will arise if the "if" we're analyzing is
> nested inside another "if" without an intervening loop.
>
>
> I think the right way to deal with this is to fuse this loop with the
> previous special handling of "if" so that it looks something like this:
>
>
>    exec_node *cur_node = this->cf_stack.head;
>    for (exec_node *cur_node = this->cf_stack.head;
>         !cur_node->is_head_sentinel(); cur_node = cur_node->prev) {
>
>       ir_control_flow_entry *cf_entry = (ir_control_flow_entry *) cur_node;
>       if (ir_if_entry *if_entry = cur_node->as_if_entry()) {
>
>          ir_dead_branches *db = this->dbv->get_dead_branches(if_entry->ir);
>          if ((db->then_dead && if_entry->in_then) ||
>              (db->else_dead && !if_entry->in_then)) {
>             if ((db->then_dead_return && if_entry->in_then) ||
>                 (db->else_dead_return && !if_entry->in_then)) {
>                /* The branch we're on leads to a return or discard... */
>                return visit_continue;
>             }
>             /* Find the innermost nested loop */
>             do {
>                cur_node = cur_node->next;
>
>                cf_entry = (ir_control_flow_entry *) cur_node;
>             } while (!cf_entry->as_loop_entry());
>             /* Fall through to handle the loop entry */
>          } else {
>
>             this->add_phi(if_entry->ir, ir->var);
>             /* Move on to the next enclosing if or loop */
>             continue;
>          }
>       }
>
>       /* Handle the loop entry. */
>       ir_loop_entry *loop_entry = cur_node->as_loop_entry();
>
>       this->add_phi(loop_entry->loop, ir->var);
>    }
>

Nice... much cleaner than what I have, and fixes a bug as well :)

>
>
>
>
>> +      cf_entry = (ir_control_flow_entry *) cur_node;
>> +
>> +      if_entry = cf_entry->as_if_entry();
>> +      if (if_entry) {
>> +        this->add_phi(if_entry->ir, ir->var);
>> +      } else {
>> +        ir_loop_entry *loop_entry = cf_entry->as_loop_entry();
>> +        this->add_phi(loop_entry->loop, ir->var);
>> +      }
>> +   }
>> +
>> +   return visit_continue;
>> +}
>> +
>> +void
>> +ir_phi_insertion_visitor::add_phi(ir_if *ir, ir_variable *var)
>> +{
>> +   void *mem_ctx = ralloc_parent(ir);
>> +
>> +   //Don't duplicate phi nodes
>> +   if (phi_exists(ir->phi_nodes, var)) {
>> +      return;
>> +   }
>> +
>> +   ir_phi_if *phi = new (mem_ctx) ir_phi_if(var, var, var);
>>
>
> It would be nice to have a short comment explaining that the reason we
> pass the same var to ir_phi_if three times is because in this phase of the
> algorithm, we are just creating a trivial phi node.  Later,
> ir_rewrite_visitor will replace each variable with the appropriate SSA
> temporary.
>
>
>> +   ir->phi_nodes.push_tail(phi);
>> +
>> +   ir_ssa_variable_state *isvs = this->ssv->get_state(var);
>> +   isvs->num_defs++;
>> +}
>> +
>> +void
>> +ir_phi_insertion_visitor::add_phi(ir_loop *loop, ir_variable *var)
>> +{
>> +   void *mem_ctx = ralloc_parent(loop);
>> +
>> +   //Don't duplicate phi nodes
>> +   if (phi_exists(loop->begin_phi_nodes, var)) {
>> +      return;
>> +   }
>> +
>> +   ir_loop_jumps *loop_jumps = this->ljv->get_loop_jumps(loop);
>> +
>> +   ir_phi_loop_begin *phi_begin = new(mem_ctx) ir_phi_loop_begin(var,
>> var, var);
>>
>
> A similar comment applies here.
>
>
>> +
>> +   foreach_list(n, &loop_jumps->continues) {
>> +      ir_loop_jump_entry *entry = (ir_loop_jump_entry *) n;
>> +
>> +      ir_phi_jump_src *src = new(mem_ctx) ir_phi_jump_src();
>> +      src->jump = entry->ir;
>> +      src->src = var;
>> +
>> +      phi_begin->continue_srcs.push_tail(src);
>> +   }
>> +
>> +   loop->begin_phi_nodes.push_tail(phi_begin);
>> +
>> +   ir_phi_loop_end *phi_end = new(mem_ctx) ir_phi_loop_end(var);
>> +
>> +   foreach_list(n, &loop_jumps->breaks) {
>> +      ir_loop_jump_entry *entry = (ir_loop_jump_entry *) n;
>> +
>> +      ir_phi_jump_src *src = new(mem_ctx) ir_phi_jump_src();
>> +      src->jump = entry->ir;
>> +      src->src = var;
>> +
>> +      phi_end->break_srcs.push_tail(src);
>> +   }
>> +
>> +   loop->end_phi_nodes.push_tail(phi_end);
>> +
>> +   ir_ssa_variable_state *isvs = this->ssv->get_state(var);
>> +   isvs->num_defs += 2;
>> +}
>> +
>> +ir_visitor_status
>> +ir_rewrite_backward_visitor::visit(ir_dereference_variable *ir)
>>
>
> One of the things that took me a long time to understand about this
> algorithm is that the sole purpose of visiting the nodes backwards is to
> undo the effects of the forwards visit on the stacks maintained by the
> ir_ssa_variable_state objects.  To help clarify this, it would be helpful
> if each of the backward visitor methods contained a short comment referring
> to the corresponding forward visitor operation which is being undone.  For
> example, this method could contain a comment explaining that since the
> forward visitors for ir_assignment and ir_call do
> ir_ssa_variable_state::new_var(), we need to call stack_pop() to undo the
> effect.
>
> You might also consider getting rid of this method, and instead doing the
> work in ir_rewrite_backward_visitor::visit(ir_call *) and
> ir_rewrite_backward_visitor::viist(ir_assignment *).  It's a little more
> code, but the advantage is that it makes the relationship between the
> forward and backward visitors much easier to follow.
>
>
>> +namespace {
>> +
>> +class ir_rewrite_visitor {
>> +public:
>> +   ir_rewrite_visitor(ir_ssa_state_visitor *, ir_dead_branches_visitor
>> *);
>> +
>> +   void rewrite(exec_list *instructions);
>> +
>> +private:
>> +   void rewrite_forwards(exec_list *instructions);
>> +   void rewrite_backwards(exec_list *instructions);
>> +
>> +   void rewrite(ir_if *);
>> +   void rewrite(ir_loop *);
>> +   void rewrite(ir_loop_jump *);
>>
>
> For a while I was confused about why these three methods are called
> rewrite(), but they are called from rewrite_forwards(exec_list *).  I'd
> recommend renaming them to rewrite_forwards().
>
>
>> +
>> +   void rewrite_backwards(ir_if *);
>> +   void rewrite_backwards(ir_loop *);
>> +
>> +   void rewrite_phi_dest(ir_phi *);
>> +
>> +   ir_ssa_state_visitor *ssv;
>> +   ir_dead_branches_visitor *dbv;
>> +   ir_rewrite_forward_visitor rfv;
>> +   ir_rewrite_backward_visitor rbv;
>> +
>> +   ir_loop *outer_loop;
>> +};
>> +
>> +}; /* private namespace */
>> +
>> +void
>> +ir_rewrite_visitor::rewrite(ir_if *ir)
>> +{
>> +   ir->condition->accept(&this->rfv);
>> +
>> +   ir_dead_branches *dead_branches = this->dbv->get_dead_branches(ir);
>> +   if (dead_branches->then_dead) {
>> +      if (dead_branches->else_dead) {
>> +        this->rewrite(&ir->then_instructions);
>> +        this->rewrite(&ir->else_instructions);
>> +      } else {
>> +        this->rewrite(&ir->then_instructions);
>> +        this->rewrite_forwards(&ir->else_instructions);
>> +      }
>> +   } else if (dead_branches->else_dead) {
>> +      this->rewrite(&ir->else_instructions);
>> +      this->rewrite_forwards(&ir->then_instructions);
>>
>
> It would be nice to have some comments in the three cases above to explain
> the rationale for when we call rewrite() vs rewrite_forwards().
>
>
>> +   } else {
>> +      this->rewrite_forwards(&ir->then_instructions);
>> +      foreach_list(n, &ir->phi_nodes) {
>> +        ir_phi_if *phi = (ir_phi_if *) n;
>> +        ir_ssa_variable_state *isvs;
>> +
>> +        isvs = this->ssv->get_state(phi->if_src);
>> +        phi->if_src = isvs->cur_var(false);
>> +      }
>> +      this->rewrite_backwards(&ir->then_instructions);
>> +
>> +      this->rewrite_forwards(&ir->else_instructions);
>> +      foreach_list(n, &ir->phi_nodes) {
>> +        ir_phi_if *phi = (ir_phi_if *) n;
>> +        ir_ssa_variable_state *isvs;
>> +
>> +        isvs = this->ssv->get_state(phi->else_src);
>> +        phi->else_src = isvs->cur_var(false);
>> +      }
>> +      this->rewrite_backwards(&ir->else_instructions);
>> +
>> +      foreach_list(n, &ir->phi_nodes) {
>> +        ir_phi_if *phi = (ir_phi_if *) n;
>> +
>> +        this->rewrite_phi_dest(phi);
>> +      }
>> +   }
>> +}
>> +
>> +void
>> +ir_rewrite_visitor::rewrite_backwards(ir_if *ir)
>>
>
> As with ir_rewrite_backward_visitor::visit(ir_dereference_variable *ir),
> it would be nice to have some comments in this function explaining which
> forward visitor operations we're un-doing.  A similar comment applies to
> ir_rewrite_visitor::rewrite_backwards(ir_loop *).
>
>
> Once again, thanks for all the work you've put into this patch.  SSA is
> something Mesa has been sorely needing for a long time, and I really like
> the approach you're taking.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20140205/8473e6f3/attachment-0001.html>