[Mesa-dev] [PATCH RFC 09/11] glsl: add pass to convert GLSL IR to SSA form

Thu Jan 30 19:30:28 PST 2014

On 22 January 2014 09:16, Connor Abbott <cwabbott0 at gmail.com> wrote:

> opt_to_ssa will convert temporaries and local variables to SSA form,
> although for now it can't handle array and record dereferences.
> ---
>  src/glsl/Makefile.sources  |    1 +
>  src/glsl/ir_optimization.h |    2 +
>  src/glsl/opt_to_ssa.cpp    | 1155
> ++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 1158 insertions(+)
>  create mode 100644 src/glsl/opt_to_ssa.cpp
>

General comment: this is a long and tricky algorithm.  In addition to the
overview information you provided at the top of opt_to_ssa.cpp (which is a
great help, BTW), it would be really nice to have a summary that describes
the phases of the algorithm and how they relate to each other.  I'd
recommend moving the class definitions to a header file; this would allow
convert_to_ssa_function() to be moved to the top of opt_to_ssa.cpp, and
then you could annotate that function with comments explaining what each
phase of the algorithm does.

You might also consider creating separate cpp/h files (and separate
patches) for:

- ir_ssa_state_visitor (along with the associated class
ir_ssa_variable_state)
- ir_parameter_visitor
- ir_phi_insertion_visitor (along with the associated class
ir_control_flow_entry and derived classes)
- ir_rewrite_visitor (along with the associated classes
ir_rewrite_forward_visitor and ir_rewrite_backward_visitor)

(kind of like you already did with ir_dead_branches_visitor and
ir_loop_jumps_visitor).  That would make future reviews go a lot more
smoothly, because it would avoid having a single large patch, and the way
the various phases of the algorithm fit together would be clear from the
sequence of commit messages.

FWIW here are the notes I made about the phases of your algorithm.  Feel
free to use my words in your comments if they're helpful:

1. Run ir_dead_branches_visitor: determine control flow out of if's.
2. Run ir_loop_jumps_visitor: find set of breaks and continues from each
loop.
3. Run ir_ssa_state_visitor: create ir_ssa_variable_state object for each
SSA-able var in the original prog, count number of times each SSA-able var
is assigned to.
4. Run ir_parameter_visitor: convert out and inout parameters of calls to a
form that is easier to convert to SSA.
5. Run ir_phi_insertion_visitor: insert so-called "trivial" phi functions
(phi functions of the form V = phi(V, V, ...)).  Note that since phi nodes
count as assignments, this visitor needs to update the counts produced by
ir_ssa_state_visitor.
6. Call ssv.allocate_state_arrays(): allocate a big enough stack of
ir_variable *'s inside each ir_ssa_variable_state to hold one pointer for
each assignment to the variable (including assignments by phi nodes).
7. Run ir_rewrite_visitor: replace each SSA-able variable V with an SSA
variable Vi such that no Vi is assigned more than once.  This includes
rewriting the trivial phi functions so that they are no longer trivial.
8. Call ssv.remove_decls(): remove the declarations of the old variables V
now that they have been replaced by Vi's.

A few other things I stumbled on that could really use clarification
include:

- It took me a lot of reading to understand why ir_rewrite_visitor visits
nodes in both forward and reverse order.  I now understand that what's
happening is that we're maintaining a stack of SSA temporaries
corresponding to each original SSA-able variable; when we visit the code in
the forward direction we push temporaries onto the stack; when we visit the
code in reverse, we pop them off again.  The order of visiting flow control
has been carefully chosen so that at any time, the SSA temporary that's at
the top of the stack is the one holding the value of the variable that is
"current" from the point of view of the visitor's current location.  It
would be helpful to have this information in a comment, perhaps with a few
pseudocode examples to show the order in which nodes are visited.

- The paper mentions that there is an option of producing either a pruned
SSA form or a non-pruned SSA form (the difference is that in non-pruned SSA
form, a phi function is inserted wherever two control paths making
different assignments to a variable converge; in pruned SSA form phi nodes
are pruned out if they would never be referenced).  It looks like you chose
to produce non-pruned SSA.  Can you comment on your reasoning for doing so?

- Why don't we try to convert arrays and structures to SSA?  The paper
suggests that A[i] = x could be replaced by A = Update(A, i, x), but it
looks like you've chosen not to do that.  Is that because there's some
subtle reason why it doesn't work for GLSL, or is it just because you
haven't had a chance to make this improvement?

Thanks so much, Connor.  You've obviously done a lot of work on this and I
hope that the volume of my comments doesn't dishearten you; I'm really
excited to see SSA support added to Mesa, and I think with a little
clean-up this ought to be ready to merge in pretty soon!

More comments below:

>
> + *
> + * - The instruction before the loop dominates the instructions inside
> the loop
> + * as well as the instructions after the loop. Here is where the
> approximation
> + * lies: really, since the loop is guarenteed to execute at least once,
> the
>

"guaranteed"

> + * instructions after the loop can potentially be dominated by an
> instruction
> + * inside the loop. Computing that instruction, though, would be
> complicated,
> + * and in the end it doesn't hurt much if we ignore that detail. In the
> end, we
> + * may have some phi nodes where all the sources are the same, but these
> can
> + * easily be optimized away.
>

On my first several readings, I misunderstood this comment to mean "we
convert to SSA form assuming that each loop might execute zero times".  Now
that I've read through the full algorithm, I understand that this
approximation only applies during the ir_phi_insertion_visitor part of the
algorithm.  The rest of the algorithm doesn't make this approximation, and
that's why the only consequence of the approximation is that we may have
some unnecessary phi nodes.  It would be nice to add some text to clarify
this.

Also, note that the unnecessary phi nodes will not always have all sources
the same.  Consider this pseudocode:

int x;
x = 1;
loop {
   if (...) {
      x = x + 1;
      break;
   } else if (...) {
      x = x + 2;
      break;
   }
}
foo(x);

If I'm not mistaken, your algorithm will convert this to the following SSA:

int x_1 = 1;
loop {
   int x_2 = phi(x_1, x_2); // Unnecessary phi node
   if (...) {
      int x_3 = x_2 + 1;
      break;
   } else if (...) {
      int x_4 = x_2 + 2;
      break;
   }
}
int x_4 = phi(x_3, x_4);
foo(x_4);

So, removing the unnecessary phi nodes in a later optimization pass won't
be quite as trivial as looking for phi nodes where all sources are the
same; we'll also have to detect phi nodes where all sources are equivalent.

Having said that, I still think your approximation is reasonable.

>
> +class ir_ssa_variable_state
> +{
> +public:
> +   ir_ssa_variable_state(ir_variable *var, ir_ssa_state_visitor *v,
> +                        ir_variable *undefined_var);
> +   ~ir_ssa_variable_state();
> +
> +   ir_variable *var; /* the original variable. */
>

Let's use Javadoc format for this comment and the ones below (i.e. "/**<
the original variable. */")

> +
> +   void stack_push(ir_variable *new_var);
> +
> +   void stack_pop();
> +
> +   ir_variable *cur_var(bool);
>

It would be really helpful if this said

   ir_variable *cur_var(bool use_undefined_var);

so that it's not necessary to cross reference to the class definition to
see what the bool argument is for.

> +
> +   ir_variable *new_var();
> +
> +   ir_variable **stack; /* The stack of replacements for the variable. */
> +   int num_replaced;
> +   int num_defs; /* the number of times the variable is assigned */
> +   int stack_idx; /* the current index into the stack */
> +
> +   ir_ssa_state_visitor *v;
> +
> +   ir_variable *undefined_var; /** < used for when var is read before
> written. */
> +};
> +
> +
> +ir_variable *
> +ir_ssa_variable_state::cur_var(bool use_undefined_var)
>

It would be nice to have some comment text above this function explaining
when and why it makes sense to pass true for use_undefined_var.  My
understanding (and please correct me if I'm wrong) is that we set
use_undefined_var = false when we're populating a phi node (because phi
nodes understand null to mean "variable is uninitialized along this
execution path"), but we set use_undefined_var = true when we're populating
an ir_dereference_variable (because ir_dereference_variable doesn't handle
having a null var field).

> +
> +void
> +ir_ssa_variable_state::stack_push(ir_variable *new_var)
> +{
> +   this->stack_idx++;
> +   this->num_replaced++;
> +   assert(this->num_replaced <= this->num_defs);
>

Can we also add the assertion:

   assert(this->stack_idx < this->num_defs);

so that we'll be confident that the code below doesn't overflow the stack?
I realize that you've arranged things so that stack_push() is never called
more than num_defs times, but that's not obvious without understanding the
whole algorithm.

> +   this->stack[this->stack_idx] = new_var;
> +   _mesa_hash_table_insert(this->v->new_to_old,
> _mesa_hash_pointer(new_var),
> +                          new_var, this->var);
> +}
> +
> +
> +ir_variable *
> +ir_ssa_variable_state::new_var()
> +{
> +   void *mem_ctx = ralloc_parent(this->var);
> +   char *new_name = ralloc_asprintf(mem_ctx, "%s_%i", this->var->name,
> +                                   this->num_replaced);
>

Clearly the intention here is for this to generate a unique name each time
it is called.  But to verify that it does that, you need to notice that
this->num_replaced is incremented by stack_push(), and that stack_push() is
always called by this function.

How about if instead we moved the lines:

   this->num_replaced++;
   assert(this->num_replaced <= this->num_defs);

from stack_push() to new_var()?

> +   ir_variable *new_var = new(mem_ctx) ir_variable(this->var->type,
> new_name,
> +                                                  ir_var_temporary_ssa);
> +   this->stack_push(new_var);
> +   return new_var;
> +}
> +
> +ir_ssa_variable_state::ir_ssa_variable_state(ir_variable* var,
> +                                            ir_ssa_state_visitor *v,
> +                                            ir_variable *undefined_var)
> +   : var(var), v(v), undefined_var(undefined_var)
> +{
> +   this->stack = NULL;
> +   this->num_replaced = 0;
> +   this->num_defs = 0;
> +   this->stack_idx = -1;
> +}
>

It seems odd to initialize some variables in the initializer list and
others in the body of the constructor.  Can we intialize them all in the
initializer list?

> +
> +
> +ir_visitor_status
> +ir_ssa_state_visitor::visit(ir_variable *var)
> +{
> +   if (var->data.mode == ir_var_auto || var->data.mode ==
> ir_var_temporary) {
> +      void *mem_ctx = ralloc_parent(var);
> +      ir_assignment *assign = ssa_assign("undefined",
> +                                        ir_constant::zero(mem_ctx,
> var->type));
>

Previous to this patch, if a GLSL program that read from a variable before
writing to it, it would get an undefined result (it would read garbage from
a register that was previously used for another purpose, possibly even by a
previous shader invocation).  It looks like your SSA transformation changes
that, so that all uninitialized variables now read as zero.

There are a few reasons why a shader might legitimately contain a read from
an uninitialized variable.  For example, consider a vertex shader that only
populates one of its outputs a certain uniform boolean is true, linked with
a fragment shader that only examines the corresponding input when the same
boolean is true.  Currently, Mesa will compile the vertex shader in such a
way that when the uniform is false, the vertex shader output contains
garbage data.  With your SSA conversion, the vertex shader will need an
extra instruction to initialize the output to zero when the uniform is
false.  I'm not sure if this will ever make a big enough difference for us
to care; if it does, there are some tricks we could play to avoid the
unnecessary initialization (for example, we could add a new type of ir
node, ir_undefined, which evaluates to garbage).  It would be nice to add a
comment somewhere explaining this situation so that if we later decide it's
worth improving, we'll have an idea where to start.

> +      ir_variable *undefined_var =
> assign->lhs->as_dereference_variable()->var;
> +      var->insert_after(assign);
> +      ir_ssa_variable_state *entry = new ir_ssa_variable_state(var, this,
> +
>  undefined_var);
> +      _mesa_hash_table_insert(this->ht, _mesa_hash_pointer(var), var,
> entry);
> +   }
> +
> +   return visit_continue;
> +}
> +
> +namespace {
> +
> +/*
> + * Rewrites out and inout parameters of functions to use a separate
> temporary.
> + * For example if we have:
> + *
> + * void foo(out vec4 arg1, inout vec4 arg2);
> + *
> + * and it gets called like:
> + *
> + * foo(bar, baz);
> + *
> + * Then assuming bar and baz are local variables to be transformed into
> SSA, it
> + * will be rewritten as
> + *
> + * vec4 tmp1, tmp2 = baz;
> + * foo(tmp1, tmp2);
> + * bar = tmp1;
> + * baz = tmp2;
> + *
> + * This captures the correct semantics of the original while still
> allowing us
> + * to convert bar and baz to SSA variables; in effect, this limits the
> + * "non-SSA-ness" to those four statements, hopefully allowing more
> + * optimizations to occur than if we simply prevented bar and baz from
> being
> + * transformed into SSA form.
> + */
>

How well does this transformation handle abominations like the following?

   int i = 0;
   int a[2] = { 0, 0 };
   foo(out i, out a[i]);

(Note: I'm not sure Mesa, or any other implementation, gets the semantics
of this sort of thing right in all circumstances, so if your algorithm
doesn't either, that's not necessarily a big deal.  We should probably just
make a note of it with a "FIXME" comment so that if there's trouble in the
future we'll know what we need to work on).

> +class ir_phi_insertion_visitor : public ir_hierarchical_visitor
> +{
> +public:
> +   ir_phi_insertion_visitor(ir_ssa_state_visitor *ssv,
> +                           ir_dead_branches_visitor *dbv,
> +                           ir_loop_jumps_visitor *ljv)
> +      : ssv(ssv), dbv(dbv), ljv(ljv)
> +   {
> +   }
> +
> +   virtual ir_visitor_status visit_enter(ir_if *);
> +   virtual ir_visitor_status visit_enter(ir_loop *);
> +   virtual ir_visitor_status visit(ir_dereference_variable *);
> +
> +private:
> +   void add_phi(ir_if *ir, ir_variable *var);
> +   void add_phi(ir_loop *loop, ir_variable *var);
> +
> +   ir_ssa_state_visitor *ssv;
> +   ir_dead_branches_visitor *dbv;
> +   ir_loop_jumps_visitor *ljv;
> +
> +   exec_list cf_stack;
>

This field needs a comment above it saying what type of object
(ir_control_flow_entry-derived objects I believe) is contained in the list.

> +};
> +
> +}; /* private namespace */
> +
> +ir_visitor_status
> +ir_phi_insertion_visitor::visit_enter(ir_if *ir)
> +{
> +   //before doing anything, visit the condition, since it's really part of
> +   //the block before this
> +   ir->condition->accept(this);
> +
> +   ir_if_entry entry(ir);
> +
> +   this->cf_stack.push_head(&entry);
> +   entry.in_then = true;
> +   visit_list_elements(this, &ir->then_instructions);
> +   entry.in_then = false;
> +   visit_list_elements(this, &ir->else_instructions);
> +   this->cf_stack.pop_head();
>

Since the ir_if_entry is located on the stack, all hell will break loose if
the calls to visit_list_elements() don't pop the same number of stack
entries as they push.  Can we add an assertion just before the call to
pop_head() to double-check that the thing we're about to pop is the same
node that we pushed?

Alternatively, since we only ever need to traverse the stack in one
direction, we could just use a singly linked list instead of an exec_list
(i.e. each ir_control_flow_entry contains an "ir_control_flow_entry *next"
pointer, and then here in this function we can do this:

ir_if_entry entry(this->cf_head, ir);
this->cf_head = &entry; /* Push onto CF stack */
entry.in_then = true;
visit_list_elements(this, &ir->then_instructions);
entry.in_then = false;
visit_list_elements(this, &ir->else_instructions);
this->cs_head = entry.next; /* pop CF stack */

Similar comments apply to ir_phi_insertion_visitor::visit_enter(ir_loop
*ir).

> +
> +   return visit_continue_with_parent;
> +}
> +
> +
> +ir_visitor_status
> +ir_phi_insertion_visitor::visit(ir_dereference_variable *ir)
> +{
> +   if (!this->in_assignee || this->cf_stack.is_empty()
> +       || !this->ssv->get_state(ir->var))
> +      return visit_continue;
> +
> +   exec_node *cur_node = this->cf_stack.head;
> +
> +   ir_control_flow_entry *cf_entry = (ir_control_flow_entry *) cur_node;
> +   ir_if_entry *if_entry = cf_entry->as_if_entry();
> +   if (if_entry) {
> +      ir_dead_branches *db = this->dbv->get_dead_branches(if_entry->ir);
> +      if ((db->then_dead && if_entry->in_then) ||
> +         (db->else_dead && !if_entry->in_then)) {
>

The intent might be clearer as:

   if (if_entry->in_then ? db->then_dead : db->else_dead)

> +        if ((db->then_dead_return && if_entry->in_then) ||
> +            (db->else_dead_return && !if_entry->in_then)) {
>

Similar comment here.

> +           //The branch we're on leads to a return or discard, so the
> +           //assignment can't lead to any join nodes
>

Style nit-pick: Mesa uses C-style comments.

> +           return visit_continue;
> +        }
> +
> +        /*
> +         * The branch we're on leads to a break or continue.
> +         * We may need a phi node at the beginning, end, or both,
> depending
> +         * on if we exit through a continue, break, or both, repsectively.
> +         * We use an approximation here, and simply add a phi node to the
> +         * beginning and end. The worst thing that can happen is that we
> wind
> +         * up with a phi node where all the sources are the same, which
> can
> +         * be easily optimized away in a later pass.
> +         */
>

As with the approximation you mention near the top of the file, on my first
several readings, I failed to understand which parts of the code make this
approximation and which don't.  I now see that the reason this
approximation is safe is because at this point we are only determining
where the phi nodes need to be inserted.  Later, when
ir_phi_insertion_visitor::add_phi() actually inserts the phi nodes, it will
do the right thing (it will only add continue sources to the
ir_phi_loop_begin and it will only add break sources to the
ir_phi_loop_end).  As a result, the worst that could happen is that
pseudocode like this:

int x;
x = 1;
loop {
   if (...) {
      x = x + 1;
      break;
   } else {
      x = x + 2;
   }
}
foo(x);

would get transformed to this:

int x_1 = 1;
loop {
   int x_2 = phi(x_1, x_4);
   if (...) {
      int x_3 = x_2 + 1;
      break;
   } else {
      int x_4 = x_2 + 2;
   }
 }
int x_5 = phi(x_3); // unnecessary phi node

It would be helpful to include an example like this in the comment to
clarify.

> +
> +        do {
> +           cur_node = cur_node->next;
> +           cf_entry = (ir_control_flow_entry *) cur_node;
> +        } while (!cf_entry->as_loop_entry());
>

At first glance it looks like this might walk off the beginning of the list
since we don't check cur_node->is_head_sentinel().  It would be nice to add
a comment explaining that this can't happen, because the only way we can
reach this code is if one of the branches in the if ended in a break or
continue; hence we are in a loop.

> +      }
> +   }
> +
> +   /*
> +    * walk the stack of control flow elements, placing phi nodes as
> +    * necessary
> +    */
> +
> +   for (; cur_node->next != NULL; cur_node = cur_node->next) {
>

I don't think a simple loop suffices here.  Consider the pseudocode:

loop {
   if (...) {
      loop {
         if (...) {
            x = 1;
            break;
         }
         // A
      }
      // B
      break;
   }
   // C
}
// D

Let's say we're visiting the dereference for the "x" in "x = 1".  We should
add phi nodes at locations B and D, but not at A or C.

The "if (if_entry)" block above will take care of making sure we don't add
a phi node at A, but then we'll get to this loop and we'll add phi nodes at
locations B, C, and D.

I believe similar problems will arise if the "if" we're analyzing is nested
inside another "if" without an intervening loop.

I think the right way to deal with this is to fuse this loop with the
previous special handling of "if" so that it looks something like this:

   exec_node *cur_node = this->cf_stack.head;
   for (exec_node *cur_node = this->cf_stack.head;
        !cur_node->is_head_sentinel(); cur_node = cur_node->prev) {
      ir_control_flow_entry *cf_entry = (ir_control_flow_entry *) cur_node;
      if (ir_if_entry *if_entry = cur_node->as_if_entry()) {
         ir_dead_branches *db = this->dbv->get_dead_branches(if_entry->ir);
         if ((db->then_dead && if_entry->in_then) ||
             (db->else_dead && !if_entry->in_then)) {
            if ((db->then_dead_return && if_entry->in_then) ||
                (db->else_dead_return && !if_entry->in_then)) {
               /* The branch we're on leads to a return or discard... */
               return visit_continue;
            }
            /* Find the innermost nested loop */
            do {
               cur_node = cur_node->next;
               cf_entry = (ir_control_flow_entry *) cur_node;
            } while (!cf_entry->as_loop_entry());
            /* Fall through to handle the loop entry */
         } else {
            this->add_phi(if_entry->ir, ir->var);
            /* Move on to the next enclosing if or loop */
            continue;
         }
      }

      /* Handle the loop entry. */
      ir_loop_entry *loop_entry = cur_node->as_loop_entry();
      this->add_phi(loop_entry->loop, ir->var);
   }

> +      cf_entry = (ir_control_flow_entry *) cur_node;
> +
> +      if_entry = cf_entry->as_if_entry();
> +      if (if_entry) {
> +        this->add_phi(if_entry->ir, ir->var);
> +      } else {
> +        ir_loop_entry *loop_entry = cf_entry->as_loop_entry();
> +        this->add_phi(loop_entry->loop, ir->var);
> +      }
> +   }
> +
> +   return visit_continue;
> +}
> +
> +void
> +ir_phi_insertion_visitor::add_phi(ir_if *ir, ir_variable *var)
> +{
> +   void *mem_ctx = ralloc_parent(ir);
> +
> +   //Don't duplicate phi nodes
> +   if (phi_exists(ir->phi_nodes, var)) {
> +      return;
> +   }
> +
> +   ir_phi_if *phi = new (mem_ctx) ir_phi_if(var, var, var);
>

It would be nice to have a short comment explaining that the reason we pass
the same var to ir_phi_if three times is because in this phase of the
algorithm, we are just creating a trivial phi node.  Later,
ir_rewrite_visitor will replace each variable with the appropriate SSA
temporary.

> +   ir->phi_nodes.push_tail(phi);
> +
> +   ir_ssa_variable_state *isvs = this->ssv->get_state(var);
> +   isvs->num_defs++;
> +}
> +
> +void
> +ir_phi_insertion_visitor::add_phi(ir_loop *loop, ir_variable *var)
> +{
> +   void *mem_ctx = ralloc_parent(loop);
> +
> +   //Don't duplicate phi nodes
> +   if (phi_exists(loop->begin_phi_nodes, var)) {
> +      return;
> +   }
> +
> +   ir_loop_jumps *loop_jumps = this->ljv->get_loop_jumps(loop);
> +
> +   ir_phi_loop_begin *phi_begin = new(mem_ctx) ir_phi_loop_begin(var,
> var, var);
>

A similar comment applies here.

> +
> +   foreach_list(n, &loop_jumps->continues) {
> +      ir_loop_jump_entry *entry = (ir_loop_jump_entry *) n;
> +
> +      ir_phi_jump_src *src = new(mem_ctx) ir_phi_jump_src();
> +      src->jump = entry->ir;
> +      src->src = var;
> +
> +      phi_begin->continue_srcs.push_tail(src);
> +   }
> +
> +   loop->begin_phi_nodes.push_tail(phi_begin);
> +
> +   ir_phi_loop_end *phi_end = new(mem_ctx) ir_phi_loop_end(var);
> +
> +   foreach_list(n, &loop_jumps->breaks) {
> +      ir_loop_jump_entry *entry = (ir_loop_jump_entry *) n;
> +
> +      ir_phi_jump_src *src = new(mem_ctx) ir_phi_jump_src();
> +      src->jump = entry->ir;
> +      src->src = var;
> +
> +      phi_end->break_srcs.push_tail(src);
> +   }
> +
> +   loop->end_phi_nodes.push_tail(phi_end);
> +
> +   ir_ssa_variable_state *isvs = this->ssv->get_state(var);
> +   isvs->num_defs += 2;
> +}
> +
> +ir_visitor_status
> +ir_rewrite_backward_visitor::visit(ir_dereference_variable *ir)
>

One of the things that took me a long time to understand about this
algorithm is that the sole purpose of visiting the nodes backwards is to
undo the effects of the forwards visit on the stacks maintained by the
ir_ssa_variable_state objects.  To help clarify this, it would be helpful
if each of the backward visitor methods contained a short comment referring
to the corresponding forward visitor operation which is being undone.  For
example, this method could contain a comment explaining that since the
forward visitors for ir_assignment and ir_call do
ir_ssa_variable_state::new_var(), we need to call stack_pop() to undo the
effect.

You might also consider getting rid of this method, and instead doing the
work in ir_rewrite_backward_visitor::visit(ir_call *) and
ir_rewrite_backward_visitor::viist(ir_assignment *).  It's a little more
code, but the advantage is that it makes the relationship between the
forward and backward visitors much easier to follow.

> +namespace {
> +
> +class ir_rewrite_visitor {
> +public:
> +   ir_rewrite_visitor(ir_ssa_state_visitor *, ir_dead_branches_visitor *);
> +
> +   void rewrite(exec_list *instructions);
> +
> +private:
> +   void rewrite_forwards(exec_list *instructions);
> +   void rewrite_backwards(exec_list *instructions);
> +
> +   void rewrite(ir_if *);
> +   void rewrite(ir_loop *);
> +   void rewrite(ir_loop_jump *);
>

For a while I was confused about why these three methods are called
rewrite(), but they are called from rewrite_forwards(exec_list *).  I'd
recommend renaming them to rewrite_forwards().

> +
> +   void rewrite_backwards(ir_if *);
> +   void rewrite_backwards(ir_loop *);
> +
> +   void rewrite_phi_dest(ir_phi *);
> +
> +   ir_ssa_state_visitor *ssv;
> +   ir_dead_branches_visitor *dbv;
> +   ir_rewrite_forward_visitor rfv;
> +   ir_rewrite_backward_visitor rbv;
> +
> +   ir_loop *outer_loop;
> +};
> +
> +}; /* private namespace */
> +
> +void
> +ir_rewrite_visitor::rewrite(ir_if *ir)
> +{
> +   ir->condition->accept(&this->rfv);
> +
> +   ir_dead_branches *dead_branches = this->dbv->get_dead_branches(ir);
> +   if (dead_branches->then_dead) {
> +      if (dead_branches->else_dead) {
> +        this->rewrite(&ir->then_instructions);
> +        this->rewrite(&ir->else_instructions);
> +      } else {
> +        this->rewrite(&ir->then_instructions);
> +        this->rewrite_forwards(&ir->else_instructions);
> +      }
> +   } else if (dead_branches->else_dead) {
> +      this->rewrite(&ir->else_instructions);
> +      this->rewrite_forwards(&ir->then_instructions);
>

It would be nice to have some comments in the three cases above to explain
the rationale for when we call rewrite() vs rewrite_forwards().

> +   } else {
> +      this->rewrite_forwards(&ir->then_instructions);
> +      foreach_list(n, &ir->phi_nodes) {
> +        ir_phi_if *phi = (ir_phi_if *) n;
> +        ir_ssa_variable_state *isvs;
> +
> +        isvs = this->ssv->get_state(phi->if_src);
> +        phi->if_src = isvs->cur_var(false);
> +      }
> +      this->rewrite_backwards(&ir->then_instructions);
> +
> +      this->rewrite_forwards(&ir->else_instructions);
> +      foreach_list(n, &ir->phi_nodes) {
> +        ir_phi_if *phi = (ir_phi_if *) n;
> +        ir_ssa_variable_state *isvs;
> +
> +        isvs = this->ssv->get_state(phi->else_src);
> +        phi->else_src = isvs->cur_var(false);
> +      }
> +      this->rewrite_backwards(&ir->else_instructions);
> +
> +      foreach_list(n, &ir->phi_nodes) {
> +        ir_phi_if *phi = (ir_phi_if *) n;
> +
> +        this->rewrite_phi_dest(phi);
> +      }
> +   }
> +}
> +
> +void
> +ir_rewrite_visitor::rewrite_backwards(ir_if *ir)
>

As with ir_rewrite_backward_visitor::visit(ir_dereference_variable *ir), it
would be nice to have some comments in this function explaining which
forward visitor operations we're un-doing.  A similar comment applies to
ir_rewrite_visitor::rewrite_backwards(ir_loop *).

Once again, thanks for all the work you've put into this patch.  SSA is
something Mesa has been sorely needing for a long time, and I really like
the approach you're taking.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20140130/07db66a4/attachment-0001.html>