<div dir="ltr">On Thu, Jan 30, 2014 at 10:30 PM, Paul Berry <span dir="ltr"><<a href="mailto:stereotype441@gmail.com" target="_blank">stereotype441@gmail.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote"> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="im">On 22 January 2014 09:16, Connor Abbott <span dir="ltr"><<a href="mailto:cwabbott0@gmail.com" target="_blank">cwabbott0@gmail.com</a>></span> wrote:<br> </div><div class="gmail_extra"><div class="gmail_quote"><div class="im"> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">opt_to_ssa will convert temporaries and local variables to SSA form,<br> although for now it can't handle array and record dereferences.<br> ---<br> src/glsl/Makefile.sources | 1 +<br> src/glsl/ir_optimization.h | 2 +<br> src/glsl/opt_to_ssa.cpp | 1155 ++++++++++++++++++++++++++++++++++++++++++++<br> 3 files changed, 1158 insertions(+)<br> create mode 100644 src/glsl/opt_to_ssa.cpp<br></blockquote><div><br></div></div><div>General comment: this is a long and tricky algorithm. In addition to the overview information you provided at the top of opt_to_ssa.cpp (which is a great help, BTW), it would be really nice to have a summary that describes the phases of the algorithm and how they relate to each other. I'd recommend moving the class definitions to a header file; this would allow convert_to_ssa_function() to be moved to the top of opt_to_ssa.cpp, and then you could annotate that function with comments explaining what each phase of the algorithm does.<br> </div></div></div></div></blockquote><div><br></div><div style>Sure, yeah that makes sense.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> <div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> <br></div><div>You might also consider creating separate cpp/h files (and separate patches) for:<br><br>- ir_ssa_state_visitor (along with the associated class ir_ssa_variable_state)<br></div><div>- ir_parameter_visitor<br> </div><div>- ir_phi_insertion_visitor (along with the associated class ir_control_flow_entry and derived classes)<br></div><div>- ir_rewrite_visitor (along with the associated classes ir_rewrite_forward_visitor and ir_rewrite_backward_visitor)<br> <br></div><div>(kind of like you already did with ir_dead_branches_visitor and ir_loop_jumps_visitor). That would make future reviews go a lot more smoothly, because it would avoid having a single large patch, and the way the various phases of the algorithm fit together would be clear from the sequence of commit messages.<br> </div></div></div></div></blockquote><div><br></div><div style>So the reason I kept these passes together in the same file (as opposed to ir_dead_branches_visitor and ir_loop_jumps_visitor) is that they won't really be useful outside of this specific pass. ir_dead_branches will come in handy any time we need to calculate the dominance tree, and ir_loop_jumps is needed for anything that wants to insert a phi node into a loop, but all the other passes are only going to be useful in this specific context; they fit together in a specific way in order to make the algorithm work. If you still think it's a good idea to put them in separate files, then I'm ok with that, but that's the reason things are what they are now. Perhaps I can still keep them in the same file, but split up this patch so that they are introduced one-by-one?</div> <div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"> <div> <br></div><div><br></div><div>FWIW here are the notes I made about the phases of your algorithm. Feel free to use my words in your comments if they're helpful:<br><br>1. Run ir_dead_branches_visitor: determine control flow out of if's.<br> 2. Run ir_loop_jumps_visitor: find set of breaks and continues from each loop.<br>3. Run ir_ssa_state_visitor: create ir_ssa_variable_state object for each SSA-able var in the original prog, count number of times each SSA-able var is assigned to.<br> 4. Run ir_parameter_visitor: convert out and inout parameters of calls to a form that is easier to convert to SSA.<br>5. Run ir_phi_insertion_visitor: insert so-called "trivial" phi functions (phi functions of the form V = phi(V, V, ...)). Note that since phi nodes count as assignments, this visitor needs to update the counts produced by ir_ssa_state_visitor.<br> 6. Call ssv.allocate_state_arrays(): allocate a big enough stack of ir_variable *'s inside each ir_ssa_variable_state to hold one pointer for each assignment to the variable (including assignments by phi nodes).<br>7. Run ir_rewrite_visitor: replace each SSA-able variable V with an SSA variable Vi such that no Vi is assigned more than once. This includes rewriting the trivial phi functions so that they are no longer trivial.<br> 8. Call ssv.remove_decls(): remove the declarations of the old variables V now that they have been replaced by Vi's.<br><br></div><div><br>A few other things I stumbled on that could really use clarification include:<br> <br></div>- It took me a lot of reading to understand why ir_rewrite_visitor visits nodes in both forward and reverse order. I now understand that what's happening is that we're maintaining a stack of SSA temporaries corresponding to each original SSA-able variable; when we visit the code in the forward direction we push temporaries onto the stack; when we visit the code in reverse, we pop them off again. The order of visiting flow control has been carefully chosen so that at any time, the SSA temporary that's at the top of the stack is the one holding the value of the variable that is "current" from the point of view of the visitor's current location. It would be helpful to have this information in a comment, perhaps with a few pseudocode examples to show the order in which nodes are visited.<br> </div></div></div></blockquote><div><br></div><div style> I didn't the stuff you're describing because most of it was explained in the original paper, but I suppose we should at least have a summary for people too lazy to read it.<br> </div><div style><br></div><div style>btw, ir_rewrite_visitor is based on the pseudocode in figure 12 in the paper I referenced, except rather than traverse the dominance tree explicitly, we use the approximation I described in the overview at the beginning - that's where the "order of visiting control flow" comes from.</div> <div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"> <div><br></div><div>- The paper mentions that there is an option of producing either a pruned SSA form or a non-pruned SSA form (the difference is that in non-pruned SSA form, a phi function is inserted wherever two control paths making different assignments to a variable converge; in pruned SSA form phi nodes are pruned out if they would never be referenced). It looks like you chose to produce non-pruned SSA. Can you comment on your reasoning for doing so?<br> </div></div></div></div></blockquote><div><br></div><div style>Honestly, I don't really know much about the modifications necessary for that. IIRC, however, dead-code elimination will remove the phi nodes that we would avoid by producing pruned SSA anyways, so I don't think it's worth the extra complexity for an already complex enough algorithm.</div> <div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"> <div> </div><br></div><div class="gmail_quote">- Why don't we try to convert arrays and structures to SSA? The paper suggests that A[i] = x could be replaced by A = Update(A, i, x), but it looks like you've chosen not to do that. Is that because there's some subtle reason why it doesn't work for GLSL, or is it just because you haven't had a chance to make this improvement?<br> </div></div></div></blockquote><div><br></div><div style>First of all, it was just a reason of time (have to introduce the update operation, lower it, have a way for hopefully getting rid of it after we get out of SSA, etc.). Also, I'm worried that optimizations done in SSA form may make it impossible to remove all the update operations once we get out of SSA, meaning that we would have to generate code to copy the entire array, which would suck; I haven't given that too much thought though, so I may be wrong. Also, it seems like most compilers (LLVM, GCC, R600-SB, etc.) don't bother with this, and don't convert arrays to SSA anyways. Hopefully, since pretty much every backend expects temporary structures to be flattened, we won't have to worry about those either.</div> <div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"> </div><div class="gmail_quote"><br><br></div><div class="gmail_quote">Thanks so much, Connor. You've obviously done a lot of work on this and I hope that the volume of my comments doesn't dishearten you; I'm really excited to see SSA support added to Mesa, and I think with a little clean-up this ought to be ready to merge in pretty soon!<br> <br></div><div class="gmail_quote">More comments below:<br> <br></div><div class="gmail_quote"><div class="im"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"> <br> + *<br> + * - The instruction before the loop dominates the instructions inside the loop<br> + * as well as the instructions after the loop. Here is where the approximation<br> + * lies: really, since the loop is guarenteed to execute at least once, the<br></blockquote><div><br></div></div><div>"guaranteed"<br></div><div class="im"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"> + * instructions after the loop can potentially be dominated by an instruction<br> + * inside the loop. Computing that instruction, though, would be complicated,<br> + * and in the end it doesn't hurt much if we ignore that detail. In the end, we<br> + * may have some phi nodes where all the sources are the same, but these can<br> + * easily be optimized away.<br></blockquote><div><br></div></div><div>On my first several readings, I misunderstood this comment to mean "we convert to SSA form assuming that each loop might execute zero times". Now that I've read through the full algorithm, I understand that this approximation only applies during the ir_phi_insertion_visitor part of the algorithm. The rest of the algorithm doesn't make this approximation, and that's why the only consequence of the approximation is that we may have some unnecessary phi nodes. It would be nice to add some text to clarify this.<br> </div></div></div></div></blockquote><div><br></div><div style>Actually, this approximation is used in ir_rewrite_visitor as well; in the original algorithm, we do a depth-first traversal of the dominance tree, but as I explained above, we're basically using our approximation to visit statements in the right order without having to explicitly construct the dominance tree.</div> <div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"> <div> <br></div><div>Also, note that the unnecessary phi nodes will not always have all sources the same. Consider this pseudocode:<br><br></div><div>int x;<br></div><div>x = 1;<br></div><div>loop {<br></div><div> if (...) {<br> </div><div> x = x + 1;<br></div><div> break;<br></div><div> } else if (...) {<br></div><div> x = x + 2;<br></div><div> break;<br> }<br></div><div>}<br></div><div>foo(x);<br><br></div><div>If I'm not mistaken, your algorithm will convert this to the following SSA:<br> <br></div><div>int x_1 = 1;<br></div><div>loop {<br></div><div> int x_2 = phi(x_1, x_2); // Unnecessary phi node<br></div><div> if (...) {<br></div><div> int x_3 = x_2 + 1;<br></div><div> break;<br></div><div> } else if (...) {<br></div><div> int x_4 = x_2 + 2;<br></div><div> break;<br> }<br></div><div>}<br></div><div>int x_4 = phi(x_3, x_4);<br></div><div>foo(x_4);<br></div><div><br></div><div><br></div><div>So, removing the unnecessary phi nodes in a later optimization pass won't be quite as trivial as looking for phi nodes where all sources are the same; we'll also have to detect phi nodes where all sources are equivalent.<br> <br></div><div>Having said that, I still think your approximation is reasonable.<br></div></div></div></div></blockquote><div><br></div><div style>Ah, thanks for pointing that out.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> <div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><div class="im"> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"> +class ir_ssa_variable_state<br> +{<br> +public:<br> + ir_ssa_variable_state(ir_variable *var, ir_ssa_state_visitor *v,<br> + ir_variable *undefined_var);<br> + ~ir_ssa_variable_state();<br> +<br> + ir_variable *var; /* the original variable. */<br></blockquote><div><br></div></div><div>Let's use Javadoc format for this comment and the ones below (i.e. "/**< the original variable. */")<br></div> <div class="im"><div> </div> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"> +<br> + void stack_push(ir_variable *new_var);<br> +<br> + void stack_pop();<br> +<br> + ir_variable *cur_var(bool);<br></blockquote><div><br></div></div><div>It would be really helpful if this said<br><br></div><div> ir_variable *cur_var(bool use_undefined_var);<br><br>so that it's not necessary to cross reference to the class definition to see what the bool argument is for.<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div class="im"> +<br> + ir_variable *new_var();<br> +<br> + ir_variable **stack; /* The stack of replacements for the variable. */<br> + int num_replaced;<br> + int num_defs; /* the number of times the variable is assigned */<br> + int stack_idx; /* the current index into the stack */<br> +<br> + ir_ssa_state_visitor *v;<br> +<br> + ir_variable *undefined_var; /** < used for when var is read before written. */<br> +};<br> +<br></div><div class="im"> +<br> +ir_variable *<br> +ir_ssa_variable_state::cur_var(bool use_undefined_var)<br></div></blockquote><div><br></div><div>It would be nice to have some comment text above this function explaining when and why it makes sense to pass true for use_undefined_var. My understanding (and please correct me if I'm wrong) is that we set use_undefined_var = false when we're populating a phi node (because phi nodes understand null to mean "variable is uninitialized along this execution path"), but we set use_undefined_var = true when we're populating an ir_dereference_variable (because ir_dereference_variable doesn't handle having a null var field).<br> </div></div></div></div></blockquote><div><br></div><div style>That's correct.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> <div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> <br></div><div class="im"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"> +<br> +void<br> +ir_ssa_variable_state::stack_push(ir_variable *new_var)<br> +{<br> + this->stack_idx++;<br> + this->num_replaced++;<br> + assert(this->num_replaced <= this->num_defs);<br></blockquote><div><br></div></div><div>Can we also add the assertion:<br><br></div><div> assert(this->stack_idx < this->num_defs);<br><br></div><div> so that we'll be confident that the code below doesn't overflow the stack? I realize that you've arranged things so that stack_push() is never called more than num_defs times, but that's not obvious without understanding the whole algorithm.<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div class="im"> + this->stack[this->stack_idx] = new_var;<br> + _mesa_hash_table_insert(this->v->new_to_old, _mesa_hash_pointer(new_var),<br> + new_var, this->var);<br> +}<br> +<br></div><div class="im"> +<br> +ir_variable *<br> +ir_ssa_variable_state::new_var()<br> +{<br> + void *mem_ctx = ralloc_parent(this->var);<br> + char *new_name = ralloc_asprintf(mem_ctx, "%s_%i", this->var->name,<br> + this->num_replaced);<br></div></blockquote><div><br></div><div>Clearly the intention here is for this to generate a unique name each time it is called. But to verify that it does that, you need to notice that this->num_replaced is incremented by stack_push(), and that stack_push() is always called by this function.<br> <br></div><div>How about if instead we moved the lines:<br><br> this->num_replaced++;<div class="im"><br> assert(this->num_replaced <= this->num_defs);<br><br></div></div><div>from stack_push() to new_var()?<br> </div><div class="im"><div> </div> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"> + ir_variable *new_var = new(mem_ctx) ir_variable(this->var->type, new_name,<br> + ir_var_temporary_ssa);<br> + this->stack_push(new_var);<br> + return new_var;<br> +}<br> +<br> +ir_ssa_variable_state::ir_ssa_variable_state(ir_variable* var,<br> + ir_ssa_state_visitor *v,<br> + ir_variable *undefined_var)<br> + : var(var), v(v), undefined_var(undefined_var)<br> +{<br> + this->stack = NULL;<br> + this->num_replaced = 0;<br> + this->num_defs = 0;<br> + this->stack_idx = -1;<br> +}<br></blockquote><div><br></div></div><div>It seems odd to initialize some variables in the initializer list and others in the body of the constructor. Can we intialize them all in the initializer list?<br></div><div class="im"> <div> </div> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"> +<br> +<br> +ir_visitor_status<br> +ir_ssa_state_visitor::visit(ir_variable *var)<br> +{<br> + if (var->data.mode == ir_var_auto || var->data.mode == ir_var_temporary) {<br> + void *mem_ctx = ralloc_parent(var);<br> + ir_assignment *assign = ssa_assign("undefined",<br> + ir_constant::zero(mem_ctx, var->type));<br></blockquote><div><br></div></div><div>Previous to this patch, if a GLSL program that read from a variable before writing to it, it would get an undefined result (it would read garbage from a register that was previously used for another purpose, possibly even by a previous shader invocation). It looks like your SSA transformation changes that, so that all uninitialized variables now read as zero.<br> <br></div><div>There are a few reasons why a shader might legitimately contain a read from an uninitialized variable. For example, consider a vertex shader that only populates one of its outputs a certain uniform boolean is true, linked with a fragment shader that only examines the corresponding input when the same boolean is true. Currently, Mesa will compile the vertex shader in such a way that when the uniform is false, the vertex shader output contains garbage data. With your SSA conversion, the vertex shader will need an extra instruction to initialize the output to zero when the uniform is false. I'm not sure if this will ever make a big enough difference for us to care; if it does, there are some tricks we could play to avoid the unnecessary initialization (for example, we could add a new type of ir node, ir_undefined, which evaluates to garbage). It would be nice to add a comment somewhere explaining this situation so that if we later decide it's worth improving, we'll have an idea where to start.<br> </div></div></div></div></blockquote><div><br></div><div style>Sure... the whole undefined_var thing is really a hack to keep things from going haywire when we hit these sorts of situations; we really should have a defined way to say "this instruction reads from an undefined value" similar to what we do for phi nodes. I think we can then optimize away expressions that have undefined values in them by just making their result undefined (unlike for phi nodes); for now, this will do though.</div> <div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"> <div> </div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div class="im"> + ir_variable *undefined_var = assign->lhs->as_dereference_variable()->var;<br> + var->insert_after(assign);<br> + ir_ssa_variable_state *entry = new ir_ssa_variable_state(var, this,<br> + undefined_var);<br> + _mesa_hash_table_insert(this->ht, _mesa_hash_pointer(var), var, entry);<br> + }<br> +<br> + return visit_continue;<br> +}<br> +<br></div><div><div class="h5"> +namespace {<br> +<br> +/*<br> + * Rewrites out and inout parameters of functions to use a separate temporary.<br> + * For example if we have:<br> + *<br> + * void foo(out vec4 arg1, inout vec4 arg2);<br> + *<br> + * and it gets called like:<br> + *<br> + * foo(bar, baz);<br> + *<br> + * Then assuming bar and baz are local variables to be transformed into SSA, it<br> + * will be rewritten as<br> + *<br> + * vec4 tmp1, tmp2 = baz;<br> + * foo(tmp1, tmp2);<br> + * bar = tmp1;<br> + * baz = tmp2;<br> + *<br> + * This captures the correct semantics of the original while still allowing us<br> + * to convert bar and baz to SSA variables; in effect, this limits the<br> + * "non-SSA-ness" to those four statements, hopefully allowing more<br> + * optimizations to occur than if we simply prevented bar and baz from being<br> + * transformed into SSA form.<br> + */<br></div></div></blockquote><div><br></div><div>How well does this transformation handle abominations like the following?<br><br></div><div> int i = 0;<br></div><div> int a[2] = { 0, 0 };<br></div><div> foo(out i, out a[i]);<br> <br></div><div>(Note: I'm not sure Mesa, or any other implementation, gets the semantics of this sort of thing right in all circumstances, so if your algorithm doesn't either, that's not necessarily a big deal. We should probably just make a note of it with a "FIXME" comment so that if there's trouble in the future we'll know what we need to work on).<br> </div></div></div></div></blockquote><div><br></div><div style>Section 6.1.1 "Function Calling Conventions" of the GLSL 4.30 spec says "The order in which output parameters are copied back to the caller is undefined," so I believe what that code does is undefined; GLSL 1.10 has the same language, so I assume it's always been that way.</div> <div><br></div><div style>(P.S. if you write that in your shader, you're evil and I hate you.) </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> <div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div> </div><div class="im"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"> +class ir_phi_insertion_visitor : public ir_hierarchical_visitor<br> +{<br> +public:<br> + ir_phi_insertion_visitor(ir_ssa_state_visitor *ssv,<br> + ir_dead_branches_visitor *dbv,<br> + ir_loop_jumps_visitor *ljv)<br> + : ssv(ssv), dbv(dbv), ljv(ljv)<br> + {<br> + }<br> +<br> + virtual ir_visitor_status visit_enter(ir_if *);<br> + virtual ir_visitor_status visit_enter(ir_loop *);<br> + virtual ir_visitor_status visit(ir_dereference_variable *);<br> +<br> +private:<br> + void add_phi(ir_if *ir, ir_variable *var);<br> + void add_phi(ir_loop *loop, ir_variable *var);<br> +<br> + ir_ssa_state_visitor *ssv;<br> + ir_dead_branches_visitor *dbv;<br> + ir_loop_jumps_visitor *ljv;<br> +<br> + exec_list cf_stack;<br></blockquote><div><br></div></div><div>This field needs a comment above it saying what type of object (ir_control_flow_entry-derived objects I believe) is contained in the list.<br></div><div class="im"> <div> </div> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"> +};<br> +<br> +}; /* private namespace */<br> +<br> +ir_visitor_status<br> +ir_phi_insertion_visitor::visit_enter(ir_if *ir)<br> +{<br> + //before doing anything, visit the condition, since it's really part of<br> + //the block before this<br> + ir->condition->accept(this);<br> +<br> + ir_if_entry entry(ir);<br> +<br> + this->cf_stack.push_head(&entry);<br> + entry.in_then = true;<br> + visit_list_elements(this, &ir->then_instructions);<br> + entry.in_then = false;<br> + visit_list_elements(this, &ir->else_instructions);<br> + this->cf_stack.pop_head();<br></blockquote><div><br></div></div><div>Since the ir_if_entry is located on the stack, all hell will break loose if the calls to visit_list_elements() don't pop the same number of stack entries as they push. Can we add an assertion just before the call to pop_head() to double-check that the thing we're about to pop is the same node that we pushed?<br> <br></div><div>Alternatively, since we only ever need to traverse the stack in one direction, we could just use a singly linked list instead of an exec_list (i.e. each ir_control_flow_entry contains an "ir_control_flow_entry *next" pointer, and then here in this function we can do this:<br> <br></div><div>ir_if_entry entry(this->cf_head, ir);<br></div><div>this->cf_head = &entry; /* Push onto CF stack */<br></div><div>entry.in_then = true;<br></div><div>visit_list_elements(this, &ir->then_instructions);<br> </div><div>entry.in_then = false;<br></div><div>visit_list_elements(this, &ir->else_instructions);<br></div><div>this->cs_head = entry.next; /* pop CF stack */<br><br></div><div>Similar comments apply to ir_phi_insertion_visitor::visit_enter(ir_loop *ir).<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div class="im"> +<br> + return visit_continue_with_parent;<br> +}<br> +<br></div><div class="im">+<br> +ir_visitor_status<br> +ir_phi_insertion_visitor::visit(ir_dereference_variable *ir)<br> +{<br> + if (!this->in_assignee || this->cf_stack.is_empty()<br> + || !this->ssv->get_state(ir->var))<br> + return visit_continue;<br> +<br> + exec_node *cur_node = this->cf_stack.head;<br> +<br> + ir_control_flow_entry *cf_entry = (ir_control_flow_entry *) cur_node;<br> + ir_if_entry *if_entry = cf_entry->as_if_entry();<br> + if (if_entry) {<br> + ir_dead_branches *db = this->dbv->get_dead_branches(if_entry->ir);<br> + if ((db->then_dead && if_entry->in_then) ||<br> + (db->else_dead && !if_entry->in_then)) {<br></div></blockquote><div><br></div><div>The intent might be clearer as:<br><br></div><div> if (if_entry->in_then ? db->then_dead : db->else_dead)<br> </div><div class="im"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"> + if ((db->then_dead_return && if_entry->in_then) ||<br> + (db->else_dead_return && !if_entry->in_then)) {<br></blockquote><div><br></div></div><div>Similar comment here.<br></div><div class="im"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"> + //The branch we're on leads to a return or discard, so the<br> + //assignment can't lead to any join nodes<br></blockquote><div><br></div></div><div>Style nit-pick: Mesa uses C-style comments.<br></div><div class="im"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"> + return visit_continue;<br> + }<br> +<br> + /*<br> + * The branch we're on leads to a break or continue.<br> + * We may need a phi node at the beginning, end, or both, depending<br> + * on if we exit through a continue, break, or both, repsectively.<br> + * We use an approximation here, and simply add a phi node to the<br> + * beginning and end. The worst thing that can happen is that we wind<br> + * up with a phi node where all the sources are the same, which can<br> + * be easily optimized away in a later pass.<br> + */<br></blockquote><div><br></div></div><div>As with the approximation you mention near the top of the file, on my first several readings, I failed to understand which parts of the code make this approximation and which don't. I now see that the reason this approximation is safe is because at this point we are only determining where the phi nodes need to be inserted. Later, when ir_phi_insertion_visitor::add_phi() actually inserts the phi nodes, it will do the right thing (it will only add continue sources to the ir_phi_loop_begin and it will only add break sources to the ir_phi_loop_end). As a result, the worst that could happen is that pseudocode like this: <br> </div><div><br></div><div>int x;<br></div><div>x = 1;<br></div><div>loop {<br></div><div> if (...) {<br></div><div> x = x + 1;<br></div><div> break;<br></div><div> } else {<br></div><div> x = x + 2;<br> }<br></div><div>}<br></div><div>foo(x);<br><br></div><div>would get transformed to this:<br> <br></div><div>int x_1 = 1;<br></div><div>loop {<br></div><div> int x_2 = phi(x_1, x_4);<br></div><div> if (...) {<br></div><div> int x_3 = x_2 + 1;<br></div><div> break;<br></div><div> } else {<br></div> <div> int x_4 = x_2 + 2;<br> }<br> </div> }<br></div><div class="gmail_quote">int x_5 = phi(x_3); // unnecessary phi node<br><br></div><div>It would be helpful to include an example like this in the comment to clarify.<br> <br></div><div class="gmail_quote"><div class="im"> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"> +<br> + do {<br> + cur_node = cur_node->next;<br> + cf_entry = (ir_control_flow_entry *) cur_node;<br> + } while (!cf_entry->as_loop_entry());<br></blockquote><div><br></div></div><div>At first glance it looks like this might walk off the beginning of the list since we don't check cur_node->is_head_sentinel(). It would be nice to add a comment explaining that this can't happen, because the only way we can reach this code is if one of the branches in the if ended in a break or continue; hence we are in a loop.<br> </div><div class="im"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"> + }<br> + }<br> +<br> + /*<br> + * walk the stack of control flow elements, placing phi nodes as<br> + * necessary<br> + */<br> +<br> + for (; cur_node->next != NULL; cur_node = cur_node->next) {<br></blockquote><div><br></div></div><div>I don't think a simple loop suffices here. Consider the pseudocode:<br><br>loop {<br> if (...) {<br> loop {<br> if (...) {<br> x = 1;<br> break;<br> }<br> // A<br> }<br> // B<br> break;<br> }<br> // C<br>}<br>// D<br><br></div><div>Let's say we're visiting the dereference for the "x" in "x = 1". We should add phi nodes at locations B and D, but not at A or C.<br> </div><div><br></div><div>The "if (if_entry)" block above will take care of making sure we don't add a phi node at A, but then we'll get to this loop and we'll add phi nodes at locations B, C, and D.<br> <br></div><div>I believe similar problems will arise if the "if" we're analyzing is nested inside another "if" without an intervening loop.<br></div><div><br><br></div><div>I think the right way to deal with this is to fuse this loop with the previous special handling of "if" so that it looks something like this:<div class="im"> <br> <br> exec_node *cur_node = this->cf_stack.head;<br></div> for (exec_node *cur_node = this->cf_stack.head;<br> !cur_node->is_head_sentinel(); cur_node = cur_node->prev) {<div class="im"><br> ir_control_flow_entry *cf_entry = (ir_control_flow_entry *) cur_node;<br> </div> if (ir_if_entry *if_entry = cur_node->as_if_entry()) {<div class="im"><br> ir_dead_branches *db = this->dbv->get_dead_branches(if_entry->ir);<br></div><div class="im"> if ((db->then_dead && if_entry->in_then) ||<br> </div><div class="im"> (db->else_dead && !if_entry->in_then)) {<br></div><div class="im"> if ((db->then_dead_return && if_entry->in_then) ||<br></div><div class="im"> (db->else_dead_return && !if_entry->in_then)) {<br> </div> /* The branch we're on leads to a return or discard... */<br> return visit_continue;<br> }<br> /* Find the innermost nested loop */<br> do {<br> cur_node = cur_node->next;<div class="im"> <br> cf_entry = (ir_control_flow_entry *) cur_node;<br></div> } while (!cf_entry->as_loop_entry());<br> /* Fall through to handle the loop entry */<br> } else {<div class="im"><br> this->add_phi(if_entry->ir, ir->var);<br></div> /* Move on to the next enclosing if or loop */<br> continue;<br> }<br> }<br><br> /* Handle the loop entry. */<br> ir_loop_entry *loop_entry = cur_node->as_loop_entry();<div class="im"> <br> this->add_phi(loop_entry->loop, ir->var);<br> }<br></div></div></div></div></div></blockquote><div><br></div><div style>Nice... much cleaner than what I have, and fixes a bug as well :)</div><div> </div> <blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"> <div><div class="im"><br><br></div></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div class="im"> + cf_entry = (ir_control_flow_entry *) cur_node;<br> +<br> + if_entry = cf_entry->as_if_entry();<br> + if (if_entry) {<br> + this->add_phi(if_entry->ir, ir->var);<br> + } else {<br> + ir_loop_entry *loop_entry = cf_entry->as_loop_entry();<br> + this->add_phi(loop_entry->loop, ir->var);<br> + }<br> + }<br> +<br> + return visit_continue;<br> +}<br> +<br></div><div class="im">+void<br> +ir_phi_insertion_visitor::add_phi(ir_if *ir, ir_variable *var)<br> +{<br> + void *mem_ctx = ralloc_parent(ir);<br> +<br> + //Don't duplicate phi nodes<br> + if (phi_exists(ir->phi_nodes, var)) {<br> + return;<br> + }<br> +<br> + ir_phi_if *phi = new (mem_ctx) ir_phi_if(var, var, var);<br></div></blockquote><div><br></div><div>It would be nice to have a short comment explaining that the reason we pass the same var to ir_phi_if three times is because in this phase of the algorithm, we are just creating a trivial phi node. Later, ir_rewrite_visitor will replace each variable with the appropriate SSA temporary.<br> </div><div class="im"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"> + ir->phi_nodes.push_tail(phi);<br> +<br> + ir_ssa_variable_state *isvs = this->ssv->get_state(var);<br> + isvs->num_defs++;<br> +}<br> +<br> +void<br> +ir_phi_insertion_visitor::add_phi(ir_loop *loop, ir_variable *var)<br> +{<br> + void *mem_ctx = ralloc_parent(loop);<br> +<br> + //Don't duplicate phi nodes<br> + if (phi_exists(loop->begin_phi_nodes, var)) {<br> + return;<br> + }<br> +<br> + ir_loop_jumps *loop_jumps = this->ljv->get_loop_jumps(loop);<br> +<br> + ir_phi_loop_begin *phi_begin = new(mem_ctx) ir_phi_loop_begin(var, var, var);<br></blockquote><div><br></div></div><div>A similar comment applies here.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"> <div><div class="h5"> +<br> + foreach_list(n, &loop_jumps->continues) {<br> + ir_loop_jump_entry *entry = (ir_loop_jump_entry *) n;<br> +<br> + ir_phi_jump_src *src = new(mem_ctx) ir_phi_jump_src();<br> + src->jump = entry->ir;<br> + src->src = var;<br> +<br> + phi_begin->continue_srcs.push_tail(src);<br> + }<br> +<br> + loop->begin_phi_nodes.push_tail(phi_begin);<br> +<br> + ir_phi_loop_end *phi_end = new(mem_ctx) ir_phi_loop_end(var);<br> +<br> + foreach_list(n, &loop_jumps->breaks) {<br> + ir_loop_jump_entry *entry = (ir_loop_jump_entry *) n;<br> +<br> + ir_phi_jump_src *src = new(mem_ctx) ir_phi_jump_src();<br> + src->jump = entry->ir;<br> + src->src = var;<br> +<br> + phi_end->break_srcs.push_tail(src);<br> + }<br> +<br> + loop->end_phi_nodes.push_tail(phi_end);<br> +<br> + ir_ssa_variable_state *isvs = this->ssv->get_state(var);<br> + isvs->num_defs += 2;<br> +}<br> +<br></div></div> +ir_visitor_status<br> +ir_rewrite_backward_visitor::visit(ir_dereference_variable *ir)<br></blockquote><div><br></div><div>One of the things that took me a long time to understand about this algorithm is that the sole purpose of visiting the nodes backwards is to undo the effects of the forwards visit on the stacks maintained by the ir_ssa_variable_state objects. To help clarify this, it would be helpful if each of the backward visitor methods contained a short comment referring to the corresponding forward visitor operation which is being undone. For example, this method could contain a comment explaining that since the forward visitors for ir_assignment and ir_call do ir_ssa_variable_state::new_var(), we need to call stack_pop() to undo the effect.<br> <br>You might also consider getting rid of this method, and instead doing the work in ir_rewrite_backward_visitor::visit(ir_call *) and ir_rewrite_backward_visitor::viist(ir_assignment *). It's a little more code, but the advantage is that it makes the relationship between the forward and backward visitors much easier to follow.<br> </div><div class="im"><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"> +namespace {<br> +<br> +class ir_rewrite_visitor {<br> +public:<br> + ir_rewrite_visitor(ir_ssa_state_visitor *, ir_dead_branches_visitor *);<br> +<br> + void rewrite(exec_list *instructions);<br> +<br> +private:<br> + void rewrite_forwards(exec_list *instructions);<br> + void rewrite_backwards(exec_list *instructions);<br> +<br> + void rewrite(ir_if *);<br> + void rewrite(ir_loop *);<br> + void rewrite(ir_loop_jump *);<br></blockquote><div><br></div></div><div>For a while I was confused about why these three methods are called rewrite(), but they are called from rewrite_forwards(exec_list *). I'd recommend renaming them to rewrite_forwards().<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div class="im"> +<br> + void rewrite_backwards(ir_if *);<br> + void rewrite_backwards(ir_loop *);<br> +<br> + void rewrite_phi_dest(ir_phi *);<br> +<br> + ir_ssa_state_visitor *ssv;<br> + ir_dead_branches_visitor *dbv;<br> + ir_rewrite_forward_visitor rfv;<br> + ir_rewrite_backward_visitor rbv;<br> +<br> + ir_loop *outer_loop;<br> +};<br> +<br> +}; /* private namespace */<br> +<br></div><div class="im"> +void<br> +ir_rewrite_visitor::rewrite(ir_if *ir)<br> +{<br> + ir->condition->accept(&this->rfv);<br> +<br> + ir_dead_branches *dead_branches = this->dbv->get_dead_branches(ir);<br> + if (dead_branches->then_dead) {<br> + if (dead_branches->else_dead) {<br> + this->rewrite(&ir->then_instructions);<br> + this->rewrite(&ir->else_instructions);<br> + } else {<br> + this->rewrite(&ir->then_instructions);<br> + this->rewrite_forwards(&ir->else_instructions);<br> + }<br> + } else if (dead_branches->else_dead) {<br> + this->rewrite(&ir->else_instructions);<br> + this->rewrite_forwards(&ir->then_instructions);<br></div></blockquote><div><br></div><div>It would be nice to have some comments in the three cases above to explain the rationale for when we call rewrite() vs rewrite_forwards().<br> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div><div class="h5"> + } else {<br> + this->rewrite_forwards(&ir->then_instructions);<br> + foreach_list(n, &ir->phi_nodes) {<br> + ir_phi_if *phi = (ir_phi_if *) n;<br> + ir_ssa_variable_state *isvs;<br> +<br> + isvs = this->ssv->get_state(phi->if_src);<br> + phi->if_src = isvs->cur_var(false);<br> + }<br> + this->rewrite_backwards(&ir->then_instructions);<br> +<br> + this->rewrite_forwards(&ir->else_instructions);<br> + foreach_list(n, &ir->phi_nodes) {<br> + ir_phi_if *phi = (ir_phi_if *) n;<br> + ir_ssa_variable_state *isvs;<br> +<br> + isvs = this->ssv->get_state(phi->else_src);<br> + phi->else_src = isvs->cur_var(false);<br> + }<br> + this->rewrite_backwards(&ir->else_instructions);<br> +<br> + foreach_list(n, &ir->phi_nodes) {<br> + ir_phi_if *phi = (ir_phi_if *) n;<br> +<br> + this->rewrite_phi_dest(phi);<br> + }<br> + }<br> +}<br> +<br>+void<br></div></div> +ir_rewrite_visitor::rewrite_backwards(ir_if *ir)<br></blockquote><div><br></div><div>As with ir_rewrite_backward_visitor::visit(ir_dereference_variable *ir), it would be nice to have some comments in this function explaining which forward visitor operations we're un-doing. A similar comment applies to ir_rewrite_visitor::rewrite_backwards(ir_loop *).<br> </div><div> </div><br></div><div class="gmail_quote">Once again, thanks for all the work you've put into this patch. SSA is something Mesa has been sorely needing for a long time, and I really like the approach you're taking. </div> </div></div></blockquote></div><br></div></div>