<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Oct 4, 2017 at 5:29 PM, Connor Abbott <span dir="ltr"><<a href="mailto:cwabbott0@gmail.com" target="_blank">cwabbott0@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">This won't completely solve the problem. For example, what if you<br> hoist the assignment to color2 outside the loop?<br> <br> vec4 color2;<br> while (1) {<br> vec4 color = texture();<br> <span class=""> color2 = color * 2;<br> </span> if (...) {<br> break;<br> }<br> }<br> gl_FragColor = color2;<br> <br> <br> Now the definition still dominates the use, even with the modified<br> control-flow graph, and you have the same problem</blockquote><div><br></div><div>Curro had me convinced that some detail of the liveness analysis pass saved us here but now I can't remember what. :-( <br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> The real problem is<br> that the assignment to color2 is really a conditional assignment: if<br> we're going channel-by-channel, it's not, but if you consider the<br> *whole* register at the same time, it is. To really fix the problem,<br> you need to model exactly what the machine actually does: you need to<br> insert "fake" edges like these, that model the jumps that the machine<br> can take, and you need to make every assignment a conditional<br> assignment (i.e. it doesn't kill the register). It's probably not as<br> bad with Curro's patch on top, though. Also, once you do this you can<br> make register allocation more accurate by generating interferences<br> from the liveness information directly instead of from the intervals.<br> <br> One thing I've thought about is, in addition to maintaining this<br> "whole-vector" view of things, is to maintain a "per-channel" liveness<br> that doesn't use the extra edges, partial definitions etc. and then<br> use the "per-channel view" to calculate interference when the channels<br> always line up.<br><div><div class="h5"></div></div></blockquote><div><br></div><div>Yes, we've considered that and it's a good idea. However, I'm trying to fix bugs right now, not write the world's best liveness analysis pass. :-)<br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="h5"> On Wed, Oct 4, 2017 at 7:58 PM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> wrote:<br> > Shader-db results on Sky Lake:<br> ><br> > total instructions in shared programs: 12955125 -> 12953698 (-0.01%)<br> > instructions in affected programs: 55956 -> 54529 (-2.55%)<br> > helped: 6<br> > HURT: 38<br> ><br> > All of the hurt programs were hurt by exactly one instruction because<br> > this patch affects copy propagation. Most of the helped instructions<br> > came from a single orbital explorer shader that was helped by 14.26%<br> ><br> > Cc: <a href="mailto:mesa-stable@lists.freedesktop.org">mesa-stable@lists.freedesktop.<wbr>org</a><br> > ---<br> > src/intel/compiler/brw_cfg.cpp | 37 ++++++++++++++++++++++++++++++<wbr>+++++--<br> > 1 file changed, 35 insertions(+), 2 deletions(-)<br> ><br> > diff --git a/src/intel/compiler/brw_cfg.<wbr>cpp b/src/intel/compiler/brw_cfg.<wbr>cpp<br> > index fad12ee..d8bf725 100644<br> > --- a/src/intel/compiler/brw_cfg.<wbr>cpp<br> > +++ b/src/intel/compiler/brw_cfg.<wbr>cpp<br> > @@ -289,9 +289,42 @@ cfg_t::cfg_t(exec_list *instructions)<br> > assert(cur_while != NULL);<br> > cur->add_successor(mem_ctx, cur_while);<br> ><br> > + /* We also add the next block as a successor of the break. If the<br> > + * break is predicated, we need to do this because the break may not<br> > + * be taken. If the break is not predicated, we add it anyway so<br> > + * that our live intervals computations will operate as if the break<br> > + * may or may not be taken. Consider the following example:<br> > + *<br> > + * vec4 color2;<br> > + * while (1) {<br> > + * vec4 color = texture();<br> > + * if (...) {<br> > + * color2 = color * 2;<br> > + * break;<br> > + * }<br> > + * }<br> > + * gl_FragColor = color2;<br> > + *<br> > + * In this case, the definition of color2 dominates the use because<br> > + * the loop only has the one exit. This means that the live range<br> > + * interval for color2 goes from the statement in the if to it's use<br> > + * below the loop. Now suppose that the texture operation has a<br> > + * header register that gets assigned one of the registers used for<br> > + * color2. If the loop condition is non-uniform and some of the<br> > + * threads will take the break and others will continue. In this<br> > + * case, the next pass through the loop, the WE_all setup of the<br> > + * header register will stomp the disabled channels of color2 and<br> > + * corrupt the value.<br> > + *<br> > + * This same problem can occur if you have a mix of 64, 32, and<br> > + * 16-bit registers because the channels do not line up or if you<br> > + * have a SIMD16 program and the first half of one value overlaps the<br> > + * second half of the other. To solve it, we simply treat the break<br> > + * as if it may also continue on because some of the threads may<br> > + * continue on.<br> > + */<br> > next = new_block();<br> > - if (inst->predicate)<br> > - cur->add_successor(mem_ctx, next);<br> > + cur->add_successor(mem_ctx, next);<br> ><br> > set_next_block(&cur, next, ip);<br> > break;<br> > --<br> > 2.5.0.400.gff86faf<br> ><br> </div></div>> ______________________________<wbr>_________________<br> > mesa-dev mailing list<br> > <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br> > <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br> </blockquote></div><br></div></div>