<div dir="ltr">New patch on the list.<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Oct 4, 2017 at 7:42 PM, Jason Ekstrand <span dir="ltr"><<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div><div class="h5">On Wed, Oct 4, 2017 at 5:35 PM, Jason Ekstrand <span dir="ltr"><<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><span>On Wed, Oct 4, 2017 at 5:29 PM, Connor Abbott <span dir="ltr"><<a href="mailto:cwabbott0@gmail.com" target="_blank">cwabbott0@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">This won't completely solve the problem. For example, what if you<br>
hoist the assignment to color2 outside the loop?<br>
<br>
vec4 color2;<br>
while (1) {<br>
vec4 color = texture();<br>
<span> color2 = color * 2;<br>
</span> if (...) {<br>
break;<br>
}<br>
}<br>
gl_FragColor = color2;<br>
<br>
<br>
Now the definition still dominates the use, even with the modified<br>
control-flow graph, and you have the same problem</blockquote><div><br></div></span><div>Curro had me convinced that some detail of the liveness analysis pass saved us here but now I can't remember what. :-( <br></div><span><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> The real problem is<br>
that the assignment to color2 is really a conditional assignment: if<br>
we're going channel-by-channel, it's not, but if you consider the<br>
*whole* register at the same time, it is. To really fix the problem,<br>
you need to model exactly what the machine actually does: you need to<br>
insert "fake" edges like these, that model the jumps that the machine<br>
can take, and you need to make every assignment a conditional<br>
assignment (i.e. it doesn't kill the register). It's probably not as<br>
bad with Curro's patch on top, though. Also, once you do this you can<br>
make register allocation more accurate by generating interferences<br>
from the liveness information directly instead of from the intervals.<br>
<br>
One thing I've thought about is, in addition to maintaining this<br>
"whole-vector" view of things, is to maintain a "per-channel" liveness<br>
that doesn't use the extra edges, partial definitions etc. and then<br>
use the "per-channel view" to calculate interference when the channels<br>
always line up.<br><div><div class="m_-2621970853118662515m_-5522118626808924054h5"></div></div></blockquote><div><br></div></span><div>Yes, we've considered that and it's a good idea. However, I'm trying to fix bugs right now, not write the world's best liveness analysis pass. :-)<br></div></div></div></div></blockquote><div><br></div></div></div><div>You're correct, as usual... I've inspected the result of liveness anlaysis and we do indeed get it wrong. I'll come up with something less bogus.<br></div><div><div class="h5"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div></div><div><div class="m_-2621970853118662515h5"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div class="m_-2621970853118662515m_-5522118626808924054h5">
On Wed, Oct 4, 2017 at 7:58 PM, Jason Ekstrand <<a href="mailto:jason@jlekstrand.net" target="_blank">jason@jlekstrand.net</a>> wrote:<br>
> Shader-db results on Sky Lake:<br>
><br>
> total instructions in shared programs: 12955125 -> 12953698 (-0.01%)<br>
> instructions in affected programs: 55956 -> 54529 (-2.55%)<br>
> helped: 6<br>
> HURT: 38<br>
><br>
> All of the hurt programs were hurt by exactly one instruction because<br>
> this patch affects copy propagation. Most of the helped instructions<br>
> came from a single orbital explorer shader that was helped by 14.26%<br>
><br>
> Cc: <a href="mailto:mesa-stable@lists.freedesktop.org" target="_blank">mesa-stable@lists.freedesktop.<wbr>org</a><br>
> ---<br>
> src/intel/compiler/brw_cfg.cpp | 37 ++++++++++++++++++++++++++++++<wbr>+++++--<br>
> 1 file changed, 35 insertions(+), 2 deletions(-)<br>
><br>
> diff --git a/src/intel/compiler/brw_cfg.c<wbr>pp b/src/intel/compiler/brw_cfg.c<wbr>pp<br>
> index fad12ee..d8bf725 100644<br>
> --- a/src/intel/compiler/brw_cfg.c<wbr>pp<br>
> +++ b/src/intel/compiler/brw_cfg.c<wbr>pp<br>
> @@ -289,9 +289,42 @@ cfg_t::cfg_t(exec_list *instructions)<br>
> assert(cur_while != NULL);<br>
> cur->add_successor(mem_ctx, cur_while);<br>
><br>
> + /* We also add the next block as a successor of the break. If the<br>
> + * break is predicated, we need to do this because the break may not<br>
> + * be taken. If the break is not predicated, we add it anyway so<br>
> + * that our live intervals computations will operate as if the break<br>
> + * may or may not be taken. Consider the following example:<br>
> + *<br>
> + * vec4 color2;<br>
> + * while (1) {<br>
> + * vec4 color = texture();<br>
> + * if (...) {<br>
> + * color2 = color * 2;<br>
> + * break;<br>
> + * }<br>
> + * }<br>
> + * gl_FragColor = color2;<br>
> + *<br>
> + * In this case, the definition of color2 dominates the use because<br>
> + * the loop only has the one exit. This means that the live range<br>
> + * interval for color2 goes from the statement in the if to it's use<br>
> + * below the loop. Now suppose that the texture operation has a<br>
> + * header register that gets assigned one of the registers used for<br>
> + * color2. If the loop condition is non-uniform and some of the<br>
> + * threads will take the break and others will continue. In this<br>
> + * case, the next pass through the loop, the WE_all setup of the<br>
> + * header register will stomp the disabled channels of color2 and<br>
> + * corrupt the value.<br>
> + *<br>
> + * This same problem can occur if you have a mix of 64, 32, and<br>
> + * 16-bit registers because the channels do not line up or if you<br>
> + * have a SIMD16 program and the first half of one value overlaps the<br>
> + * second half of the other. To solve it, we simply treat the break<br>
> + * as if it may also continue on because some of the threads may<br>
> + * continue on.<br>
> + */<br>
> next = new_block();<br>
> - if (inst->predicate)<br>
> - cur->add_successor(mem_ctx, next);<br>
> + cur->add_successor(mem_ctx, next);<br>
><br>
> set_next_block(&cur, next, ip);<br>
> break;<br>
> --<br>
> 2.5.0.400.gff86faf<br>
><br>
</div></div>> ______________________________<wbr>_________________<br>
> mesa-dev mailing list<br>
> <a href="mailto:mesa-dev@lists.freedesktop.org" target="_blank">mesa-dev@lists.freedesktop.org</a><br>
> <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br>
</blockquote></div></div></div><br></div></div>
</blockquote></div></div></div><br></div></div>
</blockquote></div><br></div>