<div dir="ltr"><div>This makes a lot of sense<br><br></div>Reviewed-by: Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Aug 16, 2016 at 1:54 PM, Francisco Jerez <span dir="ltr"><<a href="mailto:currojerez@riseup.net" target="_blank">currojerez@riseup.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">ANY4H is more efficient than ANY8H and ANY16H because it makes sure<br>
that whenever a whole subspan hits a discard statement it gets<br>
disabled by the EU until the end of the program, regardless of whether<br>
the discard condition is uniform across all channels of the SIMD8-16<br>
thread. On the other hand, ANY8H/ANY16H would cause the rest of the<br>
program to be executed for *all* channels if even one of the channels<br>
hadn't taken the discard branch, potentially increasing the bandwidth<br>
and ALU usage of the program unnecessarily.<br>
<br>
This change improves the FPS by over 3x in a simple micro-benchmark<br>
that discards a bunch of fragments and then does a single costly<br>
texturing operation. I've just re-verified the FPS change on HSW and<br>
SKL, but I expect all platforms from Gen6 up to get a similar benefit.<br>
<br>
Note that we could potentially be more aggressive and use the NORMAL<br>
predicate to discard individual channels, but that would need to<br>
happen post-scheduling because the scheduler currently doesn't hesitate<br>
to reorder HALT instructions with respect to other instructions, and<br>
the NORMAL predicate would cause the results of subsequent derivative<br>
computations to become undefined. If the scheduler didn't reorder<br>
HALT instructions it would actually be safe to switch to NORMAL<br>
because the behavior of derivative computations after a non-uniform<br>
discard statement is left undefined by the GLSL spec, but that would<br>
make the optimization implemented by one of the following commits<br>
somewhat more difficult.<br>
---<br>
src/mesa/drivers/dri/i965/brw_fs.cpp | 4 +---<br>
1 file changed, 1 insertion(+), 3 deletions(-)<br>
<br>
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp<br>
index d1ac80a..c5067cd 100644<br>
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp<br>
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp<br>
@@ -1394,9 +1394,7 @@ fs_visitor::emit_discard_jump()<br>
fs_inst *discard_jump = bld.emit(FS_OPCODE_DISCARD_JUMP);<br>
discard_jump->flag_subreg = 1;<br>
<br>
- discard_jump->predicate = (dispatch_width == 8)<br>
- ? BRW_PREDICATE_ALIGN1_ANY8H<br>
- : BRW_PREDICATE_ALIGN1_ANY16H;<br>
+ discard_jump->predicate = BRW_PREDICATE_ALIGN1_ANY4H;<br>
discard_jump->predicate_inverse = true;<br>
}<br>
<span class="HOEnZb"><font color="#888888"><br>
--<br>
2.9.0<br>
</font></span></blockquote></div><br></div>
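As a rough illustration of the behavior described in the quoted commit message, here is a small software model of which channels stop executing at the discard jump for a given predicate group width. It is only a sketch and not Mesa code or a description of the hardware: the function name `halted_channels`, the parameter `live_mask`, and the convention that a set bit means "this channel has not been discarded yet" are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model (an assumption, not Mesa or hardware code).
// Bit i of live_mask is set while channel i has NOT been discarded.
// The jump is modelled as "halt every group of group_width channels in
// which no channel is still live", which mirrors an inverted ANY4H
// (group_width = 4), ANY8H (8) or ANY16H (16) predicate as described in
// the commit message. Returns the mask of channels that halt.
static uint16_t halted_channels(uint16_t live_mask, unsigned group_width,
                                unsigned dispatch_width)
{
    assert(group_width > 0 && dispatch_width <= 16 &&
           dispatch_width % group_width == 0);

    uint16_t halted = 0;
    for (unsigned base = 0; base < dispatch_width; base += group_width) {
        // Bit mask covering the channels of the current group.
        const uint16_t group =
            static_cast<uint16_t>(((1u << group_width) - 1u) << base);

        // The whole group halts only if none of its channels is live.
        if ((live_mask & group) == 0)
            halted = static_cast<uint16_t>(halted | group);
    }
    return halted;
}
```

In this model, a SIMD16 thread with a single live pixel in its last subspan (`live_mask = 0x1000`) halts channels `0x0fff` with a group width of 4, but halts nothing with a group width of 16, so the remaining work would still run across the whole thread.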