<div dir="ltr"><div>This makes a lot of sense </div>Reviewed-by: Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> </div><div class="gmail_extra"> <div class="gmail_quote">On Tue, Aug 16, 2016 at 1:54 PM, Francisco Jerez <<a href="mailto:currojerez@riseup.net" target="_blank">currojerez@riseup.net</a>> wrote: <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">ANY4H is more efficient than ANY8H and ANY16H because it makes sure that whenever a whole subspan hits a discard statement it gets disabled by the EU until the end of the program, regardless of whether the discard condition is uniform across all channels of the SIMD8-16 thread. OTOH ANY8H/ANY16H would cause the rest of the program to be executed for *all* channels if only one of the channels hadn't taken the discard branch, potentially increasing the bandwidth and ALU usage of the program unnecessarily. This change increases the FPS by over 3x of a simple micro-benchmark that discards a bunch of fragments and then does a single costly texturing operation. I've just re-verified the FPS change on HSW and SKL, but I expect all platforms from Gen6 up to get a similar benefit. Note that we could potentially be more aggressive and use the NORMAL predicate to discard individual channels, but that would need to happen post-scheduling because the scheduler currently doesn't care to reorder HALT instructions with respect to other instructions, and the NORMAL predicate would cause the results of subsequent derivative computations to become undefined -- If the scheduler didn't reorder HALT instructions it would actually be safe to switch to NORMAL because the behavior of derivative computations after a non-uniform discard statement is undefined by the GLSL spec, but that would make the optimization implemented by one of the following commits somewhat more difficult. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index d1ac80a..c5067cd 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -1394,9 +1394,7 @@ fs_visitor::emit_discard_jump() fs_inst *discard_jump = bld.emit(FS_OPCODE_DISCARD_JUMP); discard_jump->flag_subreg = 1; - discard_jump->predicate = (dispatch_width == 8) - ? BRW_PREDICATE_ALIGN1_ANY8H - : BRW_PREDICATE_ALIGN1_ANY16H; + discard_jump->predicate = BRW_PREDICATE_ALIGN1_ANY4H; discard_jump->predicate_inverse = true; } -- 2.9.0 _______________________________________________ mesa-dev mailing list <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a> <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/mailman/listinfo/mesa-dev</a> </blockquote></div> </div>