<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Oct 19, 2016 at 9:21 AM, Jordan Justen <span dir="ltr"><<a href="mailto:jordan.l.justen@intel.com" target="_blank">jordan.l.justen@intel.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 2016-10-19 08:58:38, Ian Romanick wrote:<br>
> From: Ian Romanick <<a href="mailto:ian.d.romanick@intel.com">ian.d.romanick@intel.com</a>><br>
><br>
> The previous power-of-two rules didn't catch idiv (because i965 doesn't<br>
> set lower_idiv) and imod cases. The udiv and umod cases should have<br>
> been caught, but I included them for orthogonality.<br>
><br>
> This fixes silly code observed from compute shaders with local_size_[xy]<br>
> = 1. This shader<br>
<br>
</span>I would say that the benefit is easy enough to understand w/o the long<br>
sample in the commit message.<br></blockquote><div><br></div><div>Agreed.<br><br></div><div>Reviewed-by: Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Reviewed-by: Jordan Justen <<a href="mailto:jordan.l.justen@intel.com">jordan.l.justen@intel.com</a>><br>
<div class="HOEnZb"><div class="h5"><br>
> writeonly uniform image2D tex;<br>
> layout(local_size_x = 9) in;<br>
> uniform uint arg0;<br>
> uniform uint arg1;<br>
><br>
> void main()<br>
> {<br>
> vec4 tmp_color;<br>
> if((arg0 >= arg1))<br>
> tmp_color = vec4(1.0, 1.0, 0.0, 1.0);<br>
> else<br>
> tmp_color = vec4(0.0, 0.0, 1.0, 1.0);<br>
> ivec2 coord = ivec2(gl_GlobalInvocationID.xy);<br>
> imageStore(tex, coord, tmp_color);<br>
> }<br>
><br>
> generated this code (note the divide and mod with ssa_13.x (which is 1)<br>
> and ssa_14.y (which is also 1)).<br>
><br>
> NIR (final form) for compute shader:<br>
> shader: MESA_SHADER_COMPUTE<br>
> name: GLSL2<br>
> inputs: 0<br>
> outputs: 0<br>
> uniforms: 108<br>
> shared: 0<br>
> decl_var uniform INTERP_MODE_NONE writeonly image2D tex (0, 0)<br>
> decl_var uniform INTERP_MODE_NONE uint arg0 (1, 96)<br>
> decl_var uniform INTERP_MODE_NONE uint arg1 (2, 100)<br>
> decl_function main returning void<br>
><br>
> impl main {<br>
> block block_0:<br>
> /* preds: */<br>
> vec1 32 ssa_0 = load_const (0x00000000 /* 0.000000 */)<br>
> vec1 32 ssa_1 = intrinsic load_uniform (ssa_0) () (96, 4) /* base=96 */ /* range=4 */ /* arg0 */<br>
> vec1 32 ssa_2 = intrinsic load_uniform (ssa_0) () (100, 4) /* base=100 */ /* range=4 */ /* arg1 */<br>
> vec1 32 ssa_3 = uge ssa_1, ssa_2<br>
> vec1 32 ssa_4 = load_const (0x3f800000 /* 1.000000 */)<br>
> vec1 32 ssa_5 = bcsel ssa_3, ssa_4, ssa_0<br>
> vec1 32 ssa_6 = bcsel ssa_3, ssa_0, ssa_4<br>
> vec4 32 ssa_7 = vec4 ssa_5, ssa_5, ssa_6, ssa_4<br>
> vec1 32 ssa_8 = undefined<br>
> vec3 32 ssa_9 = intrinsic load_work_group_id () () ()<br>
> vec1 32 ssa_10 = intrinsic load_uniform (ssa_0) () (104, 4) /* base=104 */ /* range=4 */<br>
> vec1 32 ssa_11 = intrinsic load_channel_num () () ()<br>
> vec1 32 ssa_12 = iadd ssa_11, ssa_10<br>
> vec3 32 ssa_13 = load_const (0x00000001 /* 0.000000 */, 0x00000009 /* 0.000000 */, 0x00000009 /* 0.000000 */)<br>
> vec3 32 ssa_14 = load_const (0x00000009 /* 0.000000 */, 0x00000001 /* 0.000000 */, 0x00000001 /* 0.000000 */)<br>
> vec1 32 ssa_15 = idiv ssa_12, ssa_13.x<br>
> vec1 32 ssa_16 = idiv ssa_12, ssa_13.y<br>
> vec1 32 ssa_17 = imod ssa_15, ssa_14.x<br>
> vec1 32 ssa_18 = imod ssa_16, ssa_14.y<br>
> vec1 32 ssa_19 = imul ssa_9.x, ssa_14.x<br>
> vec1 32 ssa_20 = iadd ssa_19, ssa_17<br>
> vec1 32 ssa_21 = iadd ssa_9.y, ssa_18<br>
> vec4 32 ssa_22 = vec4 ssa_20, ssa_21, ssa_8, ssa_8<br>
> intrinsic image_store (ssa_22, ssa_8, ssa_7) (tex) ()<br>
> /* succs: block_0 */<br>
> block block_0:<br>
> }<br>
><br>
> Signed-off-by: Ian Romanick <<a href="mailto:ian.d.romanick@intel.com">ian.d.romanick@intel.com</a>><br>
> Bugzilla: <a href="https://bugs.freedesktop.org/show_bug.cgi?id=98299" rel="noreferrer" target="_blank">https://bugs.freedesktop.org/<wbr>show_bug.cgi?id=98299</a><br>
> ---<br>
> src/compiler/nir/nir_opt_algebraic.py | 4 ++++<br>
> 1 file changed, 4 insertions(+)<br>
><br>
> diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py<br>
> index 2de8050..82d92f4 100644<br>
> --- a/src/compiler/nir/nir_opt_algebraic.py<br>
> +++ b/src/compiler/nir/nir_opt_algebraic.py<br>
> @@ -66,6 +66,10 @@ optimizations = [<br>
><br>
> (('imul', a, '#b@32(is_pos_power_of_two)'), ('ishl', a, ('find_lsb', b))),<br>
> (('imul', a, '#b@32(is_neg_power_of_two)'), ('ineg', ('ishl', a, ('find_lsb', ('iabs', b))))),<br>
> + (('udiv', a, 1), a),<br>
> + (('idiv', a, 1), a),<br>
> + (('umod', a, 1), 0),<br>
> + (('imod', a, 1), 0),<br>
> (('udiv', a, '#b@32(is_pos_power_of_two)'), ('ushr', a, ('find_lsb', b))),<br>
> (('idiv', a, '#b@32(is_pos_power_of_two)'), ('imul', ('isign', a), ('ushr', ('iabs', a), ('find_lsb', b))), 'options->lower_idiv'),<br>
> (('idiv', a, '#b@32(is_neg_power_of_two)'), ('ineg', ('imul', ('isign', a), ('ushr', ('iabs', a), ('find_lsb', ('iabs', b))))), 'options->lower_idiv'),<br>
> --<br>
> 2.5.5<br>
><br>
> ______________________________<wbr>_________________<br>
> mesa-dev mailing list<br>
> <a href="mailto:mesa-dev@lists.freedesktop.org">mesa-dev@lists.freedesktop.org</a><br>
> <a href="https://lists.freedesktop.org/mailman/listinfo/mesa-dev" rel="noreferrer" target="_blank">https://lists.freedesktop.org/<wbr>mailman/listinfo/mesa-dev</a><br>
</div></div></blockquote></div><br></div></div>
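<div>For reference, a minimal sketch (not part of the patch) of why the four divide-by-one rules in the quoted diff are safe for any 32-bit operand. The helpers below are hypothetical reference models written for illustration, not Mesa code. The imod model assumes the result takes the sign of the divisor, which makes no difference when the divisor is 1.</div>

```python
# Hypothetical 32-bit reference models for the NIR ops touched by the patch.
# None of these names exist in Mesa; they only illustrate the identities.
WORD = 2**32


def wrap32(x):
    """Reduce a Python int to an unsigned 32-bit bit pattern."""
    return x % WORD


def to_signed(x):
    """Interpret a 32-bit bit pattern as a two's-complement signed value."""
    x = wrap32(x)
    return x - WORD if x >= 2**31 else x


def udiv(a, b):
    return wrap32(wrap32(a) // wrap32(b))


def umod(a, b):
    return wrap32(wrap32(a) % wrap32(b))


def idiv(a, b):
    # Signed division that truncates toward zero.
    sa, sb = to_signed(a), to_signed(b)
    q = abs(sa) // abs(sb)
    if (sa >= 0) != (sb >= 0):
        q = -q
    return wrap32(q)


def imod(a, b):
    # Assumption: result has the sign of the divisor, like Python's % operator.
    return wrap32(to_signed(a) % to_signed(b))


def check_divide_by_one(samples):
    """Check the four rewrites: x / 1 -> x and x % 1 -> 0, signed and unsigned."""
    for a in samples:
        assert udiv(a, 1) == wrap32(a)
        assert idiv(a, 1) == wrap32(a)
        assert umod(a, 1) == 0
        assert imod(a, 1) == 0
    return True


# Boundary values, including negative bit patterns and the local size of 9
# from the shader in the commit message.
SAMPLES = [0, 1, 9, 0x7FFFFFFF, 0x80000000, 0xFFFFFFF7, 0xFFFFFFFF]

if __name__ == "__main__":
    print(check_divide_by_one(SAMPLES))
```

<div>The signed cases are the ones the commit message says the existing power-of-two rules missed, because the idiv rewrites are gated on options->lower_idiv.</div>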