[Mesa-dev] [PATCH] nir: Optimize integer division and modulus with 1

Jason Ekstrand jason at jlekstrand.net
Wed Oct 19 16:30:49 UTC 2016


On Wed, Oct 19, 2016 at 9:21 AM, Jordan Justen <jordan.l.justen at intel.com> wrote:

> On 2016-10-19 08:58:38, Ian Romanick wrote:
> > From: Ian Romanick <ian.d.romanick at intel.com>
> >
> > The previous power-of-two rules didn't catch idiv (because i965 doesn't
> > set lower_idiv) and imod cases.  The udiv and umod cases should have
> > been caught, but I included them for orthogonality.
> >
> > This fixes silly code observed from compute shaders with local_size_[xy]
> > = 1.  This shader
>
> I would say that the benefit is easy enough to understand w/o the long
> sample in the commit message.
>

Agreed.

Reviewed-by: Jason Ekstrand <jason at jlekstrand.net>


> Reviewed-by: Jordan Justen <jordan.l.justen at intel.com>
>
> >     writeonly uniform image2D tex;
> >     layout(local_size_x = 9) in;
> >     uniform uint arg0;
> >     uniform uint arg1;
> >
> >     void main()
> >     {
> >       vec4 tmp_color;
> >       if((arg0 >= arg1))
> >         tmp_color = vec4(1.0, 1.0, 0.0, 1.0);
> >       else
> >         tmp_color = vec4(0.0, 0.0, 1.0, 1.0);
> >       ivec2 coord = ivec2(gl_GlobalInvocationID.xy);
> >       imageStore(tex, coord, tmp_color);
> >     }
> >
> > generated this code (note the divide by ssa_13.x (which is 1) and the
> > mod by ssa_14.y (which is also 1)).
> >
> > NIR (final form) for compute shader:
> > shader: MESA_SHADER_COMPUTE
> > name: GLSL2
> > inputs: 0
> > outputs: 0
> > uniforms: 108
> > shared: 0
> > decl_var uniform INTERP_MODE_NONE writeonly image2D tex (0, 0)
> > decl_var uniform INTERP_MODE_NONE uint arg0 (1, 96)
> > decl_var uniform INTERP_MODE_NONE uint arg1 (2, 100)
> > decl_function main returning void
> >
> > impl main {
> >         block block_0:
> >         /* preds: */
> >         vec1 32 ssa_0 = load_const (0x00000000 /* 0.000000 */)
> >         vec1 32 ssa_1 = intrinsic load_uniform (ssa_0) () (96, 4) /* base=96 */ /* range=4 */   /* arg0 */
> >         vec1 32 ssa_2 = intrinsic load_uniform (ssa_0) () (100, 4) /* base=100 */ /* range=4 */ /* arg1 */
> >         vec1 32 ssa_3 = uge ssa_1, ssa_2
> >         vec1 32 ssa_4 = load_const (0x3f800000 /* 1.000000 */)
> >         vec1 32 ssa_5 = bcsel ssa_3, ssa_4, ssa_0
> >         vec1 32 ssa_6 = bcsel ssa_3, ssa_0, ssa_4
> >         vec4 32 ssa_7 = vec4 ssa_5, ssa_5, ssa_6, ssa_4
> >         vec1 32 ssa_8 = undefined
> >         vec3 32 ssa_9 = intrinsic load_work_group_id () () ()
> >         vec1 32 ssa_10 = intrinsic load_uniform (ssa_0) () (104, 4) /* base=104 */ /* range=4 */
> >         vec1 32 ssa_11 = intrinsic load_channel_num () () ()
> >         vec1 32 ssa_12 = iadd ssa_11, ssa_10
> >         vec3 32 ssa_13 = load_const (0x00000001 /* 0.000000 */, 0x00000009 /* 0.000000 */, 0x00000009 /* 0.000000 */)
> >         vec3 32 ssa_14 = load_const (0x00000009 /* 0.000000 */, 0x00000001 /* 0.000000 */, 0x00000001 /* 0.000000 */)
> >         vec1 32 ssa_15 = idiv ssa_12, ssa_13.x
> >         vec1 32 ssa_16 = idiv ssa_12, ssa_13.y
> >         vec1 32 ssa_17 = imod ssa_15, ssa_14.x
> >         vec1 32 ssa_18 = imod ssa_16, ssa_14.y
> >         vec1 32 ssa_19 = imul ssa_9.x, ssa_14.x
> >         vec1 32 ssa_20 = iadd ssa_19, ssa_17
> >         vec1 32 ssa_21 = iadd ssa_9.y, ssa_18
> >         vec4 32 ssa_22 = vec4 ssa_20, ssa_21, ssa_8, ssa_8
> >         intrinsic image_store (ssa_22, ssa_8, ssa_7) (tex) ()
> >         /* succs: block_1 */
> >         block block_1:
> > }
> >
> > Signed-off-by: Ian Romanick <ian.d.romanick at intel.com>
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98299
> > ---
> >  src/compiler/nir/nir_opt_algebraic.py | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/src/compiler/nir/nir_opt_algebraic.py b/src/compiler/nir/nir_opt_algebraic.py
> > index 2de8050..82d92f4 100644
> > --- a/src/compiler/nir/nir_opt_algebraic.py
> > +++ b/src/compiler/nir/nir_opt_algebraic.py
> > @@ -66,6 +66,10 @@ optimizations = [
> >
> >     (('imul', a, '#b@32(is_pos_power_of_two)'), ('ishl', a, ('find_lsb', b))),
> >     (('imul', a, '#b@32(is_neg_power_of_two)'), ('ineg', ('ishl', a, ('find_lsb', ('iabs', b))))),
> > +   (('udiv', a, 1), a),
> > +   (('idiv', a, 1), a),
> > +   (('umod', a, 1), 0),
> > +   (('imod', a, 1), 0),
> >     (('udiv', a, '#b@32(is_pos_power_of_two)'), ('ushr', a, ('find_lsb', b))),
> >     (('idiv', a, '#b@32(is_pos_power_of_two)'), ('imul', ('isign', a), ('ushr', ('iabs', a), ('find_lsb', b))), 'options->lower_idiv'),
> >     (('idiv', a, '#b@32(is_neg_power_of_two)'), ('ineg', ('imul', ('isign', a), ('ushr', ('iabs', a), ('find_lsb', ('iabs', b))))), 'options->lower_idiv'),
> > --
> > 2.5.5
> >
> > _______________________________________________
> > mesa-dev mailing list
> > mesa-dev at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
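
For anyone skimming the thread, here is a rough sketch of what the four new rules do to the expressions in the NIR dump quoted above. It is only an illustration: `DIV_MOD_BY_ONE` and `simplify` are hypothetical names, expressions are modelled as plain Python tuples, and none of this is Mesa's actual nir_algebraic machinery.

```python
# Hypothetical mini-rewriter, NOT Mesa's nir_algebraic code.
# Expressions are tuples such as ('idiv', 'ssa_12', 1); strings stand in
# for SSA values and ints for constants.

DIV_MOD_BY_ONE = {
    'udiv': lambda a: a,  # a / 1 == a
    'idiv': lambda a: a,
    'umod': lambda a: 0,  # a % 1 == 0
    'imod': lambda a: 0,
}


def simplify(expr):
    """Recursively fold division and modulus by the constant 1."""
    if not isinstance(expr, tuple):
        return expr
    op, *args = expr
    args = [simplify(arg) for arg in args]
    # Only a literal 1 as the second operand matches the new rules.
    if op in DIV_MOD_BY_ONE and len(args) == 2 and args[1] == 1:
        return DIV_MOD_BY_ONE[op](args[0])
    return (op, *args)


# ssa_17 in the dump: imod(idiv(ssa_12, 1), 9) keeps only the imod.
print(simplify(('imod', ('idiv', 'ssa_12', 1), 9)))  # ('imod', 'ssa_12', 9)
# ssa_18 in the dump: imod(idiv(ssa_12, 9), 1) folds to the constant 0.
print(simplify(('imod', ('idiv', 'ssa_12', 9), 1)))  # 0
```

Python's `//` rounds toward negative infinity rather than toward zero, but for a divisor of 1 that makes no difference, so `a // 1 == a` and `a % 1 == 0` hold for every integer `a`, negative values included.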