[Mesa-dev] [PATCH 2/3] nir/algebraic: support for power-of-two optimizations

Rob Clark robdclark at gmail.com
Tue May 24 19:41:37 UTC 2016


On Mon, May 16, 2016 at 3:16 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
> On Mon, May 16, 2016 at 9:09 AM, Rob Clark <robdclark at gmail.com> wrote:
>>
>> On Mon, May 16, 2016 at 10:45 AM, Jason Ekstrand <jason at jlekstrand.net>
>> wrote:
>> >
>> > On May 16, 2016 7:29 AM, "Rob Clark" <robdclark at gmail.com> wrote:
>> >>
>> >> On Sat, May 14, 2016 at 4:03 PM, Jason Ekstrand <jason at jlekstrand.net>
>> >> wrote:
>> >> >
>> >> >
>> >> > On Sat, May 14, 2016 at 12:20 PM, Rob Clark <robdclark at gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> On Thu, May 12, 2016 at 10:55 PM, Jason Ekstrand
>> >> >> <jason at jlekstrand.net>
>> >> >> wrote:
>> >> >> >
>> >> >> >
>> >> >> > On Tue, May 10, 2016 at 11:57 AM, Rob Clark <robdclark at gmail.com>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> From: Rob Clark <robclark at freedesktop.org>
>> >> >> >>
>> >> >> >> Some optimizations, like converting integer multiply/divide into
>> >> >> >> left/
>> >> >> >> right shifts, have additional constraints on the search
>> >> >> >> expression.
>> >> >> >> Like requiring that a variable is a constant power of two.
>> >> >> >> Support
>> >> >> >> these cases by allowing a fxn name to be appended to the search
>> >> >> >> var
>> >> >> >> expression (ie. "a#32(is_power_of_two)").
>> >> >> >>
>> >> >> >> TODO update doc/comment explaining search var syntax
>> >> >> >> TODO the eagle-eyed viewer might have noticed that this could
>> >> >> >> also
>> >> >> >> replace the existing const syntax (ie. "#a").  Not sure if we
>> >> >> >> should
>> >> >> >> keep that.. we could make it syntactic sugar (ie '#'
>> >> >> >> automatically
>> >> >> >> sets
>> >> >> >> the cond fxn ptr to 'is_const') or just get rid of it entirely?
>> >> >> >> Maybe
>> >> >> >> that is a follow-on clean-up patch?
>> >> >> >>
>> >> >> >> Signed-off-by: Rob Clark <robclark at freedesktop.org>
>> >> >> >> ---
>> >> >> >>  src/compiler/nir/nir_algebraic.py     |  8 +++--
>> >> >> >>  src/compiler/nir/nir_opt_algebraic.py |  5 +++
>> >> >> >>  src/compiler/nir/nir_search.c         |  3 ++
>> >> >> >>  src/compiler/nir/nir_search.h         | 10 ++++++
>> >> >> >>  src/compiler/nir/nir_search_helpers.h | 66
>> >> >> >> +++++++++++++++++++++++++++++++++++
>> >> >> >>  5 files changed, 90 insertions(+), 2 deletions(-)
>> >> >> >>  create mode 100644 src/compiler/nir/nir_search_helpers.h
>> >> >> >>
>> >> >> >> diff --git a/src/compiler/nir/nir_algebraic.py
>> >> >> >> b/src/compiler/nir/nir_algebraic.py
>> >> >> >> index 285f853..19ac6ee 100644
>> >> >> >> --- a/src/compiler/nir/nir_algebraic.py
>> >> >> >> +++ b/src/compiler/nir/nir_algebraic.py
>> >> >> >> @@ -76,6 +76,7 @@ class Value(object):
>> >> >> >>           return Constant(val, name_base)
>> >> >> >>
>> >> >> >>     __template = mako.template.Template("""
>> >> >> >> +#include "compiler/nir/nir_search_helpers.h"
>> >> >> >>  static const ${val.c_type} ${val.name} = {
>> >> >> >>     { ${val.type_enum}, ${val.bit_size} },
>> >> >> >>  % if isinstance(val, Constant):
>> >> >> >> @@ -84,6 +85,7 @@ static const ${val.c_type} ${val.name} = {
>> >> >> >>     ${val.index}, /* ${val.var_name} */
>> >> >> >>     ${'true' if val.is_constant else 'false'},
>> >> >> >>     ${val.type() or 'nir_type_invalid' },
>> >> >> >> +   ${val.cond if val.cond else 'NULL'},
>> >> >> >>  % elif isinstance(val, Expression):
>> >> >> >>     ${'true' if val.inexact else 'false'},
>> >> >> >>     nir_op_${val.opcode},
>> >> >> >> @@ -113,7 +115,7 @@ static const ${val.c_type} ${val.name} = {
>> >> >> >>                                      Variable=Variable,
>> >> >> >>                                      Expression=Expression)
>> >> >> >>
>> >> >> >> -_constant_re =
>> >> >> >> re.compile(r"(?P<value>[^@]+)(?:@(?P<bits>\d+))?")
>> >> >> >> +_constant_re =
>> >> >> >> re.compile(r"(?P<value>[^@\(]+)(?:@(?P<bits>\d+))?")
>> >> >> >
>> >> >> >
>> >> >> > Spurious change?
>> >> >> >
>> >> >>
>> >> >> I thought it needed to avoid matching something like
>> >> >> a(is_power_of_two).. but it seems to work with that hunk reverted so
>> >> >> I
>> >> >> guess I can drop it..
>> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>  class Constant(Value):
>> >> >> >>     def __init__(self, val, name):
>> >> >> >> @@ -150,7 +152,8 @@ class Constant(Value):
>> >> >> >>           return "nir_type_float"
>> >> >> >>
>> >> >> >>  _var_name_re = re.compile(r"(?P<const>#)?(?P<name>\w+)"
>> >> >> >> -
>> >> >> >> r"(?:@(?P<type>int|uint|bool|float)?(?P<bits>\d+)?)?")
>> >> >> >> +
>> >> >> >> r"(?:@(?P<type>int|uint|bool|float)?(?P<bits>\d+)?)?"
>> >> >> >> +                          r"(?P<cond>\([^\)]+\))?")
>> >> >> >>
>> >> >> >>  class Variable(Value):
>> >> >> >>     def __init__(self, val, name, varset):
>> >> >> >> @@ -161,6 +164,7 @@ class Variable(Value):
>> >> >> >>
>> >> >> >>        self.var_name = m.group('name')
>> >> >> >>        self.is_constant = m.group('const') is not None
>> >> >> >> +      self.cond = m.group('cond')
>> >> >> >>        self.required_type = m.group('type')
>> >> >> >>        self.bit_size = int(m.group('bits')) if m.group('bits')
>> >> >> >> else
>> >> >> >> 0
>> >> >> >>
>> >> >> >> diff --git a/src/compiler/nir/nir_opt_algebraic.py
>> >> >> >> b/src/compiler/nir/nir_opt_algebraic.py
>> >> >> >> index 0a95725..952a91a 100644
>> >> >> >> --- a/src/compiler/nir/nir_opt_algebraic.py
>> >> >> >> +++ b/src/compiler/nir/nir_opt_algebraic.py
>> >> >> >> @@ -62,6 +62,11 @@ d = 'd'
>> >> >> >>  # constructed value should have that bit-size.
>> >> >> >>
>> >> >> >>  optimizations = [
>> >> >> >> +
>> >> >> >> +   (('imul', a, '#b at 32(is_power_of_two)'), ('ishl', a,
>> >> >> >> ('find_lsb',
>> >> >> >> b))),
>> >> >> >> +   (('udiv', a, '#b at 32(is_power_of_two)'), ('ushr', a,
>> >> >> >> ('find_lsb',
>> >> >> >> b))),
>> >> >> >> +   (('umod', a, '#b(is_power_of_two)'),    ('iand', a, ('isub',
>> >> >> >> b,
>> >> >> >> 1))),
>> >> >> >> +
>> >> >> >>     (('fneg', ('fneg', a)), a),
>> >> >> >>     (('ineg', ('ineg', a)), a),
>> >> >> >>     (('fabs', ('fabs', a)), ('fabs', a)),
>> >> >> >> diff --git a/src/compiler/nir/nir_search.c
>> >> >> >> b/src/compiler/nir/nir_search.c
>> >> >> >> index 2c2fd92..b21fb2c 100644
>> >> >> >> --- a/src/compiler/nir/nir_search.c
>> >> >> >> +++ b/src/compiler/nir/nir_search.c
>> >> >> >> @@ -127,6 +127,9 @@ match_value(const nir_search_value *value,
>> >> >> >> nir_alu_instr *instr, unsigned src,
>> >> >> >>               instr->src[src].src.ssa->parent_instr->type !=
>> >> >> >> nir_instr_type_load_const)
>> >> >> >>              return false;
>> >> >> >>
>> >> >> >> +         if (var->cond && !var->cond(instr, src, num_components,
>> >> >> >> new_swizzle))
>> >> >> >> +            return false;
>> >> >> >> +
>> >> >> >>           if (var->type != nir_type_invalid) {
>> >> >> >>              if (instr->src[src].src.ssa->parent_instr->type !=
>> >> >> >> nir_instr_type_alu)
>> >> >> >>                 return false;
>> >> >> >> diff --git a/src/compiler/nir/nir_search.h
>> >> >> >> b/src/compiler/nir/nir_search.h
>> >> >> >> index a500feb..f55d797 100644
>> >> >> >> --- a/src/compiler/nir/nir_search.h
>> >> >> >> +++ b/src/compiler/nir/nir_search.h
>> >> >> >> @@ -68,6 +68,16 @@ typedef struct {
>> >> >> >>      * never match anything.
>> >> >> >>      */
>> >> >> >>     nir_alu_type type;
>> >> >> >> +
>> >> >> >> +   /** Optional condition fxn ptr
>> >> >> >> +    *
>> >> >> >> +    * This is only allowed in search expressions, and allows
>> >> >> >> additional
>> >> >> >> +    * constraints to be placed on the match.  Typically used for
>> >> >> >> 'is_constant'
>> >> >> >> +    * variables to require, for example, power-of-two in order
>> >> >> >> for
>> >> >> >> the
>> >> >> >> search
>> >> >> >> +    * to match.
>> >> >> >> +    */
>> >> >> >> +   bool (*cond)(nir_alu_instr *instr, unsigned src,
>> >> >> >> +                unsigned num_components, const uint8_t
>> >> >> >> *swizzle);
>> >> >> >>  } nir_search_variable;
>> >> >> >>
>> >> >> >>  typedef struct {
>> >> >> >> diff --git a/src/compiler/nir/nir_search_helpers.h
>> >> >> >> b/src/compiler/nir/nir_search_helpers.h
>> >> >> >> new file mode 100644
>> >> >> >> index 0000000..8ed2ca0
>> >> >> >> --- /dev/null
>> >> >> >> +++ b/src/compiler/nir/nir_search_helpers.h
>> >> >> >> @@ -0,0 +1,66 @@
>> >> >> >> +/*
>> >> >> >> + * Copyright © 2016 Red Hat
>> >> >> >> + *
>> >> >> >> + * Permission is hereby granted, free of charge, to any person
>> >> >> >> obtaining
>> >> >> >> a
>> >> >> >> + * copy of this software and associated documentation files (the
>> >> >> >> "Software"),
>> >> >> >> + * to deal in the Software without restriction, including
>> >> >> >> without
>> >> >> >> limitation
>> >> >> >> + * the rights to use, copy, modify, merge, publish, distribute,
>> >> >> >> sublicense,
>> >> >> >> + * and/or sell copies of the Software, and to permit persons to
>> >> >> >> whom
>> >> >> >> the
>> >> >> >> + * Software is furnished to do so, subject to the following
>> >> >> >> conditions:
>> >> >> >> + *
>> >> >> >> + * The above copyright notice and this permission notice
>> >> >> >> (including
>> >> >> >> the
>> >> >> >> next
>> >> >> >> + * paragraph) shall be included in all copies or substantial
>> >> >> >> portions
>> >> >> >> of
>> >> >> >> the
>> >> >> >> + * Software.
>> >> >> >> + *
>> >> >> >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
>> >> >> >> KIND,
>> >> >> >> EXPRESS OR
>> >> >> >> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> >> >> >> MERCHANTABILITY,
>> >> >> >> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO
>> >> >> >> EVENT
>> >> >> >> SHALL
>> >> >> >> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
>> >> >> >> DAMAGES
>> >> >> >> OR
>> >> >> >> OTHER
>> >> >> >> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
>> >> >> >> OTHERWISE,
>> >> >> >> ARISING
>> >> >> >> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> >> >> >> OTHER
>> >> >> >> DEALINGS
>> >> >> >> + * IN THE SOFTWARE.
>> >> >> >> + *
>> >> >> >> + * Authors:
>> >> >> >> + *    Rob Clark <robclark at freedesktop.org>
>> >> >> >> + */
>> >> >> >> +
>> >> >> >> +#ifndef _NIR_SEARCH_HELPERS_
>> >> >> >> +#define _NIR_SEARCH_HELPERS_
>> >> >> >> +
>> >> >> >> +#include "nir.h"
>> >> >> >> +
>> >> >> >> +static inline bool
>> >> >> >> +__is_power_of_two(unsigned int x)
>> >> >> >> +{
>> >> >> >> +   return ((x != 0) && !(x & (x - 1)));
>> >> >> >> +}
>> >> >> >> +
>> >> >> >> +static inline bool
>> >> >> >> +is_power_of_two(nir_alu_instr *instr, unsigned src, unsigned
>> >> >> >> num_components,
>> >> >> >> +                const uint8_t *swizzle)
>> >> >> >> +{
>> >> >> >> +   nir_const_value *val =
>> >> >> >> nir_src_as_const_value(instr->src[src].src);
>> >> >> >> +
>> >> >> >> +   /* only constant src's: */
>> >> >> >> +   if (!val)
>> >> >> >> +      return false;
>> >> >> >> +
>> >> >> >> +   for (unsigned i = 0; i < num_components; i++) {
>> >> >> >> +      switch (nir_op_infos[instr->op].input_types[src]) {
>> >> >> >> +      case nir_type_int:
>> >> >> >> +         if (!__is_power_of_two(val->i32[swizzle[i]]))
>> >> >> >> +            return false;
>> >> >> >> +         break;
>> >> >> >> +      case nir_type_uint:
>> >> >> >> +         if (!__is_power_of_two(val->u32[swizzle[i]]))
>> >> >> >> +            return false;
>> >> >> >
>> >> >> >
>> >> >> > Your implementation of __is_power_of_two takes an unsigned.
>> >> >> > There's
>> >> >> > no
>> >> >> > benefit to splitting it into two cases and it just creates a false
>> >> >> > distinction.  If you do, you can probably inline the helper (I
>> >> >> > don't
>> >> >> > care
>> >> >> > much one way or the other on that).
>> >> >>
>> >> >> hmm, are you sure about negative PoT?  Although possibly that should
>> >> >> be '__is_power_of_two(ABS(val->i32[..])'?
>> >> >
>> >> >
>> >> > I don't know off-hand whether or not it's correct for negative
>> >> > power-of-two.
>> >> > However, I do know that it does exactly the same thing in both cases
>> >> > regardless of whether or not the first one is correct :-)
>> >> > --Jason
>> >>
>> >> so, for the record, __is_power_of_two(ABS(val->i32[..])) seems to be
>> >> the correct thing to do.  That would let us add a (presumably
>> >> optional) rule to lower idiv to ushr plus a multiply by the isign of
>> >> each of the args..
>> >
>> > Unfortunately, idiv can never be lowered because -1 >> 1 == -1.  Yeah,
>> > that's annoying.
>> >
>> > Also, I don't think any of the optimizations you've added here works for
>> > negative numbers without a sign-flip.  We would need a pair of
>> > optimizations: one for non-negative powers of two and one for negative.
>>
>> so pretty sure this isn't an issue, since (udiv, a,
>> #b(is_power_of_two)) hits the nir_type_uint case in is_power_of_two()
>> and doesn't recognize -2 (for example) as a PoT..
>>
>> possibly we could do a bit better, I think we'd have to split
>> is_power_of_two() into is_neg_power_of_two() and
>> is_pos_power_of_two()..
>
>
> I like that plan
>

ok, for what it's worth, my current version of this patch has:

   (('imul', a, '#b at 32(is_pos_power_of_two)'), ('ishl', a, ('find_lsb', b))),
   (('imul', a, '#b at 32(is_neg_power_of_two)'), ('ineg', ('ishl', a,
('find_lsb', ('iabs', b))))),
   (('udiv', a, '#b at 32(is_pos_power_of_two)'), ('ushr', a, ('find_lsb', b))),
   (('idiv', a, '#b at 32(is_pos_power_of_two)'), ('imul', ('isign', a),
('ushr', ('iabs', a), ('find_lsb', b))), 'options->lower_idiv'),
   (('idiv', a, '#b at 32(is_neg_power_of_two)'), ('ineg', ('imul',
('isign', a), ('ushr', ('iabs', a), ('find_lsb', ('iabs', b))))),
'options->lower_idiv'),
   (('umod', a, '#b(is_pos_power_of_two)'),    ('iand', a, ('isub', b, 1))),

In theory this should be wrong, because (iabs MIN_INT) -> MIN_INT..

but things are a bit strange..  I hacked up a .shader_test:

----
[require]
GLSL >= 1.30

[vertex shader]
void main()
{
        gl_Position = gl_Vertex;
}

[fragment shader]
uniform int arg0;
uniform int expected;

void main()
{
  int result = (arg0 / 2);
  gl_FragColor = result == expected ? vec4(0.0, 1.0, 0.0, 1.0) :
vec4(1.0, 0.0, 0.0, 1.0);
}

[test]
clear color 0.0 0.0 1.0 0.0
clear
uniform int arg0 0x80000000
uniform int expected 0xc0000000
draw rect ortho 0 0 4 4
probe rect rgba (0, 0, 4, 4) (0.0, 1.0, 0.0, 1.0)
----

oddly (even w/out opt-algebraic rules), it fails for
llvmpipe/softpipe/freedreno on armv7.  But passes for
llvmpipe/softpipe/i965 on x86.  Not entirely sure what to make of
that.

Now the really odd thing is if I enable the opt rule (and I can see
from INTEL_DEBUG=fs.. http://hastebin.com/meyekajake.md), that the
test still passes.  Not entirely sure how that is possible.

I'll have to rig up some test program on arm to see what is really
going on, but since it effects softpipe and llvmpipe I guess it is
unrelated.

BR,
-R


More information about the mesa-dev mailing list