[Mesa-dev] [PATCH 2/3] nir/algebraic: support for power-of-two optimizations
Rob Clark
robdclark at gmail.com
Tue May 24 20:54:02 UTC 2016
On Tue, May 24, 2016 at 3:41 PM, Rob Clark <robdclark at gmail.com> wrote:
> On Mon, May 16, 2016 at 3:16 PM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>> On Mon, May 16, 2016 at 9:09 AM, Rob Clark <robdclark at gmail.com> wrote:
>>>
>>> On Mon, May 16, 2016 at 10:45 AM, Jason Ekstrand <jason at jlekstrand.net>
>>> wrote:
>>> >
>>> > On May 16, 2016 7:29 AM, "Rob Clark" <robdclark at gmail.com> wrote:
>>> >>
>>> >> On Sat, May 14, 2016 at 4:03 PM, Jason Ekstrand <jason at jlekstrand.net>
>>> >> wrote:
>>> >> >
>>> >> >
>>> >> > On Sat, May 14, 2016 at 12:20 PM, Rob Clark <robdclark at gmail.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> On Thu, May 12, 2016 at 10:55 PM, Jason Ekstrand
>>> >> >> <jason at jlekstrand.net>
>>> >> >> wrote:
>>> >> >> >
>>> >> >> >
>>> >> >> > On Tue, May 10, 2016 at 11:57 AM, Rob Clark <robdclark at gmail.com>
>>> >> >> > wrote:
>>> >> >> >>
>>> >> >> >> From: Rob Clark <robclark at freedesktop.org>
>>> >> >> >>
>>> >> >> >> Some optimizations, like converting integer multiply/divide into
>>> >> >> >> left/
>>> >> >> >> right shifts, have additional constraints on the search
>>> >> >> >> expression.
>>> >> >> >> Like requiring that a variable is a constant power of two.
>>> >> >> >> Support
>>> >> >> >> these cases by allowing a fxn name to be appended to the search
>>> >> >> >> var
>>> >> >> >> expression (ie. "a#32(is_power_of_two)").
>>> >> >> >>
>>> >> >> >> TODO update doc/comment explaining search var syntax
>>> >> >> >> TODO the eagle-eyed viewer might have noticed that this could
>>> >> >> >> also
>>> >> >> >> replace the existing const syntax (ie. "#a"). Not sure if we
>>> >> >> >> should
>>> >> >> >> keep that.. we could make it syntactic sugar (ie '#'
>>> >> >> >> automatically
>>> >> >> >> sets
>>> >> >> >> the cond fxn ptr to 'is_const') or just get rid of it entirely?
>>> >> >> >> Maybe
>>> >> >> >> that is a follow-on clean-up patch?
>>> >> >> >>
>>> >> >> >> Signed-off-by: Rob Clark <robclark at freedesktop.org>
>>> >> >> >> ---
>>> >> >> >> src/compiler/nir/nir_algebraic.py | 8 +++--
>>> >> >> >> src/compiler/nir/nir_opt_algebraic.py | 5 +++
>>> >> >> >> src/compiler/nir/nir_search.c | 3 ++
>>> >> >> >> src/compiler/nir/nir_search.h | 10 ++++++
>>> >> >> >> src/compiler/nir/nir_search_helpers.h | 66
>>> >> >> >> +++++++++++++++++++++++++++++++++++
>>> >> >> >> 5 files changed, 90 insertions(+), 2 deletions(-)
>>> >> >> >> create mode 100644 src/compiler/nir/nir_search_helpers.h
>>> >> >> >>
>>> >> >> >> diff --git a/src/compiler/nir/nir_algebraic.py
>>> >> >> >> b/src/compiler/nir/nir_algebraic.py
>>> >> >> >> index 285f853..19ac6ee 100644
>>> >> >> >> --- a/src/compiler/nir/nir_algebraic.py
>>> >> >> >> +++ b/src/compiler/nir/nir_algebraic.py
>>> >> >> >> @@ -76,6 +76,7 @@ class Value(object):
>>> >> >> >> return Constant(val, name_base)
>>> >> >> >>
>>> >> >> >> __template = mako.template.Template("""
>>> >> >> >> +#include "compiler/nir/nir_search_helpers.h"
>>> >> >> >> static const ${val.c_type} ${val.name} = {
>>> >> >> >> { ${val.type_enum}, ${val.bit_size} },
>>> >> >> >> % if isinstance(val, Constant):
>>> >> >> >> @@ -84,6 +85,7 @@ static const ${val.c_type} ${val.name} = {
>>> >> >> >> ${val.index}, /* ${val.var_name} */
>>> >> >> >> ${'true' if val.is_constant else 'false'},
>>> >> >> >> ${val.type() or 'nir_type_invalid' },
>>> >> >> >> + ${val.cond if val.cond else 'NULL'},
>>> >> >> >> % elif isinstance(val, Expression):
>>> >> >> >> ${'true' if val.inexact else 'false'},
>>> >> >> >> nir_op_${val.opcode},
>>> >> >> >> @@ -113,7 +115,7 @@ static const ${val.c_type} ${val.name} = {
>>> >> >> >> Variable=Variable,
>>> >> >> >> Expression=Expression)
>>> >> >> >>
>>> >> >> >> -_constant_re =
>>> >> >> >> re.compile(r"(?P<value>[^@]+)(?:@(?P<bits>\d+))?")
>>> >> >> >> +_constant_re =
>>> >> >> >> re.compile(r"(?P<value>[^@\(]+)(?:@(?P<bits>\d+))?")
>>> >> >> >
>>> >> >> >
>>> >> >> > Spurious change?
>>> >> >> >
>>> >> >>
>>> >> >> I thought it needed to avoid matching something like
>>> >> >> a(is_power_of_two).. but it seems to work with that hunk reverted so
>>> >> >> I
>>> >> >> guess I can drop it..
>>> >> >>
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> class Constant(Value):
>>> >> >> >> def __init__(self, val, name):
>>> >> >> >> @@ -150,7 +152,8 @@ class Constant(Value):
>>> >> >> >> return "nir_type_float"
>>> >> >> >>
>>> >> >> >> _var_name_re = re.compile(r"(?P<const>#)?(?P<name>\w+)"
>>> >> >> >> -
>>> >> >> >> r"(?:@(?P<type>int|uint|bool|float)?(?P<bits>\d+)?)?")
>>> >> >> >> +
>>> >> >> >> r"(?:@(?P<type>int|uint|bool|float)?(?P<bits>\d+)?)?"
>>> >> >> >> + r"(?P<cond>\([^\)]+\))?")
>>> >> >> >>
>>> >> >> >> class Variable(Value):
>>> >> >> >> def __init__(self, val, name, varset):
>>> >> >> >> @@ -161,6 +164,7 @@ class Variable(Value):
>>> >> >> >>
>>> >> >> >> self.var_name = m.group('name')
>>> >> >> >> self.is_constant = m.group('const') is not None
>>> >> >> >> + self.cond = m.group('cond')
>>> >> >> >> self.required_type = m.group('type')
>>> >> >> >> self.bit_size = int(m.group('bits')) if m.group('bits')
>>> >> >> >> else
>>> >> >> >> 0
>>> >> >> >>
>>> >> >> >> diff --git a/src/compiler/nir/nir_opt_algebraic.py
>>> >> >> >> b/src/compiler/nir/nir_opt_algebraic.py
>>> >> >> >> index 0a95725..952a91a 100644
>>> >> >> >> --- a/src/compiler/nir/nir_opt_algebraic.py
>>> >> >> >> +++ b/src/compiler/nir/nir_opt_algebraic.py
>>> >> >> >> @@ -62,6 +62,11 @@ d = 'd'
>>> >> >> >> # constructed value should have that bit-size.
>>> >> >> >>
>>> >> >> >> optimizations = [
>>> >> >> >> +
>>> >> >> >> + (('imul', a, '#b at 32(is_power_of_two)'), ('ishl', a,
>>> >> >> >> ('find_lsb',
>>> >> >> >> b))),
>>> >> >> >> + (('udiv', a, '#b at 32(is_power_of_two)'), ('ushr', a,
>>> >> >> >> ('find_lsb',
>>> >> >> >> b))),
>>> >> >> >> + (('umod', a, '#b(is_power_of_two)'), ('iand', a, ('isub',
>>> >> >> >> b,
>>> >> >> >> 1))),
>>> >> >> >> +
>>> >> >> >> (('fneg', ('fneg', a)), a),
>>> >> >> >> (('ineg', ('ineg', a)), a),
>>> >> >> >> (('fabs', ('fabs', a)), ('fabs', a)),
>>> >> >> >> diff --git a/src/compiler/nir/nir_search.c
>>> >> >> >> b/src/compiler/nir/nir_search.c
>>> >> >> >> index 2c2fd92..b21fb2c 100644
>>> >> >> >> --- a/src/compiler/nir/nir_search.c
>>> >> >> >> +++ b/src/compiler/nir/nir_search.c
>>> >> >> >> @@ -127,6 +127,9 @@ match_value(const nir_search_value *value,
>>> >> >> >> nir_alu_instr *instr, unsigned src,
>>> >> >> >> instr->src[src].src.ssa->parent_instr->type !=
>>> >> >> >> nir_instr_type_load_const)
>>> >> >> >> return false;
>>> >> >> >>
>>> >> >> >> + if (var->cond && !var->cond(instr, src, num_components,
>>> >> >> >> new_swizzle))
>>> >> >> >> + return false;
>>> >> >> >> +
>>> >> >> >> if (var->type != nir_type_invalid) {
>>> >> >> >> if (instr->src[src].src.ssa->parent_instr->type !=
>>> >> >> >> nir_instr_type_alu)
>>> >> >> >> return false;
>>> >> >> >> diff --git a/src/compiler/nir/nir_search.h
>>> >> >> >> b/src/compiler/nir/nir_search.h
>>> >> >> >> index a500feb..f55d797 100644
>>> >> >> >> --- a/src/compiler/nir/nir_search.h
>>> >> >> >> +++ b/src/compiler/nir/nir_search.h
>>> >> >> >> @@ -68,6 +68,16 @@ typedef struct {
>>> >> >> >> * never match anything.
>>> >> >> >> */
>>> >> >> >> nir_alu_type type;
>>> >> >> >> +
>>> >> >> >> + /** Optional condition fxn ptr
>>> >> >> >> + *
>>> >> >> >> + * This is only allowed in search expressions, and allows
>>> >> >> >> additional
>>> >> >> >> + * constraints to be placed on the match. Typically used for
>>> >> >> >> 'is_constant'
>>> >> >> >> + * variables to require, for example, power-of-two in order
>>> >> >> >> for
>>> >> >> >> the
>>> >> >> >> search
>>> >> >> >> + * to match.
>>> >> >> >> + */
>>> >> >> >> + bool (*cond)(nir_alu_instr *instr, unsigned src,
>>> >> >> >> + unsigned num_components, const uint8_t
>>> >> >> >> *swizzle);
>>> >> >> >> } nir_search_variable;
>>> >> >> >>
>>> >> >> >> typedef struct {
>>> >> >> >> diff --git a/src/compiler/nir/nir_search_helpers.h
>>> >> >> >> b/src/compiler/nir/nir_search_helpers.h
>>> >> >> >> new file mode 100644
>>> >> >> >> index 0000000..8ed2ca0
>>> >> >> >> --- /dev/null
>>> >> >> >> +++ b/src/compiler/nir/nir_search_helpers.h
>>> >> >> >> @@ -0,0 +1,66 @@
>>> >> >> >> +/*
>>> >> >> >> + * Copyright © 2016 Red Hat
>>> >> >> >> + *
>>> >> >> >> + * Permission is hereby granted, free of charge, to any person
>>> >> >> >> obtaining
>>> >> >> >> a
>>> >> >> >> + * copy of this software and associated documentation files (the
>>> >> >> >> "Software"),
>>> >> >> >> + * to deal in the Software without restriction, including
>>> >> >> >> without
>>> >> >> >> limitation
>>> >> >> >> + * the rights to use, copy, modify, merge, publish, distribute,
>>> >> >> >> sublicense,
>>> >> >> >> + * and/or sell copies of the Software, and to permit persons to
>>> >> >> >> whom
>>> >> >> >> the
>>> >> >> >> + * Software is furnished to do so, subject to the following
>>> >> >> >> conditions:
>>> >> >> >> + *
>>> >> >> >> + * The above copyright notice and this permission notice
>>> >> >> >> (including
>>> >> >> >> the
>>> >> >> >> next
>>> >> >> >> + * paragraph) shall be included in all copies or substantial
>>> >> >> >> portions
>>> >> >> >> of
>>> >> >> >> the
>>> >> >> >> + * Software.
>>> >> >> >> + *
>>> >> >> >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
>>> >> >> >> KIND,
>>> >> >> >> EXPRESS OR
>>> >> >> >> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>>> >> >> >> MERCHANTABILITY,
>>> >> >> >> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO
>>> >> >> >> EVENT
>>> >> >> >> SHALL
>>> >> >> >> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
>>> >> >> >> DAMAGES
>>> >> >> >> OR
>>> >> >> >> OTHER
>>> >> >> >> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
>>> >> >> >> OTHERWISE,
>>> >> >> >> ARISING
>>> >> >> >> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>>> >> >> >> OTHER
>>> >> >> >> DEALINGS
>>> >> >> >> + * IN THE SOFTWARE.
>>> >> >> >> + *
>>> >> >> >> + * Authors:
>>> >> >> >> + * Rob Clark <robclark at freedesktop.org>
>>> >> >> >> + */
>>> >> >> >> +
>>> >> >> >> +#ifndef _NIR_SEARCH_HELPERS_
>>> >> >> >> +#define _NIR_SEARCH_HELPERS_
>>> >> >> >> +
>>> >> >> >> +#include "nir.h"
>>> >> >> >> +
>>> >> >> >> +static inline bool
>>> >> >> >> +__is_power_of_two(unsigned int x)
>>> >> >> >> +{
>>> >> >> >> + return ((x != 0) && !(x & (x - 1)));
>>> >> >> >> +}
>>> >> >> >> +
>>> >> >> >> +static inline bool
>>> >> >> >> +is_power_of_two(nir_alu_instr *instr, unsigned src, unsigned
>>> >> >> >> num_components,
>>> >> >> >> + const uint8_t *swizzle)
>>> >> >> >> +{
>>> >> >> >> + nir_const_value *val =
>>> >> >> >> nir_src_as_const_value(instr->src[src].src);
>>> >> >> >> +
>>> >> >> >> + /* only constant src's: */
>>> >> >> >> + if (!val)
>>> >> >> >> + return false;
>>> >> >> >> +
>>> >> >> >> + for (unsigned i = 0; i < num_components; i++) {
>>> >> >> >> + switch (nir_op_infos[instr->op].input_types[src]) {
>>> >> >> >> + case nir_type_int:
>>> >> >> >> + if (!__is_power_of_two(val->i32[swizzle[i]]))
>>> >> >> >> + return false;
>>> >> >> >> + break;
>>> >> >> >> + case nir_type_uint:
>>> >> >> >> + if (!__is_power_of_two(val->u32[swizzle[i]]))
>>> >> >> >> + return false;
>>> >> >> >
>>> >> >> >
>>> >> >> > Your implementation of __is_power_of_two takes an unsigned.
>>> >> >> > There's
>>> >> >> > no
>>> >> >> > benefit to splitting it into two cases and it just creates a false
>>> >> >> > distinction. If you do, you can probably inline the helper (I
>>> >> >> > don't
>>> >> >> > care
>>> >> >> > much one way or the other on that).
>>> >> >>
>>> >> >> hmm, are you sure about negative PoT? Although possibly that should
>>> >> >> be '__is_power_of_two(ABS(val->i32[..])'?
>>> >> >
>>> >> >
>>> >> > I don't know off-hand whether or not it's correct for negative
>>> >> > power-of-two.
>>> >> > However, I do know that it does exactly the same thing in both cases
>>> >> > regardless of whether or not the first one is correct :-)
>>> >> > --Jason
>>> >>
>>> >> so, for the record, __is_power_of_two(ABS(val->i32[..])) seems to be
>>> >> the correct thing to do. That would let us add a (presumably
>>> >> optional) rule to lower idiv to ushr plus a multiply by the isign of
>>> >> each of the args..
>>> >
>>> > Unfortunately, idiv can never be lowered because -1 >> 1 == -1. Yeah,
>>> > that's annoying.
>>> >
>>> > Also, I don't think any of the optimizations you've added here works for
>>> > negative numbers without a sign-flip. We would need a pair of
>>> > optimizations: one for non-negative powers of two and one for negative.
>>>
>>> so pretty sure this isn't an issue, since (udiv, a,
>>> #b(is_power_of_two)) hits the nir_type_uint case in is_power_of_two()
>>> and doesn't recognize -2 (for example) as a PoT..
>>>
>>> possibly we could do a bit better, I think we'd have to split
>>> is_power_of_two() into is_neg_power_of_two() and
>>> is_pos_power_of_two()..
>>
>>
>> I like that plan
>>
>
> ok, for what it's worth, my current version of this patch has:
>
> (('imul', a, '#b at 32(is_pos_power_of_two)'), ('ishl', a, ('find_lsb', b))),
> (('imul', a, '#b at 32(is_neg_power_of_two)'), ('ineg', ('ishl', a,
> ('find_lsb', ('iabs', b))))),
> (('udiv', a, '#b at 32(is_pos_power_of_two)'), ('ushr', a, ('find_lsb', b))),
> (('idiv', a, '#b at 32(is_pos_power_of_two)'), ('imul', ('isign', a),
> ('ushr', ('iabs', a), ('find_lsb', b))), 'options->lower_idiv'),
> (('idiv', a, '#b at 32(is_neg_power_of_two)'), ('ineg', ('imul',
> ('isign', a), ('ushr', ('iabs', a), ('find_lsb', ('iabs', b))))),
> 'options->lower_idiv'),
> (('umod', a, '#b(is_pos_power_of_two)'), ('iand', a, ('isub', b, 1))),
>
> In theory this should be wrong, because (iabs MIN_INT) -> MIN_INT..
>
> but things are a bit strange.. I hacked up a .shader_test:
>
> ----
> [require]
> GLSL >= 1.30
>
> [vertex shader]
> void main()
> {
> gl_Position = gl_Vertex;
> }
>
> [fragment shader]
> uniform int arg0;
> uniform int expected;
>
> void main()
> {
> int result = (arg0 / 2);
> gl_FragColor = result == expected ? vec4(0.0, 1.0, 0.0, 1.0) :
> vec4(1.0, 0.0, 0.0, 1.0);
> }
>
> [test]
> clear color 0.0 0.0 1.0 0.0
> clear
> uniform int arg0 0x80000000
> uniform int expected 0xc0000000
> draw rect ortho 0 0 4 4
> probe rect rgba (0, 0, 4, 4) (0.0, 1.0, 0.0, 1.0)
> ----
>
> oddly (even w/out opt-algebraic rules), it fails for
> llvmpipe/softpipe/freedreno on armv7. But passes for
> llvmpipe/softpipe/i965 on x86. Not entirely sure what to make of
> that.
Ilia realized this was to do w/ strtol() vs 32b ints (so I guess it
would have passed on armv8..). I should have been using:
uniform int arg0 -2147483648
uniform int expected -1073741824
and now I get the same result on freedreno now, as i965, with that
lowering rule. (Ie. both passing.) So I guess it does work for the
MIN_INT case
BR,
-R
> Now the really odd thing is if I enable the opt rule (and I can see
> from INTEL_DEBUG=fs.. http://hastebin.com/meyekajake.md), that the
> test still passes. Not entirely sure how that is possible.
>
> I'll have to rig up some test program on arm to see what is really
> going on, but since it effects softpipe and llvmpipe I guess it is
> unrelated.
>
> BR,
> -R
More information about the mesa-dev
mailing list