[Mesa-dev] [PATCH] nir: add option to lower slt/sge/seq/sne

Tue Mar 31 11:18:57 PDT 2015

On Tue, Mar 31, 2015 at 2:11 PM, Rob Clark <robdclark at gmail.com> wrote:
> On Tue, Mar 31, 2015 at 2:03 PM, Kenneth Graunke <kenneth at whitecape.org> wrote:
>> On Tuesday, March 31, 2015 11:30:17 AM Rob Clark wrote:
>>> From: Rob Clark <robclark at freedesktop.org>
>>>
>>> In freedreno these get implemented as the matching f* instruction plus a
>>> u2f to convert the result to float 1.0/0.0.  But less lines of code to
>>> just let nir_opt_algebraic handle this for us, plus opens up some small
>>> window for other opt passes to improve (ie. if some shader ended up with
>>> both a flt and slt with same src args, for example).
>>>
>>> Signed-off-by: Rob Clark <robclark at freedesktop.org>
>>> ---
>>>  src/glsl/nir/nir.h                | 3 +++
>>>  src/glsl/nir/nir_opt_algebraic.py | 5 +++++
>>>  2 files changed, 8 insertions(+)
>>>
>>> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
>>> index 669a26e..11505f9 100644
>>> --- a/src/glsl/nir/nir.h
>>> +++ b/src/glsl/nir/nir.h
>>> @@ -1371,6 +1371,9 @@ typedef struct nir_shader_compiler_options {
>>>     /** lowers fneg and ineg to fsub and isub. */
>>>     bool lower_negate;
>>>
>>> +   /* lower {slt,sge,seq,sne} to {flt,fge,feq,fne} + u2f: */
>>> +   bool lower_scmp;
>>> +
>>>     /**
>>>      * Does the driver support real 32-bit integers?  (Otherwise, integers
>>>      * are simulated by floats.)
>>> diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py
>>> index ef855aa..6bd4187 100644
>>> --- a/src/glsl/nir/nir_opt_algebraic.py
>>> +++ b/src/glsl/nir/nir_opt_algebraic.py
>>> @@ -95,6 +95,11 @@ optimizations = [
>>>     (('fsat', a), ('fmin', ('fmax', a, 0.0), 1.0), 'options->lower_fsat'),
>>>     (('fsat', ('fsat', a)), ('fsat', a)),
>>>     (('fmin', ('fmax', ('fmin', ('fmax', a, 0.0), 1.0), 0.0), 1.0), ('fmin', ('fmax', a, 0.0), 1.0)),
>>> +   (('slt', a, b), ('u2f', ('flt', a, b)), 'options->lower_scmp'),
>>> +   (('sge', a, b), ('u2f', ('fge', a, b)), 'options->lower_scmp'),
>>> +   (('seq', a, b), ('u2f', ('feq', a, b)), 'options->lower_scmp'),
>>> +   (('sne', a, b), ('u2f', ('fne', a, b)), 'options->lower_scmp'),
>>> +
>>>     # Comparison with the same args.  Note that these are not done for
>>>     # the float versions because NaN always returns false on float
>>>     # inequalities.
>>>
>>
>> Hi Rob!
>>
>> I'm pretty sure you want b2f here, not u2f...the slt/sge/seq/sne opcodes
>> are defined to return either 0.0 or 1.0.  flt and friends return 0 or
>> 0xFFFFFFFF.  u2f converts the numerical value of the unsigned source to
>> float, so this would return 0.0 or 4294967295.0.
>>
>
> hmm, that is a bit sad (since on the flt/etc cases I'd have to
> multiply by 0xffffffff, which would in turn require a mov for the
> 0xffffffff or perhaps emitting a driver uniform/const), and since it
> makes the b2f more complicated..

Without commenting on nir, the way that freedreno currently implements
this is by doing a neg on the cmp output, since TGSI also mandates the
0/~0 bools. And then b2f becomes bool & 0x3f8. (Which kinda sucks for
later optimizations since they have to explicitly know about this
pattern, or alternatively, not be able to see through it.)

It would be nice if there were support for different acceptable values
of "true" for bools, but this presents (other) issues for
optimizations as well.

  -ilia