[Mesa-dev] [PATCH] nir: add option to lower slt/sge/seq/sne

Tue Mar 31 11:11:49 PDT 2015

On Tue, Mar 31, 2015 at 2:03 PM, Kenneth Graunke <kenneth at whitecape.org> wrote:
> On Tuesday, March 31, 2015 11:30:17 AM Rob Clark wrote:
>> From: Rob Clark <robclark at freedesktop.org>
>>
>> In freedreno these get implemented as the matching f* instruction plus a
>> u2f to convert the result to float 1.0/0.0.  But it is fewer lines of code
>> to just let nir_opt_algebraic handle this for us, and it opens up a small
>> window for other opt passes to improve things (e.g. if some shader ended
>> up with both a flt and an slt with the same src args).
>>
>> Signed-off-by: Rob Clark <robclark at freedesktop.org>
>> ---
>>  src/glsl/nir/nir.h                | 3 +++
>>  src/glsl/nir/nir_opt_algebraic.py | 5 +++++
>>  2 files changed, 8 insertions(+)
>>
>> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
>> index 669a26e..11505f9 100644
>> --- a/src/glsl/nir/nir.h
>> +++ b/src/glsl/nir/nir.h
>> @@ -1371,6 +1371,9 @@ typedef struct nir_shader_compiler_options {
>>     /** lowers fneg and ineg to fsub and isub. */
>>     bool lower_negate;
>>
>> +   /* lower {slt,sge,seq,sne} to {flt,fge,feq,fne} + u2f: */
>> +   bool lower_scmp;
>> +
>>     /**
>>      * Does the driver support real 32-bit integers?  (Otherwise, integers
>>      * are simulated by floats.)
>> diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py
>> index ef855aa..6bd4187 100644
>> --- a/src/glsl/nir/nir_opt_algebraic.py
>> +++ b/src/glsl/nir/nir_opt_algebraic.py
>> @@ -95,6 +95,11 @@ optimizations = [
>>     (('fsat', a), ('fmin', ('fmax', a, 0.0), 1.0), 'options->lower_fsat'),
>>     (('fsat', ('fsat', a)), ('fsat', a)),
>>     (('fmin', ('fmax', ('fmin', ('fmax', a, 0.0), 1.0), 0.0), 1.0), ('fmin', ('fmax', a, 0.0), 1.0)),
>> +   (('slt', a, b), ('u2f', ('flt', a, b)), 'options->lower_scmp'),
>> +   (('sge', a, b), ('u2f', ('fge', a, b)), 'options->lower_scmp'),
>> +   (('seq', a, b), ('u2f', ('feq', a, b)), 'options->lower_scmp'),
>> +   (('sne', a, b), ('u2f', ('fne', a, b)), 'options->lower_scmp'),
>> +
>>     # Comparison with the same args.  Note that these are not done for
>>     # the float versions because NaN always returns false on float
>>     # inequalities.
>>
>
> Hi Rob!
>
> I'm pretty sure you want b2f here, not u2f...the slt/sge/seq/sne opcodes
> are defined to return either 0.0 or 1.0.  flt and friends return 0 or
> 0xFFFFFFFF.  u2f converts the numerical value of the unsigned source to
> float, so this would return 0.0 or 4294967295.0.
>

hmm, that is a bit sad, since for the flt/etc cases I'd have to
multiply by 0xffffffff (which would in turn require a mov for the
0xffffffff, or perhaps emitting a driver uniform/const), and it also
makes the b2f more complicated..

I guess I can just implement b2f to be same as u2f in my backend and
hope for the best..

can a bool be reinterpreted as an int and (for example) multiplied by
things?  If so, can we maybe have reinterpret opcodes so I can fix
things up?

BR,
-R

> b2f on i965 is implemented as "AND src 0x3f800000" which would give you
> 0x0 or 0x3f800000 = 1.0.  It sounds like vc4 does the same trick.
>
> With s/u2f/b2f/g, this patch would be:
> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
>
> Thanks for doing this!  I'll want to use these patterns too.
>
> --Ken
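
To make the u2f vs. b2f difference discussed above concrete, here is a minimal C sketch. It is not Mesa code: the helper names (flt, u2f, b2f, slt_ref, check_lowering) and the NIR_TRUE/NIR_FALSE macros are made up for illustration, and it assumes 32-bit IEEE-754 floats. It only models the semantics described in the thread: flt returns 0 or 0xFFFFFFFF, slt returns 0.0 or 1.0, and b2f can be done as an AND with the bit pattern of 1.0f.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names: NIR-style booleans as described in the thread. */
#define NIR_TRUE  0xFFFFFFFFu
#define NIR_FALSE 0x00000000u

/* flt: float less-than returning an integer boolean (0 or ~0). */
static uint32_t flt(float a, float b)
{
    return a < b ? NIR_TRUE : NIR_FALSE;
}

/* u2f: converts the numerical value of the unsigned source to float. */
static float u2f(uint32_t u)
{
    return (float)u;
}

/* b2f via the AND trick: 0x3f800000 is the bit pattern of 1.0f,
 * so (~0 & 0x3f800000) is 1.0f and (0 & 0x3f800000) is 0.0f. */
static float b2f(uint32_t b)
{
    uint32_t bits = b & 0x3f800000u;
    float f;
    memcpy(&f, &bits, sizeof f);   /* reinterpret the bits, no conversion */
    return f;
}

/* Reference behaviour of slt: 1.0 if a < b, else 0.0. */
static float slt_ref(float a, float b)
{
    return a < b ? 1.0f : 0.0f;
}

/* Compare both candidate lowerings of slt against the reference. */
static void check_lowering(void)
{
    /* b2f(flt(a, b)) matches slt for both outcomes. */
    assert(b2f(flt(1.0f, 2.0f)) == slt_ref(1.0f, 2.0f));
    assert(b2f(flt(2.0f, 1.0f)) == slt_ref(2.0f, 1.0f));

    /* u2f(flt(a, b)) matches only in the false case; in the true case
     * it yields a value around 4.29e9 rather than 1.0. */
    assert(u2f(flt(2.0f, 1.0f)) == 0.0f);
    assert(u2f(flt(1.0f, 2.0f)) != 1.0f);
}
```

Calling check_lowering() from a main() exercises both cases. On a backend whose compare instruction natively produces 0/1 rather than 0/~0, the picture would differ, which is what the "implement b2f the same as u2f" remark above is getting at.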