[Mesa-dev] [PATCH] nir: add option to lower slt/sge/seq/sne

Tue Mar 31 11:03:30 PDT 2015

On Tuesday, March 31, 2015 11:30:17 AM Rob Clark wrote:
> From: Rob Clark <robclark at freedesktop.org>
> 
> In freedreno these get implemented as the matching f* instruction plus a
> u2f to convert the result to float 1.0/0.0.  But it is fewer lines of code
> to just let nir_opt_algebraic handle this for us, and it opens up a small
> window for other opt passes to improve things (e.g. if some shader ended
> up with both a flt and an slt with the same src args).
> 
> Signed-off-by: Rob Clark <robclark at freedesktop.org>
> ---
>  src/glsl/nir/nir.h                | 3 +++
>  src/glsl/nir/nir_opt_algebraic.py | 5 +++++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
> index 669a26e..11505f9 100644
> --- a/src/glsl/nir/nir.h
> +++ b/src/glsl/nir/nir.h
> @@ -1371,6 +1371,9 @@ typedef struct nir_shader_compiler_options {
>     /** lowers fneg and ineg to fsub and isub. */
>     bool lower_negate;
>  
> +   /* lower {slt,sge,seq,sne} to {flt,fge,feq,fne} + u2f: */
> +   bool lower_scmp;
> +
>     /**
>      * Does the driver support real 32-bit integers?  (Otherwise, integers
>      * are simulated by floats.)
> diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py
> index ef855aa..6bd4187 100644
> --- a/src/glsl/nir/nir_opt_algebraic.py
> +++ b/src/glsl/nir/nir_opt_algebraic.py
> @@ -95,6 +95,11 @@ optimizations = [
>     (('fsat', a), ('fmin', ('fmax', a, 0.0), 1.0), 'options->lower_fsat'),
>     (('fsat', ('fsat', a)), ('fsat', a)),
>     (('fmin', ('fmax', ('fmin', ('fmax', a, 0.0), 1.0), 0.0), 1.0), ('fmin', ('fmax', a, 0.0), 1.0)),
> +   (('slt', a, b), ('u2f', ('flt', a, b)), 'options->lower_scmp'),
> +   (('sge', a, b), ('u2f', ('fge', a, b)), 'options->lower_scmp'),
> +   (('seq', a, b), ('u2f', ('feq', a, b)), 'options->lower_scmp'),
> +   (('sne', a, b), ('u2f', ('fne', a, b)), 'options->lower_scmp'),
> +
>     # Comparison with the same args.  Note that these are not done for
>     # the float versions because NaN always returns false on float
>     # inequalities.
> 

Hi Rob!

I'm pretty sure you want b2f here, not u2f...the slt/sge/seq/sne opcodes
are defined to return either 0.0 or 1.0.  flt and friends return 0 or
0xFFFFFFFF.  u2f converts the numerical value of the unsigned source to
float, so this would return 0.0 or 4294967295.0.
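
To make the difference concrete, here is a minimal Python sketch. The
helpers flt, u2f and b2f are hypothetical stand-ins that model the opcode
semantics described above; they are not Mesa code, and Python floats are
doubles rather than 32-bit floats.

```python
def flt(a, b):
    # Model of a NIR-style float comparison: all bits set for true, 0 for false.
    return 0xFFFFFFFF if a < b else 0x00000000

def u2f(x):
    # Converts the numerical value of an unsigned integer to float.
    return float(x)

def b2f(x):
    # Converts a boolean (zero / non-zero) to 0.0 or 1.0.
    return 1.0 if x != 0 else 0.0

true_bits = flt(1.0, 2.0)
print(u2f(true_bits))  # the numeric value of 0xFFFFFFFF, not 1.0
print(b2f(true_bits))  # 1.0, which is what slt is defined to return
```

Under these assumptions, only the b2f form reproduces the 0.0/1.0 result
that slt/sge/seq/sne are defined to return.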

b2f on i965 is implemented as "AND src 0x3f800000" which would give you 0x0
or 0x3f800000 = 1.0.  It sounds like vc4 does the same trick.
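
The AND trick works because the 32-bit IEEE 754 encoding of 1.0 is
0x3f800000, so masking an all-ones boolean with that constant leaves
exactly the bits of 1.0. A small Python sketch of the idea, where
bits_to_f32 and b2f_via_and are hypothetical helper names and not part of
any driver:

```python
import struct

ONE_F32_BITS = 0x3F800000  # bit pattern of 1.0 as a 32-bit IEEE 754 float

def bits_to_f32(bits):
    # Reinterpret a 32-bit unsigned integer as a 32-bit float.
    return struct.unpack('<f', struct.pack('<I', bits))[0]

def b2f_via_and(bool_bits):
    # bool_bits is 0x00000000 (false) or 0xFFFFFFFF (true).
    return bits_to_f32(bool_bits & ONE_F32_BITS)

print(b2f_via_and(0xFFFFFFFF))  # true  -> 1.0
print(b2f_via_and(0x00000000))  # false -> 0.0
```

This only relies on the boolean being 0 or all ones, which is what flt and
friends return.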

With s/u2f/b2f/g, this patch would be:
Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>

Thanks for doing this!  I'll want to use these patterns too.

--Ken