[Beignet] [PATCH] libocl: refine implementation of sign().

Thu Jan 29 12:00:54 PST 2015

On Wed, Jan 28, 2015 at 11:18 PM, Ruiling Song <ruiling.song at intel.com> wrote:
> Avoid if-branching.
>
> Signed-off-by: Ruiling Song <ruiling.song at intel.com>
> ---
>  backend/src/libocl/tmpl/ocl_common.tmpl.cl |   16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/backend/src/libocl/tmpl/ocl_common.tmpl.cl b/backend/src/libocl/tmpl/ocl_common.tmpl.cl
> index db7b0d8..77bd2d3 100644
> --- a/backend/src/libocl/tmpl/ocl_common.tmpl.cl
> +++ b/backend/src/libocl/tmpl/ocl_common.tmpl.cl
> @@ -17,6 +17,7 @@
>   */
>  #include "ocl_common.h"
>  #include "ocl_float.h"
> +#include "ocl_relational.h"
>
>  /////////////////////////////////////////////////////////////////////////////
>  // Common Functions
> @@ -55,11 +56,12 @@ OVERLOADABLE float smoothstep(float e0, float e1, float x) {
>  }
>
>  OVERLOADABLE float sign(float x) {
> -  if(x > 0)
> -    return 1;
> -  if(x < 0)
> -    return -1;
> -  if(x == -0.f)
> -    return -0.f;
> -  return 0.f;
> +  union {float f; unsigned u;} ieee;
> +  ieee.f = x;
> +  unsigned k = ieee.u;
> +  float r = (k&0x80000000) ? -1.0f : 1.0f;
> +  // differentiate +0.0f -0.0f
> +  float s = 0.0f * r;
> +  s = (x == 0.0f) ? s : r;
> +  return isnan(x) ? 0.0f : s;
>  }
> --
> 1.7.10.4

I don't know if the structure of Beignet allows it (I see that the
implementation is in OpenCL C rather than hardware instructions), but
Mesa implements sign() for GLSL in three instructions:

cmp.nz.f0  null    x:f  0.0:f
and        ret:ud  x:ud 0x80000000:ud
(+f0) or   ret:ud  ret:ud 0x3f800000:ud

The AND instruction extracts the sign bit, and the predicated OR
instruction ORs in the bit pattern of 1.0 (0x3f800000) if x is not zero.

This gives +1.0 if x > 0.0
           +0.0 if x == +0.0
           -0.0 if x == -0.0
           -1.0 if x < 0.0
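
For anyone who wants to check the table above on a CPU, here is a minimal
sketch in plain C of what the three instructions compute. The helper names
(float_bits, bits_float, sign_and_or) are made up for illustration and are
not Beignet or Mesa functions, and it assumes float is a 32-bit IEEE-754
value. Like the assembly, it does not handle NaN.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its raw bit pattern (assumes 32-bit IEEE-754 float). */
static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret a raw bit pattern as a float. */
static float bits_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical helper mirroring the cmp.nz / and / predicated-or sequence. */
static float sign_and_or(float x) {
    uint32_t ret = float_bits(x) & 0x80000000u; /* and: keep only the sign bit */
    if (x != 0.0f)                              /* cmp.nz: flag set when x != 0 */
        ret |= 0x3f800000u;                     /* (+f0) or: OR in the bits of 1.0f */
    return bits_float(ret);
}
```

Comparing float_bits() of the result rather than the float itself is what
distinguishes +0.0 from -0.0, since the two compare equal as floats.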

And since the CMP.NZ's src1 is zero, you can move the conditional
modifier back into the instruction that generated x.

The CL spec says you also have to handle NaN (sign() must return 0.0
for a NaN input), which this implementation doesn't do, but that should
just be an additional two instructions, I think:

<CMP for NaN> (I don't remember precisely... CMPN.U maybe?)
(+f0) mov  ret:f   0.0f
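
As a rough illustration of the NaN-aware variant, here is a plain C sketch
of the and/or sequence followed by the extra compare and predicated move.
The function name sign_nan_aware is hypothetical, the union mirrors the one
in the quoted patch, and isnan() stands in for whatever compare the
hardware actually provides. It assumes a 32-bit IEEE-754 float.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical name: sign() via sign-bit masking, with NaN mapped to 0.0f. */
static float sign_nan_aware(float x) {
    union { float f; uint32_t u; } v;
    v.f = x;
    int nonzero = (x != 0.0f);  /* cmp.nz: true for nonzero values (and for NaN) */
    v.u &= 0x80000000u;         /* and: keep only the sign bit */
    if (nonzero)
        v.u |= 0x3f800000u;     /* (+f0) or: OR in the bits of 1.0f */
    if (isnan(x))
        v.f = 0.0f;             /* second compare plus predicated mov of 0.0f */
    return v.f;
}
```

Without the final check a NaN input would come out as +1.0 or -1.0, because
NaN compares as not equal to zero.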

I think this should be a few instructions shorter than what your code
will compile to.