[Beignet] [PATCH] libocl: refine implementation of sign().

Thu Jan 29 19:44:18 PST 2015

Hi Matt,

Thanks for your comment! You are right, the sign() in Mesa is really good.
I found it hard to write the same thing in C code. Beignet also supports implementing built-ins in the Gen IR defined in Beignet,
which maps almost directly to Gen ASM. I will follow your suggestion. Thanks!

Ruiling
> -----Original Message-----
> From: Matt Turner [mailto:mattst88 at gmail.com]
> Sent: Friday, January 30, 2015 4:01 AM
> To: Song, Ruiling
> Cc: beignet at lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH] libocl: refine implementation of sign().
> 
> On Wed, Jan 28, 2015 at 11:18 PM, Ruiling Song <ruiling.song at intel.com>
> wrote:
> > Avoid if-branching.
> >
> > Signed-off-by: Ruiling Song <ruiling.song at intel.com>
> > ---
> >  backend/src/libocl/tmpl/ocl_common.tmpl.cl |   16 +++++++++-------
> >  1 file changed, 9 insertions(+), 7 deletions(-)
> >
> > diff --git a/backend/src/libocl/tmpl/ocl_common.tmpl.cl
> > b/backend/src/libocl/tmpl/ocl_common.tmpl.cl
> > index db7b0d8..77bd2d3 100644
> > --- a/backend/src/libocl/tmpl/ocl_common.tmpl.cl
> > +++ b/backend/src/libocl/tmpl/ocl_common.tmpl.cl
> > @@ -17,6 +17,7 @@
> >   */
> >  #include "ocl_common.h"
> >  #include "ocl_float.h"
> > +#include "ocl_relational.h"
> >
> >
> >  /////////////////////////////////////////////////////////////////////////////
> >  // Common Functions
> > @@ -55,11 +56,12 @@ OVERLOADABLE float smoothstep(float e0, float e1, float x) {
> >  }
> >
> >  OVERLOADABLE float sign(float x) {
> > -  if(x > 0)
> > -    return 1;
> > -  if(x < 0)
> > -    return -1;
> > -  if(x == -0.f)
> > -    return -0.f;
> > -  return 0.f;
> > +  union {float f; unsigned u;} ieee;
> > +  ieee.f = x;
> > +  unsigned k = ieee.u;
> > +  float r = (k&0x80000000) ? -1.0f : 1.0f;
> > +  // differentiate +0.0f -0.0f
> > +  float s = 0.0f * r;
> > +  s = (x == 0.0f) ? s : r;
> > +  return isnan(x) ? 0.0f : s;
> >  }
> > --
> > 1.7.10.4
> 
> I don't know if the structure of Beignet allows it (I see that the
> implementation is in OpenCL C rather than hardware instructions), but Mesa
> implements sign() for GLSL in three instructions:
> 
> cmp.nz.f0  null    x:f  0.0:f
> and        ret:ud  x:ud 0x80000000:ud
> (+f0) or   ret:ud  ret:ud 0x3f800000:ud
> 
> The AND instruction extracts the sign bit, and the predicated OR instruction
> ORs in the hex value of 1.0 if x is not zero.
> 
> This gives +1.0 if x > 0.0
>            +0.0 if x == +0.0
>            -0.0 if x == -0.0
>            -1.0 if x < 0.0
> 
> And since the CMP.NZ's src1 is zero, you can move the conditional mod back
> into the instruction that generated x.
> 
> The CL spec says you also have to handle NaN, which this implementation
> doesn't do, but that should just be an additional two instructions, I think:
> 
> <CMP for NaN> (I don't remember precisely... CMPN.U maybe?)
> (+f0) mov  ret:f   0.0f
> 
> I think this should be a few instructions shorter than what your code will
> compile to.
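
For reference, the AND / predicated-OR sequence quoted above, plus the extra NaN step, can be sketched in portable C. This is only an illustration of the bit manipulation, not Beignet or Mesa code: the function name sign_bits is hypothetical, and the branches below stand in for the flag-predicated instructions.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper mirroring the three-instruction sequence,
 * followed by the NaN fix-up required by the CL spec. */
static float sign_bits(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);     /* view the float as raw bits */

    uint32_t ret = bits & 0x80000000u;  /* and: keep only the sign bit (+0.0 or -0.0) */
    if (x != 0.0f)                      /* stands in for cmp.nz against 0.0 */
        ret |= 0x3f800000u;             /* (+f0) or: OR in the bit pattern of 1.0f */
    if (isnan(x))                       /* NaN passes the x != 0.0f test in C, so reset it */
        ret = 0u;                       /* mov 0.0f */

    float out;
    memcpy(&out, &ret, sizeof out);     /* view the bits as a float again */
    return out;
}
```

Calling sign_bits on 2.5f, -3.0f, +0.0f, -0.0f and NAN should give 1.0f, -1.0f, +0.0f, -0.0f and +0.0f respectively, matching the table in the quoted mail plus the NaN case.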