[Beignet] [PATCH] libocl: refine implementation of sign().
Song, Ruiling
ruiling.song at intel.com
Thu Jan 29 19:44:18 PST 2015
Hi Matt,
Thanks for your comment! You are right, the sign() in Mesa is really good.
I found it hard to write that in C code. Beignet also supports implementing functions in the Gen IR defined in Beignet,
which maps almost directly to Gen ASM. I will follow your suggestion. Thanks!
Ruiling
> -----Original Message-----
> From: Matt Turner [mailto:mattst88 at gmail.com]
> Sent: Friday, January 30, 2015 4:01 AM
> To: Song, Ruiling
> Cc: beignet at lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH] libocl: refine implementation of sign().
>
> On Wed, Jan 28, 2015 at 11:18 PM, Ruiling Song <ruiling.song at intel.com>
> wrote:
> > Avoid if-branching.
> >
> > Signed-off-by: Ruiling Song <ruiling.song at intel.com>
> > ---
> > backend/src/libocl/tmpl/ocl_common.tmpl.cl | 16 +++++++++-------
> > 1 file changed, 9 insertions(+), 7 deletions(-)
> >
> > diff --git a/backend/src/libocl/tmpl/ocl_common.tmpl.cl b/backend/src/libocl/tmpl/ocl_common.tmpl.cl
> > index db7b0d8..77bd2d3 100644
> > --- a/backend/src/libocl/tmpl/ocl_common.tmpl.cl
> > +++ b/backend/src/libocl/tmpl/ocl_common.tmpl.cl
> > @@ -17,6 +17,7 @@
> > */
> > #include "ocl_common.h"
> > #include "ocl_float.h"
> > +#include "ocl_relational.h"
> >
> >
> > /////////////////////////////////////////////////////////////////////////////
> > // Common Functions
> > @@ -55,11 +56,12 @@ OVERLOADABLE float smoothstep(float e0, float e1, float x) {
> > }
> >
> > OVERLOADABLE float sign(float x) {
> > - if(x > 0)
> > - return 1;
> > - if(x < 0)
> > - return -1;
> > - if(x == -0.f)
> > - return -0.f;
> > - return 0.f;
> > + union {float f; unsigned u;} ieee;
> > + ieee.f = x;
> > + unsigned k = ieee.u;
> > + float r = (k&0x80000000) ? -1.0f : 1.0f;
> > + // differentiate +0.0f -0.0f
> > + float s = 0.0f * r;
> > + s = (x == 0.0f) ? s : r;
> > + return isnan(x) ? 0.0f : s;
> > }
> > --
> > 1.7.10.4
>
> I don't know if the structure of Beignet allows it (I see that the
> implementation is in OpenCL C rather than hardware instructions), but Mesa
> implements sign() for GLSL in three instructions:
>
>   cmp.nz.f0  null    x:f     0.0:f
>   and        ret:ud  x:ud    0x80000000:ud
>   (+f0) or   ret:ud  ret:ud  0x3f800000:ud
>
> The AND instruction extracts the sign bit, and the predicated OR instruction
> ORs in the hex value of 1.0 if x is not zero.
>
> This gives +1.0 if x >  0.0
>            +0.0 if x == +0.0
>            -0.0 if x == -0.0
>            -1.0 if x <  0.0
>
> And since the CMP.NZ's src1 is zero, you can move the conditional mod back
> into the instruction that generated x.
>
> The CL spec says you also have to handle NaN, which this implementation
> doesn't do, but that should just be an additional two instructions, I think:
>
> <CMP for NaN> (I don't remember precisely... CMPN.U maybe?)
> (+f0) mov ret:f 0.0f
>
> I think this should be a few instructions shorter than what your code will
> compile to.
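
For the NaN case, the extra compare plus predicated mov could be modelled
in C roughly as below. Again this is a sketch under assumptions: the name
sign_nan_aware is hypothetical, float is assumed to be 32-bit IEEE-754, and
isnan() from <math.h> stands in for whatever Gen compare ends up being used
(I have not checked which conditional modifier is right).

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: sign-bit trick plus the NaN override. */
static float sign_nan_aware(float x)
{
    uint32_t bits;
    float r;

    memcpy(&bits, &x, sizeof bits);
    bits &= 0x80000000u;          /* keep only the sign bit */
    if (x != 0.0f)
        bits |= 0x3f800000u;      /* OR in the bit pattern of 1.0f */
    memcpy(&r, &bits, sizeof r);

    /* Models "<CMP for NaN>; (+f0) mov ret:f 0.0f": NaN input gives 0.0f. */
    return isnan(x) ? 0.0f : r;
}
```

The final select overwrites the +/-1.0 that the bit trick would otherwise
produce for a NaN input, since a NaN compares as not equal to zero.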