[Beignet] [PATCH] libocl: refine implementation of normalize().
Song, Ruiling
ruiling.song at intel.com
Thu Jan 29 18:36:22 PST 2015
> -----Original Message-----
> From: Yang, Rong R
> Sent: Friday, January 30, 2015 10:23 AM
> To: Zhigang Gong; Song, Ruiling
> Cc: beignet at lists.freedesktop.org
> Subject: RE: [Beignet] [PATCH] libocl: refine implementation of normalize().
>
>
>
> > -----Original Message-----
> > From: Beignet [mailto:beignet-bounces at lists.freedesktop.org] On Behalf
> > Of Zhigang Gong
> > Sent: Thursday, January 29, 2015 16:25
> > To: Song, Ruiling
> > Cc: beignet at lists.freedesktop.org
> > Subject: Re: [Beignet] [PATCH] libocl: refine implementation of normalize().
> >
> > On Thu, Jan 29, 2015 at 03:40:27PM +0800, Ruiling Song wrote:
> > > Avoid if-branching.
> > >
> > > Signed-off-by: Ruiling Song <ruiling.song at intel.com>
> > > ---
> > > backend/src/libocl/src/ocl_geometric.cl | 19 ++++++-------------
> > > 1 file changed, 6 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/backend/src/libocl/src/ocl_geometric.cl b/backend/src/libocl/src/ocl_geometric.cl
> > > index 07f1419..baa2f85 100644
> > > --- a/backend/src/libocl/src/ocl_geometric.cl
> > > +++ b/backend/src/libocl/src/ocl_geometric.cl
> > > @@ -60,30 +60,23 @@ OVERLOADABLE float distance(float2 x, float2 y) { return length(x-y); }
> > > OVERLOADABLE float distance(float3 x, float3 y) { return length(x-y); }
> > > OVERLOADABLE float distance(float4 x, float4 y) { return length(x-y); }
> > > OVERLOADABLE float normalize(float x) {
> > > - union { float f; unsigned u; } u;
> > > - u.f = x;
> > > - if(u.u == 0)
> > > - return 0.f;
> > > - if(isnan(x))
> > > - return NAN;
> > > - return u.u < 0x7fffffff ? 1.f : -1.f;
> > > + float m = length(x);
> > > + m = m == 0.0f ? 1.0f : m;
> > > + return x / m;
> > > }
> > > OVERLOADABLE float2 normalize(float2 x) {
> > > float m = length(x);
> > > - if(m == 0)
> > > - return 0;
> > > + m = m == 0.0f ? 1.0f : m;
> > > return x / m;
> > > }
> >
> > This eliminates the branch, but it introduces one more division when the
> > length is zero. If a test case has many zero vectors, this patch may
> > cause a performance regression, as division is relatively expensive.
> >
> > Any thoughts?
> >
> [Yang, Rong R] How about:
> float2 t = m == 0.0f ? x : x/m;
> return t;
LLVM will also introduce a branch for this form. Note that x/m can only be evaluated when m is not zero.
So this should end up similar to the original if(m == 0) return 0;
>
> >
> > > OVERLOADABLE float3 normalize(float3 x) {
> > > float m = length(x);
> > > - if(m == 0)
> > > - return 0;
> > > + m = m == 0.0f ? 1.0f : m;
> > > return x / m;
> > > }
> > > OVERLOADABLE float4 normalize(float4 x) {
> > > float m = length(x);
> > > - if(m == 0)
> > > - return 0;
> > > + m = m == 0.0f ? 1.0f : m;
> > > return x / m;
> > > }
> > >
> > > --
> > > 1.7.10.4
> > >
> > > _______________________________________________
> > > Beignet mailing list
> > > Beignet at lists.freedesktop.org
> > > http://lists.freedesktop.org/mailman/listinfo/beignet