[Beignet] [PATCH] libocl: refine implementation of normalize().

Thu Jan 29 18:29:51 PST 2015

> >  y) { return length(x-y); }
> >  OVERLOADABLE float normalize(float x) {
> > -  union { float f; unsigned u; } u;
> > -  u.f = x;
> > -  if(u.u == 0)
> > -    return 0.f;
> > -  if(isnan(x))
> > -    return NAN;
> > -  return u.u < 0x7fffffff ? 1.f : -1.f;
> > +  float m = length(x);
> > +  m = m == 0.0f ? 1.0f : m;
> > +  return x / m;
> >  }
> >  OVERLOADABLE float2 normalize(float2 x) {
> >    float m = length(x);
> > -  if(m == 0)
> > -    return 0;
> > +  m = m == 0.0f ? 1.0f : m;
> >    return x / m;
> >  }
> 
> Although this eliminates the branch, it introduces one more div when the
> length is zero. If a test case has many zero vectors, this patch may cause
> a performance regression, as division is relatively expensive.
> 
> Any thoughts?
I think we should optimize for the most common scenario. In real applications, I expect most data to be non-zero.
After my change, the asm looks like this:
    (      27)  cmp.e(8)        g110<1>:F       g112<8,8,1>:F   6.91076e-310F   { align1 WE_normal 1Q };
    (      29)  cmp.e(8)        g111<1>:F       g113<8,8,1>:F   6.91076e-310F   { align1 WE_normal 2Q };
    (      31)  (-f0) sel(16)   g108<1>:F       g112<8,8,1>:F   6.91076e-310F   { align1 WE_normal 1H };
    (      33)  math fdiv(16)   g106<1>:F       g114<8,8,1>:F   g108<8,8,1>:F   { align1 WE_normal 1H };
If it is written like
if (m == 0) return 0;
the generated asm looks like the listing below. It introduces if/endif instructions, which hurt performance for non-zero data.
The asm below could probably be optimized further, but it is hard to completely remove the if/endif/cmp.le instructions.
That is why I chose to make the change. Any further comments?

    (      25)  cmp.e(8)        g110<1>:F       g112<8,8,1>:F   6.90196e-310F   { align1 WE_normal 1Q };
    (      27)  cmp.e(8)        g111<1>:F       g113<8,8,1>:F   6.90196e-310F   { align1 WE_normal 2Q };
    (      29)  (+f0) sel(16)   g126<1>:UW      g8.2<0,1,0>:UW  g8<0,1,0>:UW    { align1 WE_normal 1H };
    (      31)  mov(16)         g108<1>:F       g127.6<0,1,0>:F                 { align1 WE_normal 1H };
    (      33)  cmp.ne(16)      null:UW         g126<8,8,1>:UW  0x0UW           { align1 WE_normal 1H switch };
    (      35)  (-f0) if(16) 4                                                  { align1 WE_normal 1H };
  L1:
    (      37)  math fdiv(16)   g108<1>:F       g114<8,8,1>:F   g112<8,8,1>:F   { align1 WE_normal 1H };
    (      39)  endif(16) 2                     null                            { align1 WE_all 1H };
    (      41)  endif(16) 2                     null                            { align1 WE_normal 1H };
  L2:
    (      43)  cmp.le(16)      null:UW         g1<8,8,1>:UW    0x2UW           { align1 WE_all 1H switch };
    (      45)  (+f0) if(16) 8                                                  { align1 WE_normal 1H };
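
To make the comparison concrete, here is a minimal plain-C sketch of the two forms for the scalar overload. It is only an illustration, not the libocl source: length1(), normalize_branch() and normalize_select() are made-up names, and length1() is a stand-in for length() on a single float.

```c
#include <assert.h>

/* Hypothetical stand-in for length() on a scalar: the absolute value. */
static float length1(float x) { return x < 0.0f ? -x : x; }

/* Branchy form: early return when the length is zero.
 * On the GPU this is what produces the if/endif pair. */
static float normalize_branch(float x) {
  float m = length1(x);
  if (m == 0.0f)
    return 0.0f;
  return x / m;
}

/* Select form from the patch: replace a zero length by 1.0f and
 * always divide, so a zero input costs one extra division. */
static float normalize_select(float x) {
  float m = length1(x);
  m = (m == 0.0f) ? 1.0f : m;
  return x / m;
}

/* Assert that both forms agree for a given input. */
static void check_same(float x) {
  assert(normalize_branch(x) == normalize_select(x));
}
```

Under these assumptions, calling check_same() with 0.0f, 3.0f and -2.5f passes: both forms give 0.0f, 1.0f and -1.0f respectively. The only difference is that the select form still performs the division for a zero input.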
> 
> >  OVERLOADABLE float3 normalize(float3 x) {
> >    float m = length(x);
> > -  if(m == 0)
> > -    return 0;
> > +  m = m == 0.0f ? 1.0f : m;
> >    return x / m;
> >  }
> >  OVERLOADABLE float4 normalize(float4 x) {
> >    float m = length(x);
> > -  if(m == 0)
> > -    return 0;
> > +  m = m == 0.0f ? 1.0f : m;
> >    return x / m;
> >  }
> >
> > --
> > 1.7.10.4
> >