[Beignet] [PATCH] libocl: refine implementation of normalize().
Song, Ruiling
ruiling.song at intel.com
Thu Jan 29 18:29:51 PST 2015
> > y) { return length(x-y); }
> > OVERLOADABLE float normalize(float x) {
> > - union { float f; unsigned u; } u;
> > - u.f = x;
> > - if(u.u == 0)
> > - return 0.f;
> > - if(isnan(x))
> > - return NAN;
> > - return u.u < 0x7fffffff ? 1.f : -1.f;
> > + float m = length(x);
> > + m = m == 0.0f ? 1.0f : m;
> > + return x / m;
> > }
> > OVERLOADABLE float2 normalize(float2 x) {
> > float m = length(x);
> > - if(m == 0)
> > - return 0;
> > + m = m == 0.0f ? 1.0f : m;
> > return x / m;
> > }
>
> Although this eliminates branching, it introduces one more div when the length is
> zero. If a test case has many zero vectors, this patch may bring some
> performance regression, as division is relatively expensive.
>
> Any thoughts?
I think we should optimize for the most commonly used scenario. In real applications, most data is non-zero.
After my change, the asm looks like this:
( 27) cmp.e(8) g110<1>:F g112<8,8,1>:F 6.91076e-310F { align1 WE_normal 1Q };
( 29) cmp.e(8) g111<1>:F g113<8,8,1>:F 6.91076e-310F { align1 WE_normal 2Q };
( 31) (-f0) sel(16) g108<1>:F g112<8,8,1>:F 6.91076e-310F { align1 WE_normal 1H };
( 33) math fdiv(16) g106<1>:F g114<8,8,1>:F g108<8,8,1>:F { align1 WE_normal 1H };
If it is written like
if (m == 0) return 0;
the generated asm looks like the listing below: it introduces if/endif instructions, which hurt performance for non-zero data.
The asm below could still be optimized, but it is hard to completely remove the if/endif/cmp.le instructions.
That is why I chose to make the change. Any further comments?
( 25) cmp.e(8) g110<1>:F g112<8,8,1>:F 6.90196e-310F { align1 WE_normal 1Q };
( 27) cmp.e(8) g111<1>:F g113<8,8,1>:F 6.90196e-310F { align1 WE_normal 2Q };
( 29) (+f0) sel(16) g126<1>:UW g8.2<0,1,0>:UW g8<0,1,0>:UW { align1 WE_normal 1H };
( 31) mov(16) g108<1>:F g127.6<0,1,0>:F { align1 WE_normal 1H };
( 33) cmp.ne(16) null:UW g126<8,8,1>:UW 0x0UW { align1 WE_normal 1H switch };
( 35) (-f0) if(16) 4 { align1 WE_normal 1H };
L1:
( 37) math fdiv(16) g108<1>:F g114<8,8,1>:F g112<8,8,1>:F { align1 WE_normal 1H };
( 39) endif(16) 2 null { align1 WE_all 1H };
( 41) endif(16) 2 null { align1 WE_normal 1H };
L2:
( 43) cmp.le(16) null:UW g1<8,8,1>:UW 0x2UW { align1 WE_all 1H switch };
( 45) (+f0) if(16) 8 { align1 WE_normal 1H };
>
> > OVERLOADABLE float3 normalize(float3 x) {
> > float m = length(x);
> > - if(m == 0)
> > - return 0;
> > + m = m == 0.0f ? 1.0f : m;
> > return x / m;
> > }
> > OVERLOADABLE float4 normalize(float4 x) {
> > float m = length(x);
> > - if(m == 0)
> > - return 0;
> > + m = m == 0.0f ? 1.0f : m;
> > return x / m;
> > }
> >
> > --
> > 1.7.10.4