[Beignet] [PATCH] libocl: refine implementation of normalize().
Zhigang Gong
zhigang.gong at linux.intel.com
Thu Jan 29 17:56:06 PST 2015
On Fri, Jan 30, 2015 at 02:29:51AM +0000, Song, Ruiling wrote:
> > > y) { return length(x-y); }
> > > OVERLOADABLE float normalize(float x) {
> > > - union { float f; unsigned u; } u;
> > > - u.f = x;
> > > - if(u.u == 0)
> > > - return 0.f;
> > > - if(isnan(x))
> > > - return NAN;
> > > - return u.u < 0x7fffffff ? 1.f : -1.f;
> > > + float m = length(x);
> > > + m = m == 0.0f ? 1.0f : m;
> > > + return x / m;
> > > }
> > > OVERLOADABLE float2 normalize(float2 x) {
> > > float m = length(x);
> > > - if(m == 0)
> > > - return 0;
> > > + m = m == 0.0f ? 1.0f : m;
> > > return x / m;
> > > }
> >
> > Although this eliminates branching, it introduces one extra division when the
> > length is zero. If a test case has many zero vectors, this patch may cause a
> > performance regression, as division is relatively expensive.
> >
> > Any thoughts?
> I think we should optimize for the most common scenario. In real applications, I think most data is non-zero.
> After my change, the asm looks like this:
I agree that for non-zero-length vectors, this patch looks great.
I'm just a little worried about the zero-length case, and want to discuss
whether there is an even better way.
I noticed Rong's comment:
"
float2 t = m == 0.0f ? x : x/m;
return t;
"
Actually, this doesn't solve the issue. Since x/m is not a pre-existing value, it will
not eliminate the if/endif blocks, and it will generate the same instructions as
the "if (m == 0) return 0" version.
So, since there seems to be no better way, I will accept this patch.
Thanks for the patch.
> ( 27) cmp.e(8) g110<1>:F g112<8,8,1>:F 6.91076e-310F { align1 WE_normal 1Q };
> ( 29) cmp.e(8) g111<1>:F g113<8,8,1>:F 6.91076e-310F { align1 WE_normal 2Q };
> ( 31) (-f0) sel(16) g108<1>:F g112<8,8,1>:F 6.91076e-310F { align1 WE_normal 1H };
> ( 33) math fdiv(16) g106<1>:F g114<8,8,1>:F g108<8,8,1>:F { align1 WE_normal 1H };
> If it is written as
> if (m == 0) return 0;
> the generated asm looks like the listing below. It introduces if/endif instructions, which hurt performance for non-zero data.
> The asm below could probably be optimized further, but it is hard to remove the if/endif/cmp.le instructions completely.
> That is why I chose to make this change. Any further comments?
>
> ( 25) cmp.e(8) g110<1>:F g112<8,8,1>:F 6.90196e-310F { align1 WE_normal 1Q };
> ( 27) cmp.e(8) g111<1>:F g113<8,8,1>:F 6.90196e-310F { align1 WE_normal 2Q };
> ( 29) (+f0) sel(16) g126<1>:UW g8.2<0,1,0>:UW g8<0,1,0>:UW { align1 WE_normal 1H };
> ( 31) mov(16) g108<1>:F g127.6<0,1,0>:F { align1 WE_normal 1H };
> ( 33) cmp.ne(16) null:UW g126<8,8,1>:UW 0x0UW { align1 WE_normal 1H switch };
> ( 35) (-f0) if(16) 4 { align1 WE_normal 1H };
> L1:
> ( 37) math fdiv(16) g108<1>:F g114<8,8,1>:F g112<8,8,1>:F { align1 WE_normal 1H };
> ( 39) endif(16) 2 null { align1 WE_all 1H };
> ( 41) endif(16) 2 null { align1 WE_normal 1H };
> L2:
> ( 43) cmp.le(16) null:UW g1<8,8,1>:UW 0x2UW { align1 WE_all 1H switch };
> ( 45) (+f0) if(16) 8 { align1 WE_normal 1H };
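A deliberately simplified cost model bears on the zero-vector concern: with SIMD execution, the fdiv guarded by the if/endif can only be skipped when every lane in a group has zero length. The C sketch below assumes 16-lane groups (matching the (16) execution size in the listings) and that an if block is jumped over only when no channel is enabled; `cost`, `cost_branch()` and `cost_select()` are hypothetical helpers, not Beignet code, and they count instructions rather than cycles.

```c
#include <assert.h>
#include <stddef.h>

/* Instruction counts under the simplified model (an assumption). */
typedef struct {
    size_t fdiv_issued;        /* guarded or unguarded fdiv instructions */
    size_t control_flow_groups; /* groups that execute if/endif */
} cost;

/* Branchy form: every group runs the if/endif; the fdiv is skipped
   only when no lane in the group has a non-zero length. */
static cost cost_branch(const float *len, size_t n, size_t width) {
    cost c = {0, 0};
    assert(width > 0);
    for (size_t base = 0; base < n; base += width) {
        int any_nonzero = 0;
        for (size_t i = base; i < n && i < base + width; ++i)
            if (len[i] != 0.0f) any_nonzero = 1;
        c.control_flow_groups += 1;
        if (any_nonzero) c.fdiv_issued += 1;
    }
    return c;
}

/* Patched form: cmp + sel + fdiv in every group, no control flow. */
static cost cost_select(size_t n, size_t width) {
    cost c = {0, 0};
    assert(width > 0);
    c.fdiv_issued = (n + width - 1) / width;
    return c;
}
```

Under this model, inputs that mix zero and non-zero lengths within a group gain no division savings from the branchy form but still pay for the control flow; only groups made entirely of zero vectors skip the fdiv.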
> >
> > > OVERLOADABLE float3 normalize(float3 x) {
> > > float m = length(x);
> > > - if(m == 0)
> > > - return 0;
> > > + m = m == 0.0f ? 1.0f : m;
> > > return x / m;
> > > }
> > > OVERLOADABLE float4 normalize(float4 x) {
> > > float m = length(x);
> > > - if(m == 0)
> > > - return 0;
> > > + m = m == 0.0f ? 1.0f : m;
> > > return x / m;
> > > }
> > >
> > > --
> > > 1.7.10.4
> > >
> > > _______________________________________________
> > > Beignet mailing list
> > > Beignet at lists.freedesktop.org
> > > http://lists.freedesktop.org/mailman/listinfo/beignet