[Liboil] oil_multiply_f32 and friends
ds at schleef.org
Wed Nov 16 12:59:06 PST 2005
On Wed, Nov 16, 2005 at 08:40:19PM +0000, Adam D. Moss wrote:
> David Schleef wrote:
> >On Wed, Nov 16, 2005 at 08:24:57PM +0000, Adam D. Moss wrote:
> >>Thinking that simple math vectorisation functions like
> >>oil_multiply_f32 would be liboil's bread and butter, I
> >>was surprised to find that there are only two implementations
> >>of this function including the ref, both in vanilla C.
> >I write code, and people complain about the lack of documentation.
> >I write documentation, and people complain about the lack of
> >SSE optimization. :)
> >Reason #1: Nobody has written it yet.
> No, that's fine, I just wondered if these were potentially
> hackable classes or whether they were found to have hard /
> unworkable / unoptimizable interfaces like some liboil
> classes are documented to have.
Ah, good point. I should probably add a page to the wiki
describing some good places to start if you feel like writing
First, read the fine documentation. (Did I mention the 96%
symbol coverage?) If it says "this function is broken and
should be replaced", don't bother writing implementations for it.
If you feel like writing C code, find any class that only has
a reference implementation. Try a couple of techniques to make
it faster and submit all of them. Chances are that some
technique will work well on some random CPU. Good C
implementations are a cornerstone of liboil, since it means
that liboil will not hopelessly suck on non-i386 platforms.
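
As a rough illustration of "try a couple of techniques", here is a minimal
sketch of a plain reference loop next to one alternative C variant for an
element-wise float multiply. The function names and the
(dest, src1, src2, n) parameter layout are assumptions made for this
example, not copied from the liboil sources, and whether the unrolled
version is actually faster depends on the compiler and CPU.

```c
/* Hypothetical reference: one multiply per iteration, easy to verify. */
void multiply_f32_ref(float *dest, const float *src1, const float *src2, int n)
{
    for (int i = 0; i < n; i++)
        dest[i] = src1[i] * src2[i];
}

/* Hypothetical variant: manually unrolled by four, with a scalar loop
 * for the 0..3 leftover elements. Same results as the reference. */
void multiply_f32_unroll4(float *dest, const float *src1, const float *src2, int n)
{
    int i = 0;

    for (; i + 4 <= n; i += 4) {
        dest[i + 0] = src1[i + 0] * src2[i + 0];
        dest[i + 1] = src1[i + 1] * src2[i + 1];
        dest[i + 2] = src1[i + 2] * src2[i + 2];
        dest[i + 3] = src1[i + 3] * src2[i + 3];
    }
    for (; i < n; i++)
        dest[i] = src1[i] * src2[i];
}
```

Both variants must give identical output for every n, including lengths
that are not a multiple of four, so checking the variant against the
reference on a few odd lengths is a cheap sanity test before submitting.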
If you feel like writing vector code (either intrinsics or
inline assembly), look for a class that doesn't combine strides
_and_ one element per row. The classes to avoid are those with
d_1xn, s_1xn, or the equivalents dest, d, d_n, src, etc., that
also have strides for those parameters. They are hard to vectorize,
so future versions of liboil will drop most of them. Otherwise,
go wild. You can't go wrong writing dirt simple SSE code for
the math functions, since it will almost certainly be several
times faster than the C code.
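
For a sense of what "dirt simple SSE code" for a math function might
look like, here is a minimal sketch using compiler intrinsics. The name
multiply_f32_sse and its parameter layout are assumptions for this
example rather than the actual liboil implementation. It uses unaligned
loads and stores so it makes no alignment assumptions, and it falls back
to the scalar loop when the compiler does not define __SSE__.

```c
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_mul_ps, ... */
#endif

/* Hypothetical SSE variant of an element-wise float multiply. */
void multiply_f32_sse(float *dest, const float *src1, const float *src2, int n)
{
    int i = 0;

#if defined(__SSE__)
    /* Multiply four floats per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128 a = _mm_loadu_ps(src1 + i);
        __m128 b = _mm_loadu_ps(src2 + i);
        _mm_storeu_ps(dest + i, _mm_mul_ps(a, b));
    }
#endif
    /* Scalar loop for the leftover elements (or for everything
     * when SSE is not available at compile time). */
    for (; i < n; i++)
        dest[i] = src1[i] * src2[i];
}
```

Unaligned loads keep the sketch simple; a tuned version could handle
alignment explicitly, which is the kind of thing worth leaving a comment
about.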
And if you think that a particular implementation that you write
could be made faster, but that you don't feel like doing it
right away, write a comment. Eventually, I'll turn those into
The example 'examples/report' may be of use.
Big Kitten LLC (http://www.bigkitten.com/) -- data acquisition on Linux