[Liboil] [patch] Optimized multsum_f64

Marcus Brubaker aurelius.marcus at rogers.com
Wed May 17 11:43:44 PDT 2006

David Schleef wrote:
> On Wed, May 17, 2006 at 02:06:53PM -0400, Marcus Brubaker wrote:
>> Interesting, it's on a Pentium M laptop so I guess that's not that 
>> surprising.  There may also be some inefficiencies in loading the data 
>> as well, but that can only be addressed in an unstrided context.
> Ah, right.  I thought that I had deprecated all the strided classes.
> There's really not much point in writing code for the current
> multsum_f64, since it pretty much can't go faster.

Well, it can go a bit faster: the implementations that I submitted 
are about 40% faster than the reference implementation.

For my application, striding is an important option in some cases.  I 
know that optimizations for those cases are hard, but it would be nice if 
they were still handled in some way.
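
For reference, the strided case under discussion can be sketched in
plain C as below.  This is only an illustration, not liboil's code: the
name multsum_f64_strided and its signature are hypothetical, and it
assumes strides are given in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not liboil's implementation): sum of products of
 * two strided arrays of doubles.  ASSUMPTION: strides are in bytes and
 * are multiples of sizeof(double), so every access stays aligned. */
void multsum_f64_strided(double *dest,
                         const double *src1, ptrdiff_t sstr1,
                         const double *src2, ptrdiff_t sstr2,
                         int n)
{
    const char *p1 = (const char *)src1;
    const char *p2 = (const char *)src2;
    double sum = 0.0;
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        /* Each element is reached by stepping a byte pointer. */
        sum += *(const double *)p1 * *(const double *)p2;
        p1 += sstr1;
        p2 += sstr2;
    }
    *dest = sum;
}
```

Because consecutive elements are not adjacent in memory, each one has to
be loaded on its own, which is why the loads are hard to improve here.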

>> So if I wanted to add an unstrided version of multsum or a strided 
>> version of some other function what would be the preferred naming 
>> convention?
> multsum_f64_ns()

Alright, I may add a version of this in the future.
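
A rough sketch of what such an unstrided variant might look like follows.
The name comes from the suggestion above; the signature is an assumption,
and the two-accumulator loop is just one possible approach, not a
submitted implementation.

```c
#include <assert.h>

/* Sketch only: the name follows the multsum_f64_ns() suggestion in this
 * thread; the signature is an ASSUMPTION.  Both inputs are contiguous. */
void multsum_f64_ns(double *dest, const double *src1,
                    const double *src2, int n)
{
    double sum0 = 0.0, sum1 = 0.0;
    int i;

    assert(n >= 0);
    /* Two independent accumulators over adjacent pairs of elements. */
    for (i = 0; i + 1 < n; i += 2) {
        sum0 += src1[i] * src2[i];
        sum1 += src1[i + 1] * src2[i + 1];
    }
    /* Handle the last element when n is odd. */
    if (i < n)
        sum0 += src1[i] * src2[i];
    *dest = sum0 + sum1;
}
```

Note that splitting the sum across two accumulators changes the order of
the floating-point additions, so the result may differ from a strictly
sequential sum in the last bits.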

> There are several simple operations (add, multiply, multsum, etc.)
> that need to be extended over all types, or at least the common types
> (f32, f64, and s16).

And a strided version of add? add_f64_st perhaps?
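
To illustrate the question, a strided add could look roughly like this.
The name add_f64_st is only the proposal above, and the argument order
and byte strides are assumptions for the sake of the example.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the proposed add_f64_st: dest[i] = src1[i] + src2[i], with a
 * separate stride for each array.  ASSUMPTIONS: the argument order is
 * invented for illustration, and strides are in bytes and multiples of
 * sizeof(double). */
void add_f64_st(double *dest, ptrdiff_t dstr,
                const double *src1, ptrdiff_t sstr1,
                const double *src2, ptrdiff_t sstr2,
                int n)
{
    char *d = (char *)dest;
    const char *p1 = (const char *)src1;
    const char *p2 = (const char *)src2;
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        *(double *)d = *(const double *)p1 + *(const double *)p2;
        d += dstr;
        p1 += sstr1;
        p2 += sstr2;
    }
}
```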

>> Also, what is the status of the vectoradd functions?  They're documented 
>> as being too hard to optimize and thus deprecated.  It seems that 
>> they're overly complicated and something without the s[34]_1 parameters 
>> would be easier to optimize and fairly useful (at least to me).  Are 
>> there plans to rectify this?  If not, I will be happy to do what I can 
>> given a bit of guidance on naming.
> The s3_1 and s4_1 are important because it's *vector* addition.  The
> strides are the problem.  You are probably looking for something like
> add_f64().

Ah, of course. 

