[Liboil] [patch] Optimized multsum_f64
aurelius.marcus at rogers.com
Wed May 17 11:43:44 PDT 2006
David Schleef wrote:
> On Wed, May 17, 2006 at 02:06:53PM -0400, Marcus Brubaker wrote:
>> Interesting, it's on a Pentium M laptop so I guess that's not that
>> surprising. There may also be some inefficiencies in loading the data
>> as well, but that can only be addressed in an unstrided context.
> Ah, right. I thought that I had deprecated all the strided classes.
> There's really not much point in writing code for the current
> multsum_f64, since it pretty much can't go faster.
Well, it can go a bit faster, as the implementations that I submitted
are about 40% faster than the reference implementation.
For my application, striding is important in some cases. I know that
optimizing those cases is hard, but it would be nice if they were still
handled in some way.
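As an illustration of how a strided multsum can still be sped up, here is a minimal sketch of one common technique: unrolling the loop with several independent accumulators so the additions no longer form one serial dependency chain. It is not the submitted patch, and the 40% figure is not claimed for it. Assumptions: strides are in bytes, the operation is *dest = sum of src1[i] * src2[i], and the function names and the STRIDE_PTR macro are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: address of element i of an array with a byte stride. */
#define STRIDE_PTR(p, str, i) \
    ((const double *)((const char *)(p) + (ptrdiff_t)(str) * (i)))

/* Reference-style strided multiply-and-sum: *dest = sum src1[i] * src2[i]. */
static void multsum_f64_ref(double *dest,
                            const double *src1, int sstr1,
                            const double *src2, int sstr2, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += *STRIDE_PTR(src1, sstr1, i) * *STRIDE_PTR(src2, sstr2, i);
    *dest = sum;
}

/* Same operation, unrolled by four with independent accumulators.
 * This reorders the floating-point additions, so the result may differ
 * from the reference version in the last bits. */
static void multsum_f64_unroll4(double *dest,
                                const double *src1, int sstr1,
                                const double *src2, int sstr2, int n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += *STRIDE_PTR(src1, sstr1, i)     * *STRIDE_PTR(src2, sstr2, i);
        s1 += *STRIDE_PTR(src1, sstr1, i + 1) * *STRIDE_PTR(src2, sstr2, i + 1);
        s2 += *STRIDE_PTR(src1, sstr1, i + 2) * *STRIDE_PTR(src2, sstr2, i + 2);
        s3 += *STRIDE_PTR(src1, sstr1, i + 3) * *STRIDE_PTR(src2, sstr2, i + 3);
    }
    /* Handle the remaining 0..3 elements. */
    for (; i < n; i++)
        s0 += *STRIDE_PTR(src1, sstr1, i) * *STRIDE_PTR(src2, sstr2, i);
    *dest = (s0 + s1) + (s2 + s3);
}
```

With contiguous data both strides are sizeof(double); passing 2 * sizeof(double) for src1 reads every other element. Whether the unrolling pays off on a given CPU (e.g. a Pentium M) would need to be measured.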
>> So if I wanted to add an unstrided version of multsum or a strided
>> version of some other function what would be the preferred naming
Alright, I may add a version of this in the future.
> There are several simple operations (add, multiply, multsum, etc.)
> that need to be extended over all types, or at least the common types
> (f32, f64, and s16).
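One way to extend a simple operation over the common types is to stamp out each version from a single macro. This is a hypothetical sketch, not liboil's actual build machinery: DEFINE_ADD and the add_f32/add_f64/add_s16 signatures are assumptions, and the arrays are unstrided.

```c
#include <stdint.h>

/* Hypothetical generator: defines add_<suffix>(dest, src1, src2, n)
 * computing dest[i] = src1[i] + src2[i] over contiguous arrays. */
#define DEFINE_ADD(suffix, type)                              \
    static void add_##suffix(type *dest, const type *src1,    \
                             const type *src2, int n)         \
    {                                                         \
        for (int i = 0; i < n; i++)                           \
            dest[i] = (type)(src1[i] + src2[i]);              \
    }

DEFINE_ADD(f32, float)
DEFINE_ADD(f64, double)
/* For s16 the sum is computed in int; converting an out-of-range result
 * back to int16_t is implementation-defined (it wraps on common compilers).
 * A saturating variant would clamp to [INT16_MIN, INT16_MAX] instead. */
DEFINE_ADD(s16, int16_t)
```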
And a strided version of add? add_f64_st perhaps?
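For comparison, a strided add using the add_f64_st name proposed in the message might look like the following. This is a hypothetical sketch: the parameter order and the convention that strides are in bytes are assumptions, not an existing liboil function.

```c
#include <stddef.h>

/* Hypothetical strided add: dest[i] = src1[i] + src2[i], where each array
 * has its own stride in bytes (sizeof(double) for contiguous data).
 * Addresses are computed from the index so no pointer is ever advanced
 * past the end of its array. */
static void add_f64_st(double *dest, int dstr,
                       const double *src1, int sstr1,
                       const double *src2, int sstr2, int n)
{
    for (int i = 0; i < n; i++) {
        const double a = *(const double *)((const char *)src1 + (ptrdiff_t)sstr1 * i);
        const double b = *(const double *)((const char *)src2 + (ptrdiff_t)sstr2 * i);
        *(double *)((char *)dest + (ptrdiff_t)dstr * i) = a + b;
    }
}
```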
>> Also, what is the status of the vectoradd functions? They're documented
>> as being too hard to optimize and thus deprecated. It seems that
>> they're overly complicated and something without the s_1 parameters
>> would be easier to optimize and fairly useful (at least to me). Are
>> there plans to rectify this? If not, I will be happy to do what I can
>> given a bit of guidance on naming.
> The s3_1 and s4_1 are important because it's *vector* addition. The
> strides are the problem. You are probably looking for something like
Ah, of course.
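To make the point about the scalars concrete: reading s3_1 and s4_1 as the coefficients of a linear combination, dest[i] = s3 * src1[i] + s4 * src2[i] (an assumption about vectoradd's semantics), an unstrided variant would keep the scalars and drop only the strides. The name and signature below are hypothetical; passing the scalars by pointer mirrors the `_1` suffix, which is also an assumption.

```c
/* Hypothetical unstrided vectoradd: keeps the two scalar coefficients,
 * drops the strides. Assumed semantics: dest[i] = s3 * src1[i] + s4 * src2[i].
 * restrict tells the compiler the arrays do not overlap, which helps it
 * vectorize the contiguous loop. */
static void vectoradd_f64_unstrided(double *restrict dest,
                                    const double *restrict src1,
                                    const double *restrict src2, int n,
                                    const double *s3_1, const double *s4_1)
{
    const double s3 = *s3_1; /* load the scalars once, outside the loop */
    const double s4 = *s4_1;
    for (int i = 0; i < n; i++)
        dest[i] = s3 * src1[i] + s4 * src2[i];
}
```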