[Liboil] [patch] Optimized multsum_f64
aurelius.marcus at rogers.com
Wed May 17 11:06:53 PDT 2006
David Schleef wrote:
> On Mon, May 15, 2006 at 09:23:16PM -0400, Marcus Brubaker wrote:
>> Here are two optimized versions of multsum_f64 and a patch for detecting
>> SSE2 support. For some reason, the SSE2 version is slightly slower on
>> my machine than the plain unrolled version. I'm not exactly an assembly
>> wizard so I may be missing something obvious, suggestions welcome.
> That happens sometimes. You may have one of those processors where
> f64 ops are kinda slow. Some CPUs only have one or two FP multiply
> units that get shared between SSE2 and the FPU, so it doesn't really
> matter whether you use SSE2 or the FPU.
Interesting. It's on a Pentium M laptop, so I guess that's not that
surprising. There may also be some inefficiencies in loading the data,
but those can only be addressed in an unstrided context.
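For illustration, here is a minimal C sketch of the kind of "plain unrolled" strided multsum being discussed. It is not the patch from this thread. The function names are hypothetical, and the strides are counted in elements, which may not match the unit liboil's own multsum_f64 uses. The unrolled version keeps four independent accumulators so consecutive multiply/adds don't have to wait on each other.

```c
/* Hypothetical reference: sum of a[i*sa] * b[i*sb] for i in [0, n).
 * Strides are in elements (an assumption, not liboil's convention). */
static double multsum_ref(const double *a, long sa,
                          const double *b, long sb, int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += a[i * sa] * b[i * sb];
    return sum;
}

/* Same computation unrolled by 4 with independent accumulators, so the
 * FP multiply/add chains can overlap. Reassociating the sum this way can
 * change the last bits of the result compared to multsum_ref. */
static double multsum_unroll4(const double *a, long sa,
                              const double *b, long sb, int n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[(i + 0) * sa] * b[(i + 0) * sb];
        s1 += a[(i + 1) * sa] * b[(i + 1) * sb];
        s2 += a[(i + 2) * sa] * b[(i + 2) * sb];
        s3 += a[(i + 3) * sa] * b[(i + 3) * sb];
    }
    for (; i < n; i++)   /* leftover 0..3 elements */
        s0 += a[i * sa] * b[i * sb];
    return (s0 + s1) + (s2 + s3);
}
```

Whether the unrolled form actually wins depends on the CPU; as noted above, on parts with few FP multiply units the difference may be small.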
So if I wanted to add an unstrided version of multsum, or a strided
version of some other function, what would be the preferred naming?
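A sketch of what an unstrided variant could look like, assuming contiguous inputs with no alignment guarantee. The name multsum_contig is hypothetical and not a liboil naming convention. With contiguous data, the SSE2 path can load two doubles per _mm_loadu_pd instead of fetching them one at a time; when __SSE2__ is not defined it falls back to the scalar loop.

```c
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical unstrided multsum: a and b are contiguous double arrays. */
static double multsum_contig(const double *a, const double *b, int n)
{
    int i = 0;
    double sum = 0.0;
#if defined(__SSE2__)
    /* Two 2-lane accumulators; unaligned loads, so no alignment assumed. */
    __m128d acc0 = _mm_setzero_pd();
    __m128d acc1 = _mm_setzero_pd();
    for (; i + 4 <= n; i += 4) {
        acc0 = _mm_add_pd(acc0, _mm_mul_pd(_mm_loadu_pd(a + i),
                                           _mm_loadu_pd(b + i)));
        acc1 = _mm_add_pd(acc1, _mm_mul_pd(_mm_loadu_pd(a + i + 2),
                                           _mm_loadu_pd(b + i + 2)));
    }
    /* Horizontal reduction of the four partial sums. */
    double lanes[2];
    _mm_storeu_pd(lanes, _mm_add_pd(acc0, acc1));
    sum = lanes[0] + lanes[1];
#endif
    for (; i < n; i++)  /* scalar tail, or the whole loop without SSE2 */
        sum += a[i] * b[i];
    return sum;
}
```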
Also, what is the status of the vectoradd functions? They're documented
as being too hard to optimize and thus deprecated. It seems that
they're overly complicated, and a variant without the s_1 parameters
would be easier to optimize and fairly useful (at least to me). Are
there plans to rectify this? If not, I will be happy to do what I can
given a bit of guidance on naming.
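To make the vectoradd point concrete, here is a sketch contrasting a scaled add with a plain one. Both names are hypothetical, and the scaled form assumes the s_1 parameters act as scalar coefficients (dest[i] = s1*src1[i] + s2*src2[i]). The coefficient-free version is a straight elementwise add that is easy to unroll or vectorize.

```c
/* Hypothetical scaled add, assuming the s_1 parameters are scalar
 * coefficients: dest[i] = s1 * src1[i] + s2 * src2[i]. */
static void vectoradd_scaled_f64(double *dest, const double *src1,
                                 const double *src2, int n,
                                 double s1, double s2)
{
    for (int i = 0; i < n; i++)
        dest[i] = s1 * src1[i] + s2 * src2[i];
}

/* Hypothetical unstrided add with no coefficients, unrolled by 4. */
static void add_f64_contig(double *dest, const double *src1,
                           const double *src2, int n)
{
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        dest[i + 0] = src1[i + 0] + src2[i + 0];
        dest[i + 1] = src1[i + 1] + src2[i + 1];
        dest[i + 2] = src1[i + 2] + src2[i + 2];
        dest[i + 3] = src1[i + 3] + src2[i + 3];
    }
    for (; i < n; i++)   /* leftover 0..3 elements */
        dest[i] = src1[i] + src2[i];
}
```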
>> This is the first time I've created a patch for a project in a long
>> time, so please let me know if I've missed something. The patch was
>> created using 'cvs diff -uNp' versus the latest anonymous CVS.
> Sounds good to me. I usually suggest doing a 'cvs add' on the files
> you are adding (which, curiously, does not require CVS write access),
> and then just use 'cvs diff -u'. Using '-N' may have put lots of
> other files in the patch.
> Please attach the patch to a bug report on bugs.freedesktop.org.