[Liboil] [patch] Optimized multsum_f64
David Schleef
ds at schleef.org
Tue May 16 12:20:45 PDT 2006
On Mon, May 15, 2006 at 09:23:16PM -0400, Marcus Brubaker wrote:
> Here are two optimized versions of multsum_f64 and a patch for detecting
> SSE2 support. For some reason, the SSE2 version is slightly slower on
> my machine than the plain unrolled version. I'm not exactly an assembly
> wizard so I may be missing something obvious, suggestions welcome.
That happens sometimes. You may have one of those processors where
f64 ops are relatively slow. Some CPUs have only one or two FP multiply
units, shared between SSE2 and the x87 FPU, so it doesn't really
matter whether you use SSE2 or the FPU.
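For readers following the thread, here is a minimal sketch of the two approaches being compared: a plain loop unrolled with independent accumulators, and an SSE2 version that does two f64 multiplies per instruction. This is not Marcus's patch or liboil's code. The function names are hypothetical, and the sketch assumes contiguous arrays (liboil's own multsum_f64 prototype may differ, for example by taking stride arguments). If both paths end up waiting on the same shared multiply unit, the SSE2 version has little to gain, which is consistent with the timing described above.

```c
#include <assert.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical reference: returns sum of src1[i] * src2[i] for i < n. */
double multsum_ref(const double *src1, const double *src2, int n)
{
  assert(n >= 0);
  double sum = 0.0;
  for (int i = 0; i < n; i++)
    sum += src1[i] * src2[i];
  return sum;
}

/* Hypothetical unrolled version: four independent accumulators so
 * consecutive multiply-adds do not depend on each other's results. */
double multsum_unroll4(const double *src1, const double *src2, int n)
{
  assert(n >= 0);
  double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
  int i = 0;
  for (; i + 4 <= n; i += 4) {
    s0 += src1[i] * src2[i];
    s1 += src1[i + 1] * src2[i + 1];
    s2 += src1[i + 2] * src2[i + 2];
    s3 += src1[i + 3] * src2[i + 3];
  }
  for (; i < n; i++)          /* leftover elements */
    s0 += src1[i] * src2[i];
  return (s0 + s1) + (s2 + s3);
}

#if defined(__SSE2__)
/* Hypothetical SSE2 version: each _mm_mul_pd multiplies two doubles.
 * Unaligned loads are used so any double array works. */
double multsum_sse2(const double *src1, const double *src2, int n)
{
  assert(n >= 0);
  __m128d acc0 = _mm_setzero_pd();
  __m128d acc1 = _mm_setzero_pd();
  double tmp[2];
  int i = 0;
  for (; i + 4 <= n; i += 4) {
    acc0 = _mm_add_pd(acc0, _mm_mul_pd(_mm_loadu_pd(src1 + i),
                                       _mm_loadu_pd(src2 + i)));
    acc1 = _mm_add_pd(acc1, _mm_mul_pd(_mm_loadu_pd(src1 + i + 2),
                                       _mm_loadu_pd(src2 + i + 2)));
  }
  /* Combine the two vector accumulators, then the two lanes. */
  _mm_storeu_pd(tmp, _mm_add_pd(acc0, acc1));
  double sum = tmp[0] + tmp[1];
  for (; i < n; i++)          /* leftover elements */
    sum += src1[i] * src2[i];
  return sum;
}
#endif
```

With a = {1..7} and b = {7..1}, all three functions should return 84. Note that the unrolled and SSE2 versions add the products in a different order than the reference, so for general inputs their results can differ from it in the last few bits.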
> This is the first time I've created a patch for a project in a long
> time, so please let me know if I've missed something. The patch was
> created using 'cvs diff -uNp' versus the latest anonymous CVS.
Sounds good to me. I usually suggest doing a 'cvs add' on the files
you are adding (which, curiously, does not require CVS write access)
and then just using 'cvs diff -u'. Using '-N' may have put lots of
other files in the patch.
Please attach the patch to a bug report on bugs.freedesktop.org.
dave...
--
David Schleef
Big Kitten LLC (http://www.bigkitten.com/) -- data acquisition on Linux