[Liboil] DCT and MDCT functions in liboil
Steven G. Johnson
stevenj.mit at gmail.com
Thu Mar 27 15:15:39 PDT 2008
Hi, I'm one of the authors of FFTW (www.fftw.org), and I was naturally
interested to see that you are planning to provide DCT and MDCT
functions in liboil.
First, you might be interested to know that FFTW includes a code
generator that can spit out highly-optimized C subroutines for DCTs,
MDCTs, and IMDCTs of any fixed size (not just powers of 2), and can in
most cases achieve the lowest known arithmetic counts (or nearly so)
for a given transform type and size. Although FFTW and its generator
are themselves under the GPL, the generated code per se (being the
output of a program) is not copyrighted, so you can apply whatever
copyright and license you like to the generator *output*. (We would
appreciate it if you still credit FFTW, of course.) In the few cases
I've tried, FFTW's generated code seems to be significantly faster
than the DCT code you have now.
By default, FFTW's generator outputs floating-point code, but it also
has some support for outputting fixed-point code (technically, what it
does is wrap macros like ADD, SUB, and MUL around all arithmetic
operations in the generated code, so that you can replace these with the
corresponding fixed-point operations if desired; e.g., I seem to
recall that Ogg Vorbis uses similar macros to implement
fixed-point MDCTs).
Second, I took a look at your code and was a little confused; many of
the transforms defined by your documented API seem to be missing, and
some of them seem to be mislabeled.
For example, liboil/dct/imdct32_f32.c and liboil/lgpl/imdct32_f32.c
are documented as implementing an "inverse modified cosine
transform" (IMDCT), but they actually implement a type-II discrete
cosine transform (DCT-II) of size 32, an entirely different
transform. You can see this by inspecting the reference routine in
liboil/dct/imdct32_f32.c and comparing it with the transform
definitions, and I've checked numerically that the same is true of the
other routine.
For the "imdct32" routine (really a DCT-II of size 32), I checked and
our generator produces C code that runs about 30-50% faster than your
lgpl/imdct32_f32.c routine (and both routines are hundreds of times
faster than your reference code dct/imdct32_f32.c) on my Intel Core
Duo machine with gcc. See the attached file.
For your oil_fdct8_f64 routine (liboil/dct/fdct8_f64.c), our generated
code (attached) is again about 30-40% faster than your
"fdct8_f64_fast" function. (This is even after I sped up your code by
specializing it for stride 1. It seems very odd to me that you
apparently allow arbitrary strides in bytes: double-precision numbers
really need to be 8-byte aligned or you will totally kill performance,
and even if you want to support discontiguous data, it would make more
sense to allow only strides in units of the underlying type.) I also
noticed a more serious problem: your "fdct8_f64_fast" routine is
gratuitously inaccurate because its floating-point constants (C0_9808
etc.) are entered to only 9 decimal places, whereas the routine is
supposed to operate in double precision (about 16 significant digits).
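A small sketch of the size of that accuracy loss, assuming (from its name) that C0_9808 stands for cos(pi/16) = 0.98078528...; the macro and function names below are hypothetical. It compares the 9-decimal constant with the same value computed at full double precision.

```c
#include <float.h>
#include <math.h>

/* Hypothetical stand-in for a constant like C0_9808, assumed to be
 * cos(pi/16) rounded to 9 decimal places. */
#define C_PI_16_9DIGITS 0.980785280

/* The same constant at full double precision, computed with cos().
 * Writing the literal out to 17 significant digits works equally well. */
static double c_pi_16_full(void)
{
    return cos(acos(-1.0) / 16.0);
}

/* Relative error of the 9-digit constant, in units of DBL_EPSILON
 * (about 2.2e-16). */
double constant_error_in_eps(void)
{
    double exact = c_pi_16_full();
    return fabs(C_PI_16_9DIGITS - exact) / exact / DBL_EPSILON;
}
```

The 9-digit value is off by about 4e-10, i.e. roughly 2 million times DBL_EPSILON, so every multiplication by it discards most of the precision that double arithmetic would otherwise deliver.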
Several of your other routines, e.g. those in dct12_f32.c, use the
direct O(N^2) algorithm and will certainly be many times slower than our
generated code, so I didn't bother to benchmark them.
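To put the direct approach in numbers: for a transform with N inputs and N outputs evaluated straight from its definition (the DCT-II is used below only as a representative example; the exact transform in dct12_f32.c is not assumed), every output is an N-term sum. With the cosines precomputed, the count is:

```latex
y_k = \sum_{j=0}^{N-1} x_j \cos\!\left[\frac{\pi}{N}\left(j + \tfrac{1}{2}\right) k\right],
\qquad k = 0, \dots, N-1

\#\text{mult} = N \cdot N = N^2, \qquad \#\text{add} = N(N-1)

N = 12:\quad 12^2 = 144 \text{ multiplications}, \qquad 12 \cdot 11 = 132 \text{ additions}
```

This count grows quadratically with N, whereas fast algorithms grow as O(N log N).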
Anyway, I hope this is helpful. If you let me know
a) what transform types and sizes you need
b) with what normalization conventions (or any windowing)
I would be happy to send/post the generated code along with the
corresponding command for our generator so that you can regenerate it
yourself as needed. You may also want to rethink your DCT API a bit
for the reasons noted above.
Regards,
Steven G. Johnson
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dct2-32.c
Type: application/octet-stream
Size: 9954 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/liboil/attachments/20080327/3559709b/attachment.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dct2-8.c
Type: application/octet-stream
Size: 2333 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/liboil/attachments/20080327/3559709b/attachment-0001.obj