[Liboil] DCT and MDCT functions in liboil

Steven G. Johnson stevenj.mit at gmail.com
Thu Mar 27 15:15:39 PDT 2008

Hi, I'm one of the authors of FFTW (www.fftw.org), and I was naturally  
interested to see that you are planning to provide DCT and MDCT  
functions in liboil.

First, you might be interested to know that FFTW includes a code  
generator that can spit out highly-optimized C subroutines for DCTs,  
MDCTs, and IMDCTs of any fixed size (not just powers of 2), and can in  
most cases achieve the lowest known arithmetic counts (or nearly so)  
for a given transform type and size.  Although FFTW and its generator  
are themselves under the GPL, the generated code per se (being the  
output of a program) is not copyrighted so you can use whatever  
copyright and license you like for the generator *output*.  (We would  
appreciate it if you still credit FFTW, of course.)  In the few cases  
I've tried, FFTW's generated code seems to be significantly faster  
than the DCT code you have now.

By default, FFTW's generator outputs floating-point code, but it also  
has some support for outputting fixed-point code (technically, what it  
does is wrap macros like ADD, SUB, and MUL around all arithmetic  
operations in the generated code, so that you can replace these by the  
corresponding fixed-point operations if desired.  For example, I seem to  
recall that Ogg Vorbis uses some similar macro stuff to implement  
fixed-point MDCTs).
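
To make the macro idea concrete, here is a minimal sketch of how such wrappers could be redefined for fixed-point arithmetic.  Only the macro names ADD, SUB, and MUL come from the description above; the fix_t type, the Q16 format, the FIX helper, and the rotate function are hypothetical illustrations, not FFTW's actual definitions or generator output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit fixed-point type with 16 fractional bits. */
typedef int32_t fix_t;
#define FIX_SHIFT 16

/* Convert a floating-point constant to fixed point, rounding to nearest. */
#define FIX(x) ((fix_t)((x) * (1 << FIX_SHIFT) + ((x) >= 0 ? 0.5 : -0.5)))

/* Fixed-point replacements for the arithmetic wrappers.  The product is
   formed in 64 bits and shifted back down; right-shifting a negative
   value is implementation-defined in C, though arithmetic on common
   compilers. */
#define ADD(a, b) ((a) + (b))
#define SUB(a, b) ((a) - (b))
#define MUL(a, b) ((fix_t)(((int64_t)(a) * (int64_t)(b)) >> FIX_SHIFT))

/* One plane rotation written only in terms of the macros, in the style
   that macro-wrapped generated code might use:
   u = c*x + s*y,  v = c*y - s*x. */
static void rotate(fix_t x, fix_t y, fix_t c, fix_t s, fix_t *u, fix_t *v)
{
    assert(u != NULL && v != NULL);
    *u = ADD(MUL(c, x), MUL(s, y));
    *v = SUB(MUL(c, y), MUL(s, x));
}
```

Defining the same three macros as plain floating-point +, -, and * would give back the floating-point version of the same routine without touching its body.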

Second, I took a look at your code and was a little confused; many of  
the transforms defined by your documented API seem to be missing, and  
some of them seem to be mislabeled.

For example, liboil/dct/imdct32_f32.c and liboil/lgpl/imdct32_f32.c  
are defined as implementing an "inverse modified cosine  
transform" (IMDCT), but they actually implement a type-II discrete  
cosine transform (DCT-II) of size 32, an entirely different  
transform.  You can see this by inspection of the  
liboil/dct/imdct32_f32.c reference routine (comparing it to the transform  
definitions), and I've checked numerically that it is true for the  
other routine as well.

For the "imdct32" routine (really a DCT-II of size 32), I checked and  
our generator produces C code that runs about 30-50% faster than your  
lgpl/imdct32_f32.c routine (and both routines are hundreds of times  
faster than your reference code dct/imdct32_f32.c) on my Intel Core  
Duo machine with gcc.  See the attached file.

For your oil_fdct8_f64 routine (liboil/dct/fdct8_f64.c), our generated  
code (attached) is again about 30-40% faster than your  
"fdct8_f64_fast" function.  (This is even after I sped up your code by  
specializing it for stride-1.  It seems very odd to me that you  
apparently allow arbitrary strides in bytes --- double-precision  
numbers really need to be 8-byte aligned or you will totally kill  
performance; even if you want to support discontiguous data, it would  
make more sense to only allow strides in units of the underlying  
type).  I also noticed a more serious problem -- your "fdct8_f64_fast"  
routine is gratuitously inaccurate because its floating-point  
constants (C0_9808 etc.) are only entered to 9 decimal places but the  
routine is supposed to operate in double precision.
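
To give a sense of the scale of the constant problem, here is a small sketch that measures how far a 9-decimal-place constant is from its double-precision value.  It assumes that C0_9808 is meant to approximate cos(pi/16) and that it is written as 0.980785280; the macro and function names below are hypothetical and are not taken from liboil.

```c
#include <assert.h>
#include <float.h>

/* Assumed 9-decimal-place form of the constant. */
#define C0_9808_SHORT 0.980785280
/* cos(pi/16) with more digits than a double can represent. */
#define C0_9808_FULL  0.98078528040323044913

/* Relative error of approx with respect to exact, expressed in units of
   DBL_EPSILON (the relative spacing of doubles near 1). */
static double rel_error_in_epsilons(double approx, double exact)
{
    double err = approx - exact;
    assert(exact != 0.0);
    if (err < 0.0)
        err = -err;
    if (exact < 0.0)
        exact = -exact;
    return (err / exact) / DBL_EPSILON;
}
```

Under these assumptions, rel_error_in_epsilons(C0_9808_SHORT, C0_9808_FULL) comes out in the millions, i.e. the constant alone gives up roughly six of the sixteen or so significant digits that double precision offers.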

Several of your other routines, e.g. those in dct12_f32.c, use the  
O(N^2) algorithm and will certainly be many many times slower than our  
generated code, so I didn't bother to benchmark them.

Anyway, I hope this is helpful.  If you let me know

a) what transform types and sizes you need
b) with what normalization conventions (or any windowing)

I would be happy to send/post the generated code along with the  
corresponding command for our generator so that you can regenerate it  
yourself as needed.  You may also want to re-think your DCT API a bit  
for the reasons noted above.

Steven G. Johnson

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dct2-32.c
Type: application/octet-stream
Size: 9954 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/liboil/attachments/20080327/3559709b/attachment.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dct2-8.c
Type: application/octet-stream
Size: 2333 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/liboil/attachments/20080327/3559709b/attachment-0001.obj 
