[pulseaudio-discuss] [PATCH] Remove module-equalizer-sink

Mon Mar 10 12:50:24 PDT 2014

10.03.2014 20:14, Jason Newton wrote:
> Author here.

Hello, and thanks for showing up.

>
> Wow, I didn't realize there was so many problems with it (somewhat 
> sarcastic here).  But I have been using it every day for hours since 
> before I submitted it and I've stayed up to date with pulseaudio (as a 
> user) with each successive version as it's been realized on openSUSE.  
> The only real issue I've had is that tsched and flash and the 
> equalizer don't really work together and video games don't like the 
> latency - I never noticed a delay with mpv/mplayer and videos or 
> anything iritating with music on mpd.

The delay is indeed not noticeable if you have nothing to compare the 
sound timing with. I.e. not a problem with music, but a problem with videos.

>   For those 2 trouble cases of flash/games I just move them to the 
> hardware sink.  Yea rewinding/seeks of audio cause a small audio 
> glitch but its not something that bothers me much as it automatically 
> recovers within a second.

PulseAudio must be perfect :) and I think that this glitch is, in 
theory, fixable. After all, the past samples are stored in input_q, your 
module is notified about rewinds, and it can reprocess the past samples 
when needed.

>   It's worked pretty well for me.  I've seen a few gentoo people using 
> it but I know it never really reached mass usage due to the 
> (unfortunate) timing and naming of pulseaudio equalizer (the gtk 
> ladspa script with non-real-time tuning). I've liked the large 
> spectrum of the filter to also have make shift notch filters for when 
> I'm to lazy to actually load things up in matlab or audacity.

For notches, IIR filters would be far more appropriate than anything 
FFT-based, due to much lower algorithmic latency and, in some cases, 
lower CPU usage for the similar frequency resolution. As for the 
unfortunate timing and the confusing naming of the gtk-based script - I 
agree.

>
> I'll go through this a little bit to answer what I can at the moment 
> (I just sort of stumbled on to this thread, I'm neck deep in other 
> stuff these days).
>
> On Fri, Mar 7, 2014 at 11:43 AM, Alexander E. Patrakov 
> <patrakov at gmail.com <mailto:patrakov at gmail.com>> wrote:
>
>     1. The FFT size and the window size for overlap-add are chosen
>     inconsistently. The window size is always 16000 samples. The FFT size
>     depends on the sample rate. Thus, with sample rates of 16 kHz or
>     below,
>     there is a buffer overflow (window becomes larger than FFT).
>
>
> This was done to allow more detail to be preserved from the signal, or 
> at least that was the thought.  FFT size would be the 
> 2^(ceil(log2(sample_rate)) so the spectrum covered all theoretical 
> frequencies the audio signal at that sample rate could contain.   
> Granularity of latency would be something that isn't too bad for most 
> people and still efficient as those FFTs are still expensive. Perhaps 
> this should have been based on the sample rate as well, but the 
> latency wasn't bad for me and cpu usage is overall 1-2% at 44.1khz on 
> an i7.  I don't recall it being bad (2-5%?) at 96khz back on an i7 
> 920.  I didn't think one would really would have a sample rate lower 
> than 16khz .

As for "so the spectrum covered all theoretical frequencies the audio 
signal at that sample rate could contain". Any size of the FFT can 
represent all frequencies between 0 Hz and half the sample rate, the 
only real difference is about granularity of directly representable 
frequencies. I.e., with short FFT sizes, "weird" frequencies (those that 
are not a multiple of the sample rate divided by the FFT size) are 
represented not directly, but by variation of the transform results in 
successive periods. I guess suitable granularity is what you mean here 
(after all, your choice of the FFT size can't represent half-integer 
frequencies directly, and they do exist), and that's what I mean with my 
point 3.

Even though one is unlikely to have a sample rate lower than 16 kHz, it 
still happens sometimes in real life (e.g. with BlueTooth), and a buffer 
overflow is always a bug. So, to avoid inconsistent choices, you should 
choose both the window size and the FFT size as a function of the 
sampling rate.

>
>
>     2. There is no attempt to ensure that the impulse response of the
>     equalizer is short enough to fit in the difference between the FFT
>     size
>     and the window size (a requirement for the correct operation of the
>     overlap-add technique). See [1] for the FFT size consideration and [2]
>     why it is important to regularize the desired frequency response.
>
>
> Yes, I recall someone mentioned that theoretically there is "ringing" 
> that occurs due to this... it's been below the noise floor for me if 
> it happens.  Has it been a problem for anyone? By the way, maybe I'm 
> remembering wrong but I believe that tails from the "ringing" cancel 
> out with COLA method when the filter is constant, that's why I used it 
> and allowed an arbitrary magnitude based filter (no phase adjustments 
> though).  I'm not an audio engineer though, I mainly due spatial 
> signal processing on images which is a whole different ball-game.

It is below the noise floor only because you have not tried a non-smooth 
frequency response (the one that you cannot draw in qpaeq without 
stretching its X11 window well beyond the screen size), and because of 
the excessive size of the window that allows for long tails of the 
impulse response. Still, as long as you don't make it absolutely sure 
that the impulse response of the filter is shorter than the FFT size 
minus the window size, you are not implementing a convolution. COLA 
doesn't help you here, it is indeed useful only for reconstruction (i.e. 
for audio codecs, i.e. for unity or constant gain), not for manipulation 
(i.e. filters, including the equalizer).

I also believe that you are confused by two similarly-named, often 
discussed together, but actually different and unrelated things: (1) 
"overlap-add decomposition", which is related to short-time Fourier 
transform and which only works if the window satisfies the COLA 
condition, and (2) "overlap-add method", which is a means to precisely 
implement a convolution of a long signal with a short one, i.e. a useful 
trick against circular convolution.

Let me explain again what happens here.

First, let's note that we need to create a filter that is linear and 
time-invariant. This means it is a convolution. So we need a convolution 
of the input signal with something related to the equalizer settings. 
That something has a well-defined length (not counting any trailing 
zeros), but, if you don't actively enforce it by windowing, you will 
generally end up with it occupying the whole FFT size (N). Let that 
well-defined length be L.

Then, let's note that convolution with such a signal will make a given 
input sample affect only the corresponding output sample and L-1 more 
output samples in the future.

With the overlap-add method, one should cover the signal with a sequence 
of windows, such that COLA is satisfied (the best choice here would be 
rectangular windows, see below). Then pad each windowed piece of signal 
with zeros. Then take the FFT, multiply the result by the FFT of the 
"something", take an IFFT of the result. You will end up with 
overlapping pieces of the output signal that you just need to add.

Now let's consider the "take the FFT, multiply the result by the FFT of 
the something, take an IFFT of the result" step in more detail. What is 
prescribed here is a circular convolution. It is almost the same as the 
regular convolution, except that it, instead of producing some samples 
to the right of the original signal, will wrap them around and push to 
the left, adding to the existing samples (and thus corrupting them).

Padding is a way to avoid corruption here. To avoid corruption (i.e. to 
make "normal" and circular convolution yield the same result), you need 
L-1 samples of padding. Since you also have to leave some space to the 
actual input signal, you need L < N, i.e. truncate the impulse response. 
To avoid severely corrupting the spectrum by truncation, you need to 
apply a window to the wanted impulse response.

Now let's deal with the (false) statement "the filter was is defined in 
the frequency domain and only changes amplitude of frequency (not 
phase)" again. To see what you are convolving the input signal with, you 
need to take the IFFT of the filter's frequency-domain representation. 
If it is even (which is true if your filter is a zero-phase one), then 
you'll end up with something like Figure 17-1(b) from 
http://www.dspguide.com/ch17/1.htm . I.e. the filter will want to affect 
the near future and the near past equally, but the past portion will 
wrap around to the far future. Oops. To avoid this, you will need to 
shift the impulse response circularly to the right, thus making it 
linear-phase instead of zero-phase. And after that you will no longer be 
able to say that your filter does not affect phase.

And, here is what the book (http://www.dspguide.com/ch17/1.htm ) says 
about the "something": "it needs to be /shifted/, /truncated/, and 
/windowed/" - just what I explained above.

>
>
>     3. The large window size (16000 samples) is unjustified. It only leads
>     to excessively high latency. A "33 ms of audio" window and "66 ms of
>     audio" FFT size would be sufficient for any practical implementation
>     of the equalizer and would still allow one to control the attenuation
>     at 30 Hz and 60 Hz separately (as commonly found in music
>     players), all
>     with only 33 ms of latency.
>
>
> OK you can tune that if you think it's a problem... I was just 
> balancing something that worked ok for me while trying not to do too 
> many FFTs, a respectable default for the general case.

Too many short FFTs is not a problem. A long window is a problem, 
because you can't get the output sample corresponding to the first input 
sample without processing the whole window, and this means that you 
can't achieve low latency needed by games.

>
>
>     4. This latency is not properly reported (fdo bug 41465).
>
>
> Some of the harder to understand pulseaudio details like latency 
> escaped me - I know the concepts very well but I tried for hours to 
> tune it and asked Lennart countless times for help regarding this 
> (this was in the days when he was starting the hand off).  Perhaps 
> other bugs or something in PA at the time made that harder to 
> understand and work with.  If you understand how to make that part 
> work better, please experiment with it and push a patch cause it was 
> lost on me at the time.

There is indeed no useful documentation :(

Latency is how far ahead the last written sample is of what the user 
hears. It consists of three parts: latency of the parent sink, the 
amount buffered in input_q, and the algorithmic latency. See 
sink_process_msg_cb() of module-virtual-sink. You have two problems 
here: output_q (IMHO a totally unnecessary layer of buffering), and not 
counting the algorithmic latency (which is, in samples, for causal 
linear-phase filters, M + L/2, where M is the window size). I cannot 
provide a patch here, because this would need a well-defined L beforehand.

>
>
>     5. There are numerous issues with code style. A lot of commented-out
>     code, and a "FIXME: Please clean this up" that was never fixed in 4
>     years.
>
>
> I'll see what I can do about that, some of those were from 
> module-virtual-sink when Lennart was updating it, most of them were 
> completely unclear as to what should change or if changed resulted in 
> crashes (the unref one). Again, Lennart got hard to access in those 
> days and I couldn't figure out what to do with them as he put them 
> there.   The rewind stuff was also very confusing overall, I recall 
> this (outside of module-equalizer-sink) coming up before and no clear 
> answers on the horizon.  You got me on the dead code one though.  I 
> argue a few of those lines should be left as they were frequently 
> useful in developing the filter or diagnosing something - perhaps I 
> should guard them in a debug macro instead, rather than delete or 
> leave them commented?

There is pa_log_debug().

>
>     6. There are memory management issues. E.g. the
>     already-pointed-out leak
>     at the end of sink_input_pop_cb(), and the fact that a buffer for the
>     FFT is allocated by fftw_malloc() but freed by free().
>
>
> Interesting, rather than complain I would've just made a patch but 
> point taken.  I'll try to submit a patch for that soon but my time for 
> PA efforts is pretty small.

I would have submitted the patch if this was the only problem and if the 
previous person also fixed the unref instead of adding a FIXME.

>
>     7. The use of the Hanning window is unjustified. A rectangular window
>     works just as well for the purpose of overlap-add based
>     convolution, but
>     allows one to do less FFTs, i.e. go faster. In fact, the DSP Guide
>     book
>     does not even mention the variant of overlap-add with a
>     non-rectangular
>     window!
>
>
> Here's a few links that might answer it,  I'll also mention most of my 
> work was derived from these:
> http://www.dsprelated.com/dspbooks/sasp/COLA_Examples.html
> http://www.dsprelated.com/dspbooks/sasp/Frequency_Domain_COLA_Constraints.html

Unfortunately, as explained above, COLA is not the only thing required. 
Sufficient padding is required too, if modification of spectrum is needed.

> http://www.dspguide.com/

Sorry, please link to concrete places, not the whole book,

And I have one more link for you to see:

https://ccrma.stanford.edu/~jos/sasp/Overlap_Add_Decomposition.html

The last sentence on the page explicitly recommends a rectangular window 
for the purpose of FIR filtering. This way, you will be able to shift 
the window by its whole width between the FFTs, not by the half of its 
width, thus needing only half of the FFTs.

>
> I don't think it's so broken as you instigate it to be, but maybe I'm 
> wrong... it does work as desired (as a user) however.

Hanning window works indeed, it just means some extra (unneeded) work, 
provided that you have enough padding.

>
>
>     8. The module does not use any benefits (such as a chance to handle
>     rewinds properly) of being a native PulseAudio module.
>
>
> Please explain how to handle it and I might add it... again there was 
> alot of black magic in pulseaudio in those days wrt stuff like this.  
> I suspect any of the module-virtual-sink inheritors might benefit in 
> exactly the same way with potentially duplicated code. But I don't 
> know how numerous they are...

You are right here, module-virtual-sink contains some wrong advice, like 
resetting the filter on any rewind, even if it is possible to do better 
(i.e. to restore the past internal state of the filter).

Here is my point in more detail (knowingly exaggerated).

Writing a native PulseAudio module is hard: you need to implement 20 
callbacks, by copy-pasting the black magic, and with poor documentation. 
Writing a LADSPA plugin is much easier: only 8 callbacks (without much 
code duplication) and much better documentation, and the ability to use 
the result in PulseAudio via module-ladspa-sink. So there must be some 
gain for the whole exercise of writing a native PulseAudio module to be 
worth the pain (both for you and for PulseAudio developers, as any code 
is a liability).

So here is the potential gain.

A. The ability to report algorithmic latency properly, via 
PA_SINK_MESSAGE_GET_LATENCY. In fact, there is a hack (control output 
named "latency") in LADSPA for that, but PulseAudio doesn't use it. It 
looks easy to fix, though.

B. The ability to handle rewinds. In LADSPA, there is no way to say 
"forget the last N samples that I wrote to you". In PulseAudio, there is 
such way, and it is essential for all modules to either support that 
seamlessly (so that a call to pa_stream_write() with the last two 
parameters other than 0,0 and overwriting the previously-written samples 
with themselves is a no-op), or cleanly fall back to the low-latency 
no-rewinds mode in order for the "tsched" feature to be finally completed.

Your current module neither reports its algorithmic latency, nor fixes 
up its internal state (mostly overlap_accum) on rewind, so there is no 
actual gain, but all the pain (including the fact that I sent the 
"please remove" e-mail). As this currently looks like black magic that 
nobody explained to you, maybe it is indeed a better idea to write a 
LADSPA plugin instead of a native PulseAudio virtual sink module?

>
>
>     9. None of the known alternative "system-wide equalizer GUI" projects
>     work on top of module-equalizer-sink. Google has no external hits
>     on the
>     [EqualizedSinks dbus] search query that would find any alternative
>     GUIs
>     that use module-equalizer-sink DBus interface. Both veromix [3] and
>     pulseaudio-equalizer [4] work on top of module-ladspa-sink. And
>     they are
>     usable with video players, too, even though both are unmaintained.
>
>
> OK...?  is qpaeq a problem for you or did you miss it all together?  I 
> don't see why this was listed.

The keyword here is "external". qpaeq is a part of PulseAudio, not an 
external project.

>
>
>     10. It's more an optimization question, but I have a gut feeling that
>     overlap-save [5] would be a better fit for PulseAudio buffering model
>     than overlap-add. Namely, it would allow to abandon the need to store
>     overlap_accum. Instead, the only buffer needed would be for
>     looking back
>     at the past portions of the input signal, and we already have that
>     anyway due to the need to react to rewinds. In other words, from a
>     qualified DSP engineer, I would rather expect a rewrite of the
>     algorithm
>     instead of any attempts to improve it gradually.
>
>     In short, let's not pretend that PulseAudio has a working native
>     equalizer module. It should have never been accepted in the
>     current form
>     in the first place. A better replacement already exists in the form of
>     module-ladspa-sink + mbeq + veromix.
>
>
> Regarding overlap-save, I evaluated this and specifically show 
> overlap-add instead, however having been in 2009 or so I can't really 
> remember it so well anymore. Maybe it'll come back to me, I suspect 
> will come back when I look over the dspguide book again.

OK, I will wait. But my point stays: with overlap-save, you will not 
need overlap_accum, and thus will need no internal state to restore on 
rewinds. This makes handling rewinds much easier.

-- 
Alexander E. Patrakov