[pulseaudio-discuss] [PATCH] Remove module-equalizer-sink

Mon Mar 17 05:18:28 PDT 2014

Hi Again Alexander,

Let me first start of saying thank you for your time looking over the code
and spotting of issues.  I appreciate your knowledge and the time you've
taken to explain things.  I would hope you could limit how abrasive you are
as much of the way you have went about it didn't benefit anyone and is
unnecessary when you are have the capacity to argue clearly as well as you
do.  Further, I didn't appreciate that you didn't CC me on the initial
email, despite my contact details being present in the sourcecode, and just
took an instigatory / trollish stance.  I don't actively monitor
pulseaudio's ML, I come through it a few times a year (the last being
somewhere between Dec-Jan) to see if anything interesting comes up or a few
people I can help on forums so CCing me is the best way to reach me if that
is desired and I'm fairly response to that.

On Mon, Mar 10, 2014 at 12:50 PM, Alexander E. Patrakov
<patrakov at gmail.com>wrote:

> PulseAudio must be perfect :) and I think that this glitch is, in theory,
> fixable. After all, the past samples are stored in input_q, your module is
> notified about rewinds, and it can reprocess the past samples when needed.
>

The glitch should be  fixible, I did try my hand at it like you had
mentioned, I don't recall being able to rid of it so I didn't push it.
It's worth trying again.

>    It's worked pretty well for me.  I've seen a few gentoo people using it
>> but I know it never really reached mass usage due to the (unfortunate)
>> timing and naming of pulseaudio equalizer (the gtk ladspa script with
>> non-real-time tuning). I've liked the large spectrum of the filter to also
>> have make shift notch filters for when I'm to lazy to actually load things
>> up in matlab or audacity.
>>
>
> For notches, IIR filters would be far more appropriate than anything
> FFT-based, due to much lower algorithmic latency and, in some cases, lower
> CPU usage for the similar frequency resolution. As for the unfortunate
> timing and the confusing naming of the gtk-based script - I agree.
>

The notch filter part wasn't about finding the absolute best tool for the
job but something already opened and there and capable of the same effect
while listening to bad recording of good music (usually classical-ish
covers).

>     1. The FFT size and the window size for overlap-add are chosen
>>     inconsistently. The window size is always 16000 samples. The FFT size
>>     depends on the sample rate. Thus, with sample rates of 16 kHz or
>>     below,
>>     there is a buffer overflow (window becomes larger than FFT).
>>
>
I will parametrize things  better in the future, in particular latency.

    2. There is no attempt to ensure that the impulse response of the
>>     equalizer is short enough to fit in the difference between the FFT
>>     size
>>     and the window size (a requirement for the correct operation of the
>>     overlap-add technique). See [1] for the FFT size consideration and [2]
>>     why it is important to regularize the desired frequency response.
>>
>
Understood.

>
>>
>
> It is below the noise floor only because you have not tried a non-smooth
> frequency response (the one that you cannot draw in qpaeq without
> stretching its X11 window well beyond the screen size), and because of the
> excessive size of the window that allows for long tails of the impulse
> response. Still, as long as you don't make it absolutely sure that the
> impulse response of the filter is shorter than the FFT size minus the
> window size, you are not implementing a convolution. COLA doesn't help you
> here, it is indeed useful only for reconstruction (i.e. for audio codecs,
> i.e. for unity or constant gain), not for manipulation (i.e. filters,
> including the equalizer).
>
> I also believe that you are confused by two similarly-named, often
> discussed together, but actually different and unrelated things: (1)
> "overlap-add decomposition", which is related to short-time Fourier
> transform and which only works if the window satisfies the COLA condition,
> and (2) "overlap-add method", which is a means to precisely implement a
> convolution of a long signal with a short one, i.e. a useful trick against
> circular convolution.
>
>
For using the windowing, the main reason I brought it in was because from
what I read, it is required for time-variant filters.  Now I know the
filters are quasi-invariant but when someone is fiddling with the UI the
filter is time-variant.  Should I ignore that?

Now let's deal with the (false) statement "the filter was is defined in the
> frequency domain and only changes amplitude of frequency (not phase)"
> again. To see what you are convolving the input signal with, you need to
> take the IFFT of the filter's frequency-domain representation. If it is
> even (which is true if your filter is a zero-phase one), then you'll end up
> with something like Figure 17-1(b) from http://www.dspguide.com/ch17/1.htm. I.e. the filter will want to affect the near future and the near past
> equally, but the past portion will wrap around to the far future. Oops. To
> avoid this, you will need to shift the impulse response circularly to the
> right, thus making it linear-phase instead of zero-phase. And after that
> you will no longer be able to say that your filter does not affect phase.
>

Hmm, it looks like I've had constant phase of 1 and not linear phase.  I'll
reparameterize it in the future such that phase will not change and always
be linear.

>
>>
>>     3. The large window size (16000 samples) is unjustified. It only leads
>>     to excessively high latency. A "33 ms of audio" window and "66 ms of
>>     audio" FFT size would be sufficient for any practical implementation
>>     of the equalizer and would still allow one to control the attenuation
>>     at 30 Hz and 60 Hz separately (as commonly found in music
>>     players), all
>>     with only 33 ms of latency.
>>
>>
>> OK you can tune that if you think it's a problem... I was just balancing
>> something that worked ok for me while trying not to do too many FFTs, a
>> respectable default for the general case.
>>
>
> Too many short FFTs is not a problem. A long window is a problem, because
> you can't get the output sample corresponding to the first input sample
> without processing the whole window, and this means that you can't achieve
> low latency needed by games.

As far as I know it is a problem if you're saying to still sample at the
next power of 2 for say 44.1khz but allow the latency to go smaller.  The
length of the window choice may not even change the length of the next
power of 2 FFT but more of these per second as a result of less latency
will be felt as it's the FFTs at high enough samplerates that are - I
wanted to avoid hitting 10% CPU usage on the i7 920 I mostly tested on and
at 4k or so of latency (samples) and 96khz audio, it was definitely up
there.  But if its a tunable parameter (in ms, a reasonable default for
most can be found and individuals can then choose what amount they think is
allowable; it's also been almost 5 years.  Or were you talking about also
using smaller FFTs and losing resolution - if so, even if that's common
practice... I don't like that and am unfamiliar with it but if you have
some literature to point to, I will take a look.

>
>
>
>>
>>     4. This latency is not properly reported (fdo bug 41465).
>>
>>
>> Some of the harder to understand pulseaudio details like latency escaped
>> me - I know the concepts very well but I tried for hours to tune it and
>> asked Lennart countless times for help regarding this (this was in the days
>> when he was starting the hand off).  Perhaps other bugs or something in PA
>> at the time made that harder to understand and work with.  If you
>> understand how to make that part work better, please experiment with it and
>> push a patch cause it was lost on me at the time.
>>
>
> There is indeed no useful documentation :(
>
> Latency is how far ahead the last written sample is of what the user
> hears. It consists of three parts: latency of the parent sink, the amount
> buffered in input_q, and the algorithmic latency. See sink_process_msg_cb()
> of module-virtual-sink. You have two problems here: output_q (IMHO a
> totally unnecessary layer of buffering), and not counting the algorithmic
> latency (which is, in samples, for causal linear-phase filters, M + L/2,
> where M is the window size). I cannot provide a patch here, because this
> would need a well-defined L beforehand.
>

Re output_q: the filter output has to be chunked into segments which have
size constraints to continue through the PA pipeline.  It is dequeued in
accordance with the sink_input_pop_cb which only takes one memchunk at a
time.  I once tried to do this all in one big chunk originally but pa core
complained about that due to the memchunk size limitations.     Do you
think it should be discarded from the calculation still?  Also, input_q and
algorithmic latency seems a bit muddled.  Perhaps the right thing to do is
to round up how much input_q is by algorithmic latency.

>
>>
>>     5. There are numerous issues with code style. A lot of commented-out
>>     code, and a "FIXME: Please clean this up" that was never fixed in 4
>>     years.
>>
>>
>> I'll see what I can do about that, some of those were from
>> module-virtual-sink when Lennart was updating it, most of them were
>> completely unclear as to what should change or if changed resulted in
>> crashes (the unref one). Again, Lennart got hard to access in those days
>> and I couldn't figure out what to do with them as he put them there.   The
>> rewind stuff was also very confusing overall, I recall this (outside of
>> module-equalizer-sink) coming up before and no clear answers on the
>> horizon.  You got me on the dead code one though.  I argue a few of those
>> lines should be left as they were frequently useful in developing the
>> filter or diagnosing something - perhaps I should guard them in a debug
>> macro instead, rather than delete or leave them commented?
>>
>
> There is pa_log_debug().

Too much stuff for that... it'll be heavy enough to discouraging to use
pa_log_debug.

>
>
>
>>     6. There are memory management issues. E.g. the
>>     already-pointed-out leak
>>     at the end of sink_input_pop_cb(), and the fact that a buffer for the
>>     FFT is allocated by fftw_malloc() but freed by free().
>>
>>
>> Interesting, rather than complain I would've just made a patch but point
>> taken.  I'll try to submit a patch for that soon but my time for PA efforts
>> is pretty small.
>>
>
> I would have submitted the patch if this was the only problem and if the
> previous person also fixed the unref instead of adding a FIXME.

Regarding that unref, that's something I tried back upon it's creation and
it just ends up crashing due to that memblock being free'd.  If it is a
resource leak still I haven't seen it takes its toll yet.

>
>
>>     7. The use of the Hanning window is unjustified. A rectangular window
>>     works just as well for the purpose of overlap-add based
>>     convolution, but
>>     allows one to do less FFTs, i.e. go faster. In fact, the DSP Guide
>>     book
>>     does not even mention the variant of overlap-add with a
>>     non-rectangular
>>     window!
>>
> And I have one more link for you to see:
>
> https://ccrma.stanford.edu/~jos/sasp/Overlap_Add_Decomposition.html
>
> The last sentence on the page explicitly recommends a rectangular window
> for the purpose of FIR filtering. This way, you will be able to shift the
> window by its whole width between the FFTs, not by the half of its width,
> thus needing only half of the FFTs.

As I mentioned earlier, I used the window approach to deal with time
varying filters.

>
>
>
>> I don't think it's so broken as you instigate it to be, but maybe I'm
>> wrong... it does work as desired (as a user) however.
>>
>
> Hanning window works indeed, it just means some extra (unneeded) work,
> provided that you have enough padding.
>
>
>
>>
>>     8. The module does not use any benefits (such as a chance to handle
>>     rewinds properly) of being a native PulseAudio module.
>>
>>
>> Please explain how to handle it and I might add it... again there was
>> alot of black magic in pulseaudio in those days wrt stuff like this.  I
>> suspect any of the module-virtual-sink inheritors might benefit in exactly
>> the same way with potentially duplicated code. But I don't know how
>> numerous they are...
>>
>
> You are right here, module-virtual-sink contains some wrong advice, like
> resetting the filter on any rewind, even if it is possible to do better
> (i.e. to restore the past internal state of the filter).
>
> Here is my point in more detail (knowingly exaggerated).
>
> Writing a native PulseAudio module is hard: you need to implement 20
> callbacks, by copy-pasting the black magic, and with poor documentation.
> Writing a LADSPA plugin is much easier: only 8 callbacks (without much code
> duplication) and much better documentation, and the ability to use the
> result in PulseAudio via module-ladspa-sink. So there must be some gain for
> the whole exercise of writing a native PulseAudio module to be worth the
> pain (both for you and for PulseAudio developers, as any code is a
> liability).
>
>
I think the real problem here is DRY over the modules that are based or are
similar to module-virtual-sink.  Many of those callbacks aren't really
'overridden' in a given implementation.  The cost of implementation and
maintenance of one of these sinks shouldn't be that excessive as most of
the changes would then be made in fewer places given a better way of
inheriting from that common base.  I think ladspa also forces a weird
external (plugin) dependency that could be troublesome for some people and
it also sounds linux specific where as pulse aspires to be cross platform.
Further/in general I feel better integration between algorithm and PA is
possible only without the use of a prepackaged library set like ladspa's.
And the DSP code shouldn't not be that hard to write - even if my
implementation was/is incorrect, the core region of it is about 50 lines.
Would you bring in  blas to add 2 vectors if that was all you were going to
do?

>
>>
>>     10. It's more an optimization question, but I have a gut feeling that
>>     overlap-save [5] would be a better fit for PulseAudio buffering model
>>     than overlap-add. Namely, it would allow to abandon the need to store
>>     overlap_accum. Instead, the only buffer needed would be for
>>     looking back
>>     at the past portions of the input signal, and we already have that
>>     anyway due to the need to react to rewinds. In other words, from a
>>     qualified DSP engineer, I would rather expect a rewrite of the
>>     algorithm
>>     instead of any attempts to improve it gradually.
>>
>> Yea I guess I don't see it as big of a deal as you regarding rewrites of
algorithms or improving things gradually.  But it will take some time,
which I currently don't have for more than a few weeks before I can really
tear into the meat of it.  Regarding overlap-save, it does look like it
would be a tiny bit cheaper to use by forgoing adds, I forsee almost zero
speedup but it will be less operations.  I like that approach with the
rewinds though but I'll have to see if it has any problems.  As for why
initially chose OLA rather than OLS, I thought there was a way I could
cheat on worst-case latency with certain compromises , perhaps more
examples it of it also had a hand in it.

-Jason
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freedesktop.org/archives/pulseaudio-discuss/attachments/20140317/5e24bc2f/attachment-0001.html>