<div dir="ltr"><div>Hi Again Alexander,<br><br></div>Let me first start of saying thank you for your time looking over the code and spotting of issues. I appreciate your knowledge and the time you've taken to explain things. I would hope you could limit how abrasive you are as much of the way you have went about it didn't benefit anyone and is unnecessary when you are have the capacity to argue clearly as well as you do. Further, I didn't appreciate that you didn't CC me on the initial email, despite my contact details being present in the sourcecode, and just took an instigatory / trollish stance. I don't actively monitor pulseaudio's ML, I come through it a few times a year (the last being somewhere between Dec-Jan) to see if anything interesting comes up or a few people I can help on forums so CCing me is the best way to reach me if that is desired and I'm fairly response to that.<br>
<div><br><div><div class="gmail_extra"><div class="gmail_quote">On Mon, Mar 10, 2014 at 12:50 PM, Alexander E. Patrakov <span dir="ltr"><<a href="mailto:patrakov@gmail.com" target="_blank">patrakov@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">PulseAudio must be perfect :) and I think that this glitch is, in theory, fixable. After all, the past samples are stored in input_q, your module is notified about rewinds, and it can reprocess the past samples when needed.<br>
<div class=""></div></blockquote><div><br></div><div>The glitch should be fixible, I did try my hand at it like you had mentioned, I don't recall being able to rid of it so I didn't push it. It's worth trying again.<br>
<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="">
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
It's worked pretty well for me. I've seen a few gentoo people using it but I know it never really reached mass usage due to the (unfortunate) timing and naming of pulseaudio equalizer (the gtk ladspa script with non-real-time tuning). I've liked the large spectrum of the filter to also have make shift notch filters for when I'm to lazy to actually load things up in matlab or audacity.<br>
</blockquote>
<br></div>
For notches, IIR filters would be far more appropriate than anything FFT-based, due to much lower algorithmic latency and, in some cases, lower CPU usage for the similar frequency resolution. As for the unfortunate timing and the confusing naming of the gtk-based script - I agree.<br>
</blockquote><div><br>The notch filter part wasn't about finding the absolute best tool for the job but something already opened and there and capable of the same effect while listening to bad recording of good music (usually classical-ish covers). <br>
<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div class="">
1. The FFT size and the window size for overlap-add are chosen<br>
inconsistently. The window size is always 16000 samples. The FFT size<br>
depends on the sample rate. Thus, with sample rates of 16 kHz or<br>
below,<br>
there is a buffer overflow (window becomes larger than FFT).<br></div></blockquote></blockquote><div><br></div><div>I will parametrize things better in the future, in particular latency.<br><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""></div></blockquote><div class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
2. There is no attempt to ensure that the impulse response of the<br>
equalizer is short enough to fit in the difference between the FFT<br>
size<br>
and the window size (a requirement for the correct operation of the<br>
overlap-add technique). See [1] for the FFT size consideration and [2]<br>
why it is important to regularize the desired frequency response.<br>
</blockquote></div></blockquote><div><br>Understood.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
</blockquote>
<br></div>
It is below the noise floor only because you have not tried a non-smooth frequency response (the one that you cannot draw in qpaeq without stretching its X11 window well beyond the screen size), and because of the excessive size of the window that allows for long tails of the impulse response. Still, as long as you don't make it absolutely sure that the impulse response of the filter is shorter than the FFT size minus the window size, you are not implementing a convolution. COLA doesn't help you here, it is indeed useful only for reconstruction (i.e. for audio codecs, i.e. for unity or constant gain), not for manipulation (i.e. filters, including the equalizer).<br>
<br>
I also believe that you are confused by two similarly-named, often discussed together, but actually different and unrelated things: (1) "overlap-add decomposition", which is related to short-time Fourier transform and which only works if the window satisfies the COLA condition, and (2) "overlap-add method", which is a means to precisely implement a convolution of a long signal with a short one, i.e. a useful trick against circular convolution.<br>
<br></blockquote><div><br>For using the windowing, the main reason I brought it in was because from what I read, it is required for time-variant filters. Now I know the filters are quasi-invariant but when someone is fiddling with the UI the filter is time-variant. Should I ignore that?<br>
<br><br></div><blockquote style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex" class="gmail_quote">Now let's deal with the (false) statement "the filter was is defined in the frequency domain and only changes amplitude of frequency (not phase)" again. To see what you are convolving the input signal with, you need to take the IFFT of the filter's frequency-domain representation. If it is even (which is true if your filter is a zero-phase one), then you'll end up with something like Figure 17-1(b) from <a href="http://www.dspguide.com/ch17/1.htm" target="_blank">http://www.dspguide.com/ch17/1.htm</a> . I.e. the filter will want to affect the near future and the near past equally, but the past portion will wrap around to the far future. Oops. To avoid this, you will need to shift the impulse response circularly to the right, thus making it linear-phase instead of zero-phase. And after that you will no longer be able to say that your filter does not affect phase.<br>
</blockquote><br></div><div class="gmail_quote">Hmm, it looks like I've had constant phase of 1 and not linear phase. I'll reparameterize it in the future such that phase will not change and always be linear.<br>
</div><div class="gmail_quote"><br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="">
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
<br>
3. The large window size (16000 samples) is unjustified. It only leads<br>
to excessively high latency. A "33 ms of audio" window and "66 ms of<br>
audio" FFT size would be sufficient for any practical implementation<br>
of the equalizer and would still allow one to control the attenuation<br>
at 30 Hz and 60 Hz separately (as commonly found in music<br>
players), all<br>
with only 33 ms of latency.<br>
<br>
<br>
OK you can tune that if you think it's a problem... I was just balancing something that worked ok for me while trying not to do too many FFTs, a respectable default for the general case.<br>
</blockquote>
<br></div>
Too many short FFTs is not a problem. A long window is a problem, because you can't get the output sample corresponding to the first input sample without processing the whole window, and this means that you can't achieve low latency needed by games.</blockquote>
<div><br></div><div>As far as I know it is a problem if you're saying to still sample at the next power of 2 for say 44.1khz but allow the latency to go smaller. The length of the window choice may not even change the length of the next power of 2 FFT but more of these per second as a result of less latency will be felt as it's the FFTs at high enough samplerates that are - I wanted to avoid hitting 10% CPU usage on the i7 920 I mostly tested on and at 4k or so of latency (samples) and 96khz audio, it was definitely up there. But if its a tunable parameter (in ms, a reasonable default for most can be found and individuals can then choose what amount they think is allowable; it's also been almost 5 years. Or were you talking about also using smaller FFTs and losing resolution - if so, even if that's common practice... I don't like that and am unfamiliar with it but if you have some literature to point to, I will take a look.<br>
</div><div> </div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
<br>
4. This latency is not properly reported (fdo bug 41465).<br>
<br>
<br>
Some of the harder to understand pulseaudio details like latency escaped me - I know the concepts very well but I tried for hours to tune it and asked Lennart countless times for help regarding this (this was in the days when he was starting the hand off). Perhaps other bugs or something in PA at the time made that harder to understand and work with. If you understand how to make that part work better, please experiment with it and push a patch cause it was lost on me at the time.<br>
</blockquote>
<br></div>
There is indeed no useful documentation :(<br>
<br>
Latency is how far ahead the last written sample is of what the user hears. It consists of three parts: latency of the parent sink, the amount buffered in input_q, and the algorithmic latency. See sink_process_msg_cb() of module-virtual-sink. You have two problems here: output_q (IMHO a totally unnecessary layer of buffering), and not counting the algorithmic latency (which is, in samples, for causal linear-phase filters, M + L/2, where M is the window size). I cannot provide a patch here, because this would need a well-defined L beforehand.<div class="">
</div></blockquote><div class=""> <br>Re output_q: the filter output has to be chunked into segments which have size constraints to continue through the PA pipeline. It is dequeued in accordance with the sink_input_pop_cb which only takes one memchunk at a time. I once tried to do this all in one big chunk originally but pa core complained about that due to the memchunk size limitations. Do you think it should be discarded from the calculation still? Also, input_q and algorithmic latency seems a bit muddled. Perhaps the right thing to do is to round up how much input_q is by algorithmic latency.<br>
<br></div><div class=""><br></div><div class=""><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="">
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
<br>
5. There are numerous issues with code style. A lot of commented-out<br>
code, and a "FIXME: Please clean this up" that was never fixed in 4<br>
years.<br>
<br>
<br>
I'll see what I can do about that, some of those were from module-virtual-sink when Lennart was updating it, most of them were completely unclear as to what should change or if changed resulted in crashes (the unref one). Again, Lennart got hard to access in those days and I couldn't figure out what to do with them as he put them there. The rewind stuff was also very confusing overall, I recall this (outside of module-equalizer-sink) coming up before and no clear answers on the horizon. You got me on the dead code one though. I argue a few of those lines should be left as they were frequently useful in developing the filter or diagnosing something - perhaps I should guard them in a debug macro instead, rather than delete or leave them commented?<br>
</blockquote>
<br></div>
There is pa_log_debug().</blockquote><div><br></div><div>Too much stuff for that... it'll be heavy enough to discouraging to use pa_log_debug.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div class=""><br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
6. There are memory management issues. E.g. the<br>
already-pointed-out leak<br>
at the end of sink_input_pop_cb(), and the fact that a buffer for the<br>
FFT is allocated by fftw_malloc() but freed by free().<br>
<br>
<br>
Interesting, rather than complain I would've just made a patch but point taken. I'll try to submit a patch for that soon but my time for PA efforts is pretty small.<br>
</blockquote>
<br></div>
I would have submitted the patch if this was the only problem and if the previous person also fixed the unref instead of adding a FIXME.</blockquote><br>Regarding that unref, that's something I tried back upon it's creation and it just ends up crashing due to that memblock being free'd. If it is a resource leak still I haven't seen it takes its toll yet.<br>
<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
7. The use of the Hanning window is unjustified. A rectangular window<br>
works just as well for the purpose of overlap-add based<br>
convolution, but<br>
allows one to do less FFTs, i.e. go faster. In fact, the DSP Guide<br>
book<br>
does not even mention the variant of overlap-add with a<br>
non-rectangular<br>
window!<br></blockquote></div>
And I have one more link for you to see:<br>
<br>
<a href="https://ccrma.stanford.edu/~jos/sasp/Overlap_Add_Decomposition.html" target="_blank">https://ccrma.stanford.edu/~<u></u>jos/sasp/Overlap_Add_<u></u>Decomposition.html</a><br>
<br>
The last sentence on the page explicitly recommends a rectangular window for the purpose of FIR filtering. This way, you will be able to shift the window by its whole width between the FFTs, not by the half of its width, thus needing only half of the FFTs.</blockquote>
<div><br></div><div>As I mentioned earlier, I used the window approach to deal with time varying filters.<br></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div class=""><br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
I don't think it's so broken as you instigate it to be, but maybe I'm wrong... it does work as desired (as a user) however.<br>
</blockquote>
<br></div>
Hanning window works indeed, it just means some extra (unneeded) work, provided that you have enough padding.<div class=""><br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
<br>
8. The module does not use any benefits (such as a chance to handle<br>
rewinds properly) of being a native PulseAudio module.<br>
<br>
<br>
Please explain how to handle it and I might add it... again there was alot of black magic in pulseaudio in those days wrt stuff like this. I suspect any of the module-virtual-sink inheritors might benefit in exactly the same way with potentially duplicated code. But I don't know how numerous they are...<br>
</blockquote>
<br></div>
You are right here, module-virtual-sink contains some wrong advice, like resetting the filter on any rewind, even if it is possible to do better (i.e. to restore the past internal state of the filter).<br>
<br>
Here is my point in more detail (knowingly exaggerated).<br>
<br>
Writing a native PulseAudio module is hard: you need to implement 20 callbacks, by copy-pasting the black magic, and with poor documentation. Writing a LADSPA plugin is much easier: only 8 callbacks (without much code duplication) and much better documentation, and the ability to use the result in PulseAudio via module-ladspa-sink. So there must be some gain for the whole exercise of writing a native PulseAudio module to be worth the pain (both for you and for PulseAudio developers, as any code is a liability).<br>
<br></blockquote><div><br>I think the real problem here is DRY over the modules that are based or are similar to module-virtual-sink. Many of those callbacks aren't really 'overridden' in a given implementation. The cost of implementation and maintenance of one of these sinks shouldn't be that excessive as most of the changes would then be made in fewer places given a better way of inheriting from that common base. I think ladspa also forces a weird external (plugin) dependency that could be troublesome for some people and it also sounds linux specific where as pulse aspires to be cross platform. Further/in general I feel better integration between algorithm and PA is possible only without the use of a prepackaged library set like ladspa's. And the DSP code shouldn't not be that hard to write - even if my implementation was/is incorrect, the core region of it is about 50 lines. Would you bring in blas to add 2 vectors if that was all you were going to do?<br>
</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
<br>
10. It's more an optimization question, but I have a gut feeling that<br>
overlap-save [5] would be a better fit for PulseAudio buffering model<br>
than overlap-add. Namely, it would allow to abandon the need to store<br>
overlap_accum. Instead, the only buffer needed would be for<br>
looking back<br>
at the past portions of the input signal, and we already have that<br>
anyway due to the need to react to rewinds. In other words, from a<br>
qualified DSP engineer, I would rather expect a rewrite of the<br>
algorithm<br>
instead of any attempts to improve it gradually.<br>
<br></blockquote></div></blockquote><div>Yea I guess I don't see it as big of a deal as you regarding rewrites of algorithms or improving things gradually. But it will take some time, which I currently don't have for more than a few weeks before I can really tear into the meat of it. Regarding overlap-save, it does look like it would be a tiny bit cheaper to use by forgoing adds, I forsee almost zero speedup but it will be less operations. I like that approach with the rewinds though but I'll have to see if it has any problems. As for why initially chose OLA rather than OLS, I thought there was a way I could cheat on worst-case latency with certain compromises , perhaps more examples it of it also had a hand in it.<br>
</div><div><br><br></div><div>-Jason<br></div></div></div></div></div></div>