[pulseaudio-discuss] [PATCH v4] Make module loopback honor requested latency

Fri Feb 6 01:56:28 PST 2015

On 06.02.2015 09:42, Georg Chini wrote:
> On 06.02.2015 08:17, Alexander E. Patrakov wrote:
>> First of all, thanks for a quick and detailed answer.
>>
>> 06.02.2015 02:02, Georg Chini wrote:
>>> On 05.02.2015 16:59, Alexander E. Patrakov wrote:
>>>> 01.02.2015 03:43, Georg Chini wrote:
>>>>> This is the final version of my patch for module-loopback. It is on
>>>>> top of the
>>>>> patch I sent about an hour ago and contains a lot more changes than
>>>>> the previous
>>>>> versions:
>>>>>
>>>>> - Honor specified latency if possible, if not adjust to the lowest
>>>>> possible value
>>>>> - Smooth switching from fixed latency to dynamic latency source or
>>>>> sink and vice versa
>>>>> - good rate and latency stability, no rate oscillation
>>>>> - adjusts latency as good as your setup allows
>>>>> - fast regulation of latency offsets, adjusts 100 ms offset within 22
>>>>> seconds (adjust
>>>>>    time=1) to 60 seconds (adjust_time=10)
>>>>> - usable latency range 4 - 30000 ms
>>>>> - Avoid rewinds and "cannot peek into queue" messages during startup
>>>>> and switching
>>>>> - works with rates between 200 and 190000 Hz
>>>>> - maximum latency offset after source/sink switch or at startup
>>>>> around is 200 ms
>>>>>
>>>>> I also introduced a new parameter, buffer_latency_msec which can be
>>>>> used together
>>>>> with latency_msec. If buffer_latency_msec is specified, the resulting
>>>>> latency
>>>>> will be latency_msec + buffer_latency_msec. Latency_msec then refers
>>>>> only to
>>>>> the source/sink latency while buffer_latency_msec specifies the
>>>>> buffer part.
>>>>> This can be used to save a lot of CPU at low latencies, running 10 ms
>>>>> latency
>>>>> with latency_msec=6 buffer_latency_msec=4 gives 8% CPU on my system
>>>>> compared to
>>>>> 12% when I only specify latency_msec=10.
>>>>
>>>> Is there any more detailed profiling data for it? E.g., have you tried
>>>> running perf record / perf report?
>>>>
>>>
>>> No, I haven't - I just compared CPU for the current module and my
>>> patched version. I did not
>>> change the way audio is processed so I would not expect large
>>> differences to the current version
>>> as long as you only use latency_msec.
>>> CPU usage mainly depends on the latency set for sink and source and
>>> splitting the latency into
>>> buffer and source/sink latency aims at that.
>>> I can provide figures if you would like to see them, but I do not know
>>> how to use the tools you
>>> mention above. Is there a HowTo somewhere?
>>
>> In the kernel source, there is a tools/perf directory. That's where 
>> you build perf from. It has a Documentation subdirectory, where you 
>> can read perf-record.txt and perf-report.txt.
>>
>> You can also read 
>> https://www.mail-archive.com/pulseaudio-discuss@lists.freedesktop.org/msg11469.html 
>> , that's how I ran perf on PulseAudio, and some replies in the thread 
>> contain improvements to my methodology.
>
> Thanks, I'll take a look.
>
>>
>>>
>>>> I am not quoting the patch, but my opinion is that it needs either
>>>> comments or some text in the commit message on the following topics:
>>>>
>>>
>>> Mh, I thought I already put a lot of comments in ...
>>>
>>>> Why does it use hand-crafted heuristics instead of a PID-controller?
>>>> Or, is this "Low pass filtered difference between expectation value
>>>> and observed latency" a PI controller in disguise?
>>>>
>>>
>>> What do you mean by "handcrafted heuristics"? It's basic math - how 
>>> many
>>> samples do I need
>>> for the requested latency. The equation new_rate = base_rate * (1 +
>>> latency_difference / adjust_time)
>>> is a simple P-regulator without I or D part, rate difference is
>>> proportional to latency difference.
>>
>> Yes, it is a P regulator (and you address the "heuristics" below in 
>> your email as "additional restrictions", sorry for not noticing the 
>> regulator behind them), and your explanation for choosing the 
>> coefficient is quoted below.
>>
>>> There were two options - either to get the latency right as quickly as
>>> possible - that would be
>>> base_rate * (1 + latency_difference / (adjust_time + latency)) - or to
>>> get the fastest convergence
>>> to the latency at the expected rate. I choose the second option.
>>
>> With latency << adjust_time, I do not treat these two possibilities 
>> as significantly different options
>
> Only correct for small latencies, if you specify latency = 30000 ms 
> and adjust time = 1000 ms this
> obviously not applies, so for large latencies you see a significant 
> difference between the two.
>
>
>> .
>>
>> The module makes the decision about the new rate once in adjust_time 
>> seconds. The equation that you quoted (either one) means that you aim 
>> for the latency difference to be fully corrected just at the next 
>> time your module makes a decision. I am not 100% sure that it is the 
>> optimal strategy, especially since in the text below you talk about 
>> too-sensitive controller and even instability/oscillation.
>
> That is not totally correct. Due to min_cycles, the controller does 
> not aim for 100% correction but
> for a (slightly) lower value, which seems to be enough to avoid 
> overshoot. Also I do not try to
> correct the whole latency difference in one step, this is only the 
> case when you are near the
> expected latency so that min_cycles is nearly 1. Else I am only 
> correcting what can be corrected
> within one step.
>
>>
>>> I experimented with adding I- or D-parts but did not see any 
>>> significant
>>> improvement (maybe
>>> because of incorrect parameter values).
>>
>> I will try to look into this on Sunday, then. But this first means 
>> getting the P-part right.
>>
>> Here is what I think here, but not yet 100% sure of.
>>
>> If you use this (obviously wrong) equation:
>>
>> new_rate = base_rate * (1 + 2 * latency_difference / adjust_time)
>>
>> then it will overcorrect the latency difference, so that it becomes 
>> exactly the negatine of what it was before. And so on and so on, i.e. 
>> there will be oscillations. With values of the factor before 
>> latency_difference between 1 and 2, overcorrection will also happen, 
>> but its absolute value will be reduced each time. So, 2 seems to be 
>> the critical value that leads to instability, and the period of 
>> oscillation is 2 * adjust_time.
>>
>> So, according to the Ziegler-Nichols method, for a P-only controller, 
>> you have indeed chosen the correct value of the P term (i.e. half of 
>> what would cause self-oscillation). But it is not correct for a 
>> general PID controller. See 
>> https://en.wikipedia.org/wiki/Ziegler%E2%80%93Nichols_method - for a 
>> PID controller with no overshoot, the P term would be:
>>
>> base_rate * (1 + 0.4 * latency_difference / adjust_time)
>
> There is no (or only a very small) overshoot visible. Due to the fact 
> that in the debug log I print
> out the current latency at the current rate it does look like an 
> overshoot especially for long
> latencies, but if you take the corrected latency instead, you won't 
> see it there. I'll take a look at
> the link you provided for more theoretical background.
> The big problem here was to make sure, that when approaching the 
> correct latency you will not
> overcorrect due to restriction 2), the rate difference for each step 
> must be so that you will not
> hit the limit. That is where the min_cycles parameter comes in again, 
> if you are near the
> expected latency, the regulator will behave nearly like a pure 
> p-regulator while it saturates when
> the latency is further off.
>
>>
>>>
>>> There are however two additional restrictions:
>>> 1) Do not deviate too much (I choose 0.75%) from the base rate
>>> 2) Do the adjustment in small amounts at a time
>>> The first restriction was solved by introducing min_cycles, which makes
>>> sure the rate cannot
>>> deviate more than 0.75% from the base rate. It also produces some
>>> dampening of the P-regulator
>>> and smooths out the steps produced by the second restriction (at least
>>> when you are approaching
>>> the base rate).
>>> For 2) I just copied what was already in the current code.
>>
>> OK, understood.
>>
>>>
>>> And then I was looking for the right criterion when  to stop 
>>> regulation.
>>> When is the latency
>>> "good enough"? Using some percentage of the latency will not work well
>>> when you have a
>>> latency range of 4 - 30000 ms to cover (at varying rates). Just cutting
>>> off when you are a few
>>> Hz away from the base rate like the current code does is also not a 
>>> good
>>> idea for the same
>>> reason (100 - 190000 Hz).
>>> The best you can get is an error of adjust_time / (2 * base_rate). But
>>> this does not work either
>>> because the latency measurement itself has some error and there is some
>>> jitter in the latency.
>>> This easily leads to instable rates or oscillation because the
>>> controller follows every tiny change.
>>
>> Looks like a complaint about a badly tuned controller, or with the 
>> additional restrictions interfering with the stability. My own 
>> opinion is that, with a properly tuned controller, and with 
>> restriction (2) removed or relaxed, a stop criterion should not be 
>> needed at all. I'll test the theory on Sunday.
>>
>
> Thanks for looking into this. You are right, if conditions are nearly 
> ideal (like when you only use
> an Intel HDA sound card) the stop criterion is not needed. But as 
> stated above, conditions are
> unfortunately often far from ideal. First the way the latency 
> snapshots are retrieved allows for some
> error, second you will see latency jumps of several ms with fixed 
> latency (interrupt driven) devices
> and third I noticed that (for some USB/bluetooth devices) you will see 
> overshoot that is NOT related
> to the regulator but seems to be a hardware or driver related problem.
> Removing or relaxing restriction 2) might be audible, so I was very 
> reluctant to change it.
> The stop criterion I introduced makes sure that the rate remains quite 
> stable even if the latency is
> jumping around wildly for whatever reason.

One more thing: There is a systematic error in the adjust_time I could 
not work around without
introducing too much overhead. The latency snapshot varies widely in the 
execution time, I
measured values between 50 us and more than 60 ms. So if the extreme 
values follow each other
you will have one adjust time that is around 60 ms too long and another 
one which is 60 ms too
short. Maybe this also contributes significantly to the (in)stability of 
regulation.