[pulseaudio-discuss] alsa sink latency - how to account for startup delay

Wed Apr 6 13:55:27 UTC 2016

On Tue, 2016-04-05 at 21:54 +0200, Georg Chini wrote:
> On 05.04.2016 17:42, Tanu Kaskinen wrote:
> > 
> > On Sat, 2016-04-02 at 18:13 +0200, Georg Chini wrote:
> > > 
> > > On 02.04.2016 16:10, Tanu Kaskinen wrote:
> > > > 
> > > > On Wed, 2016-03-30 at 22:31 +0200, Georg Chini wrote:
> > > > > 
> > > > > On 30.03.2016 18:06, Tanu Kaskinen wrote:
> > > > > > 
> > > > > > On Tue, 2016-03-29 at 20:29 +0200, Georg Chini wrote:
> > > > > > > 
> > > > > > > 5) The pulseaudio sink code takes the first 10ms of audio out of the
> > > > > > > loopback buffer,
> > > > > > > writes it to the alsa buffer and calls snd_pcm_start().
> > > > > > If the sink takes something from the loopback buffer, this means that
> > > > > > the first pop() call has been made. Assuming no time has passed since
> > > > > > the previous step, the USB bus is still full, and so is the ring
> > > > > > buffer. Expected delay: 20 ms.
> > > > > Reported delay is exactly the amount of audio that was written to
> > > > > the buffer.
> > > > That's the bug that I think should be fixed in alsa if possible (and if
> > > > it's impossible, I don't see how it could be fixed in pulseaudio
> > > > either).
> > > It can be fixed (or at least be worked around). If you take a time stamp
> > > at the moment when snd_pcm_start() is called and another when
> > > the first audio has definitely been played (delay < write_count), then
> > > the difference between the time stamps corrected by the amount
> > > of audio that has already been played, gives you exactly that
> > > missing bit of latency.
> > I can't follow that line of reasoning. In the beginning the ring buffer
> > is filled to max, and once you call snd_pcm_start(), data starts to
> > move from the ring buffer to other buffers (I'll call the other buffers
> > the "not-ring-buffer"). Apparently the driver "sees" the not-ring-
> > buffer only partially, since it reports a larger latency than just the
> > ring buffer fill level, but it still doesn't report the full latency.
> > The time between snd_pcm_start() and the point where the reported delay
> > does not any more equal the written amount tells the size of the
> > visible part of the not-ring-buffer - it's the time it took for the
> > first sample to travel from the ring buffer to the invisible part of
> > the not-ring-buffer. I don't understand how the time could say anything
> > about the size of the invisible part of the not-ring-buffer. Your logic
> > "works" only if the visible and invisible parts happen to be of the
> > same size.
> > 
> > You should get the same results by calculating
> > 
> >    adjusted delay = ring buffer fill level + 2 * (reported delay - ring buffer fill level)
> > 
> > That formula doesn't make sense, but that's how I understand your logic
> > works, with the difference that your fix is based on one measurement
> > only, so it's constant over time, while my formula recalculates the
> > adjustment every time the delay is queried, so the adjustment size
> > varies somewhat depending on the granularity at which audio moves to
> > and from the visible part of the not-ring-buffer.
> > 
> > In any case, even if your logic actually makes sense and I'm just
> > misunderstanding something, I don't see why the correction should be
> > done in pulseaudio instead of the alsa driver.
> Well, now I don't understand what you mean. The logic is very simple:
> If there is a not reported delay between the time snd_pcm_start() is
> called and the time when the first sample is delivered to the DAC, then
> this delay will persist and become part of the continuous latency.
> That's all, what causes the delay is completely irrelevant.

The code can't know when the first sample hits the DAC. The delay
reported by alsa is supposed to tell that, but if the reported delay is
wrong, I don't think you have any way to know the real delay.

> Maybe what I said above was not complete. At the point in time when
> the first audio is played, there are two delays: First the one that is 
> reported
> by alsa and the other is the difference between the time stamps minus
> the played audio. If these two delays don't match, then there is an
> "extra delay" that has to be taken into account.

The difference between the time stamps is not related to how big the
invisible part of the buffer is. I'll try to illustrate:

In the beginning, pulseaudio has written 10 ms of audio to the ring
buffer, and snd_pcm_start() hasn't been called:

DAC <- ssssssssss|sss|dddddddddd <- pulseaudio

Here "ssssssssss|sss|ddddddddd" is the whole buffer between the DAC and
pulseaudio. It's divided into three parts; the pipe characters separate
the different parts. Each letter represents 1 ms of data. "s" stands
for silence and "d" stands for data. The first part of the buffer is
the invisible part that is not included in the delay reports. I've put
10 ms of data there, but it's unknown to the driver how big the
invisible part is. The middle part of the buffer is the "send buffer"
that the driver maintains, its size is 3 ms in this example. It's
filled with silence in the beginning. The third part is the ring
buffer, containing 10 ms of data from pulseaudio.

At this point the driver reports 10 ms latency. It knows it has 3 ms of
silence buffered too, which it should include in its latency report,
but it's stupid, so it only reports the data in the ring buffer. The
driver has no idea how big the invisible part is, so it doesn't include
it in the report.

Now pulseaudio calls snd_pcm_start(), which causes data to start moving
from the ring buffer to the send buffer. After 1 ms the situation looks
like this:

DAC <- ssssssssss|ssd|ddddddddd  <- pulseaudio

There's 2 ms of silence in the send buffer and 1 ms of data. The driver
again ignores the silence in the send buffer, and reports that the
delay is 10 ms, which consists of 1 ms of data in the send buffer and 9
ms of data in the ring buffer.

After 2 ms:

DAC <- ssssssssss|sdd|dddddddd   <- pulseaudio

Reported delay: 10 ms

After 3 ms:

DAC <- ssssssssss|ddd|ddddddd    <- pulseaudio

Reported delay: 10 ms

Let's say pulseaudio refills the ring buffer now.

DAC <- ssssssssss|ddd|dddddddddd <- pulseaudio

Reported delay: 13 ms

After 4 ms:

DAC <- sssssssssd|ddd|ddddddddd  <- pulseaudio

The first data chunk has now entered the invisible part of the buffer,
but it will still take 9 ms before it hits the DAC. At this point
pulseaudio has written 13 ms of audio, and the reported delay is 12 ms.
According to your logic, the adjusted delay is 12 + (4 - 1) = 15 ms,
while in reality the latency is 22 ms.

I don't know how well this model reflects the reality of how the usb
audio driver works, but this model seems like a plausible explanation
for why the driver reports delays equalling the amount of written data
in the beginning, and why the real latency is higher than the reported
latency at later times.

I hope this also clarifies why I don't buy your argument that the time
stamp difference is somehow related to the unreported latency.

> Trying to fix up that delay on every iteration does not make any sense
> at all, it is there from the start and it is constant.

Commenting on "it is constant": The playback latency is the sum of data
in various buffers. The DAC consumes one sample at a time from the very
last buffer, but I presume that all other places move data in bigger
chunks than one sample. The unreported delay can only be constant if
data moves to the invisible part of the buffering in one sample chunks.
Otherwise the latency goes down every time the DAC reads a sample, and
then when the buffer is refilled at the other end, the latency jumps up
by the refill amount.

> > > That was what my original question was about - what should I do with
> > > this extra latency? Currently I am just adding it as an offset to the
> > > "normal" latency. This however means, that if you configure let's say
> > > 10ms, you will get in fact around 22ms. (You would get 22ms anyway,
> > > but the reports would show 10ms with the old code.)
> > > For HDA the reported delay is even slightly negative, probably because
> > > the card already starts during the preparation step. Negative delays
> > > are truncated by my code, no real audio should have been played
> > > before snd_pcm_start().
> > A negative delay indicates an underrun according to
> > http://www.alsa-project.org/alsa-doc/alsa-lib/group___p_c_m___status.html#ga1fdce3985e64f66385a5805da1110f18
> This is not a negative delay reported by alsa, but my "extra latency"
> is getting negative, which means playback must have started
> before snd_pcm_start().
> According to Raymond Yau playback seems in fact to be started
> before snd_pcm_start() for HDA devices, at least if I read his last
> mail on that topic right. Then the negative delays would even make
> sense, since data is written to the buffer before snd_pcm_start().

I had a look at the code to verify the claim that we configure alsa to
start playback already before we call snd_pcm_start(). If we really do
that intentionally, then it doesn't make sense to call snd_pcm_start()
explicitly.

This is what we do:
snd_pcm_sw_params_set_start_threshold(pcm, swparams, (snd_pcm_uframes_t) -1)

Note the casting of -1 to an unsigned integer. It seems that the
intention is to set as high threshold as possible to avoid automatic
starting. However, alsa-lib casts the threshold back to a signed value
when it's used, and I believe the end result is indeed that playback
starts immediately after the first write. I don't know if that matters,
since we do the manual snd_pcm_start() call immediately after the first
write anyway, but it seems like a bug in any case.

> > > > > BTW, do you think the debug output of module-loopback is
> > > > > better now?
> > > > Yes, although if it's logged twice a second, it might be better to
> > > > print the status only if explicitly requested via a module argument.
> > > > 
> > > The status is printed once every adjust time and only when debug
> > > logging is enabled. 500ms seems to be a good value for the adjust
> > > time, it is currently my default. If you prefer an additional argument
> > > to enable logging, I can add it.
> > Yes, I'd prefer that. Continous logging annoys me when it's about
> > something that I'm not currently interested in. module-loopback isn't
> > the only thing generating annoying logs, but decreasing the interval
> > from 10 seconds to 0.5 seconds makes the problem that much worse.
> > 
> OK, could we do something similar for those "memblock pool full"
> messages? Sometimes I get several thousand messages per second
> (which are suppressed by the rate limiting) and I cannot see any
> relevance.

Have you investigated why there are so many memblocks active that the
pool gets full? It seems fishy to me. The limit could be triggered e.g.
by thousand 1 ms blocks in a 1 second buffer, but why would you do
that?

-- 
Tanu