[pulseaudio-discuss] [PATCH 02/21 v2] loopback: Initialize latency at startup and during source/sink changes

Tanu Kaskinen tanuk at iki.fi
Fri Feb 24 18:19:17 UTC 2017


On Thu, 2017-02-23 at 17:55 +0100, Georg Chini wrote:
> On 23.02.2017 12:17, Tanu Kaskinen wrote:
> > On Sun, 2017-02-19 at 17:15 +0100, Georg Chini wrote:
> > > The current code does not make any attempt to initialize the end-to-end latency
> > > to a value near the desired latency. This leads to underruns at startup because
> > > the memblockq is initially empty and to very long adjustment times for long
> > > latencies because the end-to-end latency at startup is significantly shorter
> > > than the desired value.
> > > This patch initializes the memblockq at startup and during source or sink changes
> > > so that the end-to-end latency will be near the configured value. It also ensures
> > > that there are no underruns if the source is slow to start and that the latency
> > > does not grow too much when the sink is slow to start by adjusting the length of
> > > the memblockq until the source has called push for the first time and the sink
> > > has called pop for the second time. Waiting for the second pop is necessary
> > > because the sink has not been started when the first pop is called.
> > > +    pa_usec_t effective_source_latency;
> > > +
> > > +    effective_source_latency = u->configured_source_latency;
> > > +
> > > +    if (source) {
> > > +        effective_source_latency = pa_source_get_requested_latency(source);
> > > +        if (effective_source_latency == 0 || effective_source_latency > u->configured_source_latency)
> > > +            effective_source_latency = u->configured_source_latency;
> > > +    }
> > 
> > It looks like this code should take into account also the minimum
> > latency of the source. Or is it ok to set effective_source_latency to a
> > lower value than the minimum latency of the source?
> 
> pa_source_get_requested_latency() can't return anything less than the
> minimum latency and u->configured_source_latency is also greater than or
> equal to the minimum source latency, so I see no way this function could
> set effective_source_latency to something smaller.

My mistake, I thought that u->configured_source_latency could be less
than the source minimum latency.

> > > +
> > > +    /* If the sink is valid, send a message to the output thread, else set the variable directly */
> > > +    if (sink)
> > > +        pa_asyncmsgq_send(sink->asyncmsgq, PA_MSGOBJECT(u->sink_input), SINK_INPUT_MESSAGE_SET_EFFECTIVE_SOURCE_LATENCY, NULL, (int64_t)effective_source_latency, NULL);
> > > +    else
> > > +        u->output_thread_info.effective_source_latency = effective_source_latency;
> > > +}
> > > +
> > > +/* Called from main thread.
> > > + * Set source output latency to one third of the overall latency if possible.
> > > + * The choice of one third is rather arbitrary somewhere between the minimum
> > > + * possible latency (which would cause a lot of CPU load) and half the configured
> > > + * latency (which would lead to an empty memblockq if the sink is configured
> > > + * likewise). */
> > 
> > The comment is a bit inaccurate: the other end of the possible scale
> > isn't half the configured latency, and using half the configured
> > latency wouldn't necessarily lead to an empty memblockq (and the
> > memblockq can never be empty all the time).
> > 
> > Minimizing the memblockq length could be achieved with the following
> > algorithm:
> > 
> > 1) Configure both sink and source latency to half of u->latency.
> 
> This will fail and lead to underruns. I have been testing a lot and
> it definitely does not work, whatever the theory might say.

Ok. The comment is inaccurate regardless.

> > 
> > 2) If that fails, because the max latency limit of the sink or source
> > doesn't allow setting that high latency, set the latency to maximum
> > supported for the sink or source (depending on which of those reached
> > the max latency limit).
> > 
> > 3) If the sink max latency was the limiting factor in step 2, increase
> > the source latency until u->latency is reached, or until the source's
> > max latency limit is hit too. Conversely, if the source max latency was
> > the limiting factor in step 2, increase the sink latency.
> > 
> > If u->latency is fully allocated to the sink and source latencies,
> > leaving no slack to be allocated for the memblockq, that doesn't mean
> > that the memblockq will be empty, except possibly for very short
> > durations.
> > 
> > The sum of the sink latency, the memblockq length and the source
> > latency at any given moment is constant (assuming perfect clock drift
> > compensation). The configured sink and source latencies together never
> > exceed the target end-to-end latency. Therefore, the only situation
> > where the memblockq can get empty is when the sink latency is at its
> > peak (i.e. right after reading from the memblockq) and when the source
> > latency is at its peak (i.e. right before writing to the memblockq).
> > Since the reads and writes aren't synchronized, it's a relatively rare
> > event that we end up in a situation where the sink has just read data
> > and the source is just about to write data.
> 
> For alsa devices with timer based scheduling, the source will wait
> one full source latency before it pushes any data (at least this is
> what I conclude from the numbers I have seen during the first
> push). So at startup, you need at least enough data in the memblockq
> to survive this period.

If there's any "data" at startup, it's just silence. The memblockq might
as well be empty at startup.

> > 
> > I believe minimizing the memblockq length would be a sensible strategy,
> > since it would minimize the CPU usage. I'm fine with the "configure
> > sink and source with a third of u->latency" approach, however. I'd just
> > like the comment to describe the options accurately.
> 
> I believe the current approach is the best option we have.

Yes. If minimizing the memblockq length as I described leads to
underruns, I don't have a better scheme to offer. Your approach is
probably not "optimal", but designing an optimal scheme would require
understanding why those underruns happen.
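
For reference, the allocation scheme from those three steps could look
roughly like this (just an untested sketch; the sink-side boundary
fields, configured_sink_latency and the helper name don't exist in your
patch, I made them up to mirror the source side):

    static void allocate_latencies(struct userdata *u) {
        pa_usec_t sink_latency, source_latency, half;

        half = u->latency / 2;

        /* Step 1: try to give half of the end-to-end latency to each side,
         * clamped to what the devices support. */
        sink_latency = PA_CLAMP(half, u->min_sink_latency, u->max_sink_latency);
        source_latency = PA_CLAMP(half, u->min_source_latency, u->max_source_latency);

        /* Steps 2 and 3: if one side hit its maximum, hand the leftover
         * budget to the other side, again respecting its limits. */
        if (sink_latency < half)
            source_latency = PA_CLAMP(u->latency - sink_latency, u->min_source_latency, u->max_source_latency);
        else if (source_latency < half)
            sink_latency = PA_CLAMP(u->latency - source_latency, u->min_sink_latency, u->max_sink_latency);

        u->configured_sink_latency = pa_sink_input_set_requested_latency(u->sink_input, sink_latency);
        u->configured_source_latency = pa_source_output_set_requested_latency(u->source_output, source_latency);
    }

The idea is that, within the device limits, the whole u->latency budget
goes to the sink and source latencies, so the memblockq stays as short
as possible.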

> > > +static void set_source_output_latency(struct userdata *u, pa_source *source) {
> > > +    pa_usec_t latency, requested_latency;
> > > +
> > > +    requested_latency = u->latency / 3;
> > > +
> > > +    latency = PA_CLAMP(requested_latency, u->min_source_latency, u->max_source_latency);
> > > +    u->configured_source_latency = pa_source_output_set_requested_latency(u->source_output, latency);
> > > +    if (u->configured_source_latency != requested_latency)
> > > +        pa_log_warn("Cannot set requested source latency of %0.2f ms, adjusting to %0.2f ms", (double)requested_latency / PA_USEC_PER_MSEC, (double)u->configured_source_latency / PA_USEC_PER_MSEC);
> > > +}
> > > +
> > >   /* Called from input thread context */
> > >   static void source_output_attach_cb(pa_source_output *o) {
> > >       struct userdata *u;
> > > @@ -435,12 +563,26 @@ static void source_output_moving_cb(pa_source_output *o, pa_source *dest) {
> > >       if ((n = pa_proplist_gets(dest->proplist, PA_PROP_DEVICE_ICON_NAME)))
> > >           pa_sink_input_set_property(u->sink_input, PA_PROP_DEVICE_ICON_NAME, n);
> > >   
> > > -    if (pa_source_get_state(dest) == PA_SOURCE_SUSPENDED)
> > > -        pa_sink_input_cork(u->sink_input, true);
> > > -    else
> > > +    /* Set latency and calculate latency limits */
> > > +    update_latency_boundaries(u, dest, NULL);
> > > +    set_source_output_latency(u, dest);
> > > +    get_effective_source_latency(u, dest, u->sink_input->sink);
> > > +
> > > +    if (pa_source_get_state(dest) == PA_SOURCE_SUSPENDED) {
> > > +        if (dest->suspend_cause != PA_SUSPEND_IDLE)
> > > +            pa_sink_input_cork(u->sink_input, true);
> > > +    } else
> > >           pa_sink_input_cork(u->sink_input, false);
> > 
> > Is the intention to cork the sink input if the source is suspended for
> > a non-idle reason, and uncork it otherwise? This code doesn't do
> > anything if the source is suspended due to being idle, so if the sink
> > input was corked before, it will stay corked, which doesn't make much
> > sense to me, but maybe I'm missing something, because the old code
> > didn't uncork the sink input in this situation either.
> 
> If the source is suspended for idle reason, we know it will start
> when it is needed by the module. So there is no need to cork the
> sink input.

Ok, makes sense. The old code didn't have this check, however. Did you
observe a problem with that?

> If the sink input has been corked previously for some other reason,
> we should not uncork it here (for example module-role-cork might
> have done so).

module-role-cork doesn't actually cork the stream, it just sends a cork
request (which module-loopback ignores). The stream can only be corked
if module-loopback corked it by itself earlier. Corking is currently
only allowed for the stream owner, because we don't have the means to
track multiple simultaneous cork requests. If multiple things tried to
control the cork state, there would be conflicts and confusion.

If the loopback was connected to a suspended source before, and
therefore corked the sink input, shouldn't we uncork the sink input if
the loopback moves to a source that is idle-suspended?
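
Something like this in source_output_moving_cb() would cover that case
(a sketch on top of your patch, untested):

    if (pa_source_get_state(dest) == PA_SOURCE_SUSPENDED) {
        if (dest->suspend_cause != PA_SUSPEND_IDLE)
            pa_sink_input_cork(u->sink_input, true);
        else
            /* The source will resume when we need it, and module-loopback
             * is the only one that may have corked this stream, so it's
             * safe to uncork here. */
            pa_sink_input_cork(u->sink_input, false);
    } else
        pa_sink_input_cork(u->sink_input, false);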

> > 
> > 
> > > +        u->output_thread_info.pop_called = true;
> > > +    }
> > > +    u->output_thread_info.first_pop_done = true;
> > >   
> > >       if (pa_memblockq_peek(u->memblockq, chunk) < 0) {
> > >           pa_log_info("Could not peek into queue");
> > > @@ -478,6 +644,12 @@ static int sink_input_pop_cb(pa_sink_input *i, size_t nbytes, pa_memchunk *chunk
> > >       chunk->length = PA_MIN(chunk->length, nbytes);
> > >       pa_memblockq_drop(u->memblockq, chunk->length);
> > >   
> > > +    /* If push has not been called yet, assume that the source will deliver one full latency
> > > +     * when it starts pushing. Adjust the memblockq accordingly and ensure that there is
> > > +     * enough data in the queue to avoid underruns. */
> > > +    if (!u->output_thread_info.push_called)
> > > +        memblockq_adjust(u, u->output_thread_info.effective_source_latency, true);
> > 
> > This seems unnecessary. Why can't we just let the pop calls drain the
> > memblockq, and add the necessary amount of silence to the memblockq
> > only when we receive the first chunk from the source?
> 
> The whole point is to keep on delivering silence even in the case that
> the source takes a long time to start. So we avoid draining the
> memblockq completely, which would lead to underruns.

Why do you want to avoid underruns during startup? Just to avoid
unnecessary noise in the log? (That would be a good enough reason for
me.)

If the memblockq becomes empty, silence can still be easily generated.
The silence doesn't have to come from the memblockq.
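
For example, sink_input_pop_cb() could fall back to generated silence
when the peek fails, something along these lines (an untested sketch;
I'm assuming the sink input's core pointer and pa_silence_memchunk_get()
here):

    if (pa_memblockq_peek(u->memblockq, chunk) < 0) {
        /* Nothing from the source yet - play silence instead of underrunning. */
        pa_silence_memchunk_get(&i->core->silence_cache, i->core->mempool,
                                chunk, &i->sample_spec, nbytes);
    }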

> > > +                time_delta += pa_sink_get_latency_within_thread(u->sink_input->sink);
> > > +
> > > +                /* If the source has overrun, assume that the maximum it should have pushed is
> > > +                 * one full source latency. It may still be possible that the next push also
> > > +                 * contains too much data, then the resulting latency will be wrong. */
> > > +                if (pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec) > u->output_thread_info.effective_source_latency)
> > > +                    time_delta = PA_CLIP_SUB(time_delta, u->output_thread_info.effective_source_latency);
> > > +                else
> > > +                    time_delta = PA_CLIP_SUB(time_delta, pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec));
> > 
> > It's unclear to me what's happening here. I guess you're substracting
> > chunk->length from time_delta, because when push_cb was called, the
> > chunk was still included in the source latency report (I'm a bit
> > worried that some sources might not do this), but the chunk has also
> > been pushed to the memblockq, so if we don't adjust time_delta, the
> > chunk will be counted twice, and the memblockq adjustment will be
> > wrong.
> 
> Yes, exactly.
> > 
> > But I don't really get why you are concerned with "overruns", and how
> > your code mitigates any problems. If you substract only a part of
> > chunk->length, it means that a part of the chunk will be counted twice,
> > resulting in wrong latency.
> 
> No, it means, that the memblockq will be adjusted to the correct length
> because it is adjusted after the data has been pushed.

The correct memblockq length is u->latency - sink latency - source
latency. To me it seems that your adjustment makes us overestimate the
source latency.

The latency report that the source gives is (1) the current chunk
length, plus (2) any buffered data that isn't included in the current
chunk. In adjust_memblockq() we're only interested in (2), which is why
we subtract the chunk length from the reported source latency. If the
chunk was huge for some reason, adjust_memblockq() will in any case
drop excess data.
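
So I'd drop the special case and always subtract the full chunk length,
i.e. something like:

    time_delta = PA_PTR_TO_UINT(data);
    time_delta += pa_rtclock_now() - offset;
    time_delta += pa_sink_get_latency_within_thread(u->sink_input->sink);

    /* The chunk is already in the memblockq, so remove its length from the
     * reported source latency to avoid counting it twice. If the chunk was
     * unusually big, memblockq_adjust() drops the excess data anyway. */
    time_delta = PA_CLIP_SUB(time_delta, pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec));

    memblockq_adjust(u, time_delta, true);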

> If the source
> has overrun and delivered much more data than expected, the excess
> data will be dropped.

If the source latency is overestimated, too much data will be dropped.

While reading the code again, I noticed some more issues:

>          case SINK_INPUT_MESSAGE_POST:
>  
> -            pa_sink_input_assert_io_context(u->sink_input);
> +            pa_memblockq_push_align(u->memblockq, chunk);
> +
> +            /* If push has not been called yet, latency adjustments in sink_input_pop_cb()
> +             * are enabled. Disable them on first push and correct the memblockq. Do the
> +             * same if the pop_cb() requested the adjustment */
> +            if (!u->output_thread_info.push_called || u->output_thread_info.pop_adjust) {
> +                pa_usec_t time_delta;
>  
> -            if (PA_SINK_IS_OPENED(u->sink_input->sink->thread_info.state))
> -                pa_memblockq_push_align(u->memblockq, chunk);
> -            else
> -                pa_memblockq_flush_write(u->memblockq, true);
> +                time_delta = PA_PTR_TO_UINT(data);
> +                time_delta += pa_rtclock_now() - offset;
> +                time_delta += pa_sink_get_latency_within_thread(u->sink_input->sink);
> +
> +                /* If the source has overrun, assume that the maximum it should have pushed is
> +                 * one full source latency. It may still be possible that the next push also
> +                 * contains too much data, then the resulting latency will be wrong. */
> +                if (pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec) > u->output_thread_info.effective_source_latency)
> +                    time_delta = PA_CLIP_SUB(time_delta, u->output_thread_info.effective_source_latency);
> +                else
> +                    time_delta = PA_CLIP_SUB(time_delta, pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec));
> +
> +                memblockq_adjust(u, time_delta, true);
> +
> +                u->output_thread_info.pop_adjust = false;
> +                u->output_thread_info.push_called = true;
> +            }

This adjustment needs to be postponed until pop has been called. If the
sink is not yet running, the sink latency is undefined.
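
To illustrate the postponing part (the pending_source_delta and
adjust_pending fields are made up, and this ignores any time that passes
between the POST message and the first pop, so treat it only as a rough
sketch):

    /* In the SINK_INPUT_MESSAGE_POST handler, before pop has been called: */
    if (!u->output_thread_info.pop_called) {
        pa_usec_t source_delta;

        source_delta = PA_PTR_TO_UINT(data) + pa_rtclock_now() - offset;
        /* The chunk is already in the memblockq, so don't count it twice. */
        source_delta = PA_CLIP_SUB(source_delta, pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec));

        u->output_thread_info.pending_source_delta = source_delta;
        u->output_thread_info.adjust_pending = true;
    }

    /* ...and in sink_input_pop_cb(), once pop has actually been called: */
    if (u->output_thread_info.adjust_pending) {
        pa_usec_t time_delta;

        time_delta = u->output_thread_info.pending_source_delta;
        time_delta += pa_sink_get_latency_within_thread(u->sink_input->sink);
        memblockq_adjust(u, time_delta, true);

        u->output_thread_info.adjust_pending = false;
    }

Both the POST handler and pop_cb() run in the sink's IO thread, so a
plain flag is enough for passing this information.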

Also, if the memblockq is too short, memblockq_adjust() will add
silence, but it's written after the valid data, so there will be a gap
in playback. The silence should be added by using
pa_memblockq_rewind(), but it has to be ensured that the memblockq
doesn't still contain some old non-silence data that gets replayed. I'm
not sure how to best do that.

> +
> +            /* If pop has not been called yet, make sure the latency does not grow too much.
> +             * Don't push any silence here, because we already have new data in the queue */
> +            if (!u->output_thread_info.pop_called)
> +                 memblockq_adjust(u, pa_sink_get_latency_within_thread(u->sink_input->sink), false);

As mentioned above, the sink latency is undefined if the sink is not
running yet. Luckily there's no need to know the sink latency
here: before pop has been called, the only reason why the memblockq
needs to be adjusted in the POST handler is to avoid it becoming huge
and thus consuming a lot of memory. memblockq_adjust() can be called
here with simply zero as the offset parameter.
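
I.e. something like:

    /* Before the first pop, only prevent the memblockq from growing without
     * bound; the sink latency can't be trusted yet, so pass zero. */
    if (!u->output_thread_info.pop_called)
        memblockq_adjust(u, 0, false);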

-- 
Tanu

https://www.patreon.com/tanuk

