[pulseaudio-discuss] [PATCH 02/21 v2] loopback: Initialize latency at startup and during source/sink changes

Sat Feb 25 12:51:41 UTC 2017

On Sat, 2017-02-25 at 12:13 +0100, Georg Chini wrote:
> On 24.02.2017 19:19, Tanu Kaskinen wrote:
> > On Thu, 2017-02-23 at 17:55 +0100, Georg Chini wrote:
> > > On 23.02.2017 12:17, Tanu Kaskinen wrote:
> > > > On Sun, 2017-02-19 at 17:15 +0100, Georg Chini wrote:
> > > > > +static void set_source_output_latency(struct userdata *u, pa_source *source) {
> > > > > +    pa_usec_t latency, requested_latency;
> > > > > +
> > > > > +    requested_latency = u->latency / 3;
> > > > > +
> > > > > +    latency = PA_CLAMP(requested_latency , u->min_source_latency, u->max_source_latency);
> > > > > +    u->configured_source_latency = pa_source_output_set_requested_latency(u->source_output, latency);
> > > > > +    if (u->configured_source_latency != requested_latency)
> > > > > +        pa_log_warn("Cannot set requested source latency of %0.2f ms, adjusting to %0.2f ms", (double)requested_latency / PA_USEC_PER_MSEC, (double)u->configured_source_latency / PA_USEC_PER_MSEC);
> > > > > +}
> > > > > +
> > > > >    /* Called from input thread context */
> > > > >    static void source_output_attach_cb(pa_source_output *o) {
> > > > >        struct userdata *u;
> > > > > @@ -435,12 +563,26 @@ static void source_output_moving_cb(pa_source_output *o, pa_source *dest) {
> > > > >        if ((n = pa_proplist_gets(dest->proplist, PA_PROP_DEVICE_ICON_NAME)))
> > > > >            pa_sink_input_set_property(u->sink_input, PA_PROP_DEVICE_ICON_NAME, n);
> > > > >    
> > > > > -    if (pa_source_get_state(dest) == PA_SOURCE_SUSPENDED)
> > > > > -        pa_sink_input_cork(u->sink_input, true);
> > > > > -    else
> > > > > +    /* Set latency and calculate latency limits */
> > > > > +    update_latency_boundaries(u, dest, NULL);
> > > > > +    set_source_output_latency(u, dest);
> > > > > +    get_effective_source_latency(u, dest, u->sink_input->sink);
> > > > > +
> > > > > +    if (pa_source_get_state(dest) == PA_SOURCE_SUSPENDED) {
> > > > > +        if (dest->suspend_cause != PA_SUSPEND_IDLE)
> > > > > +            pa_sink_input_cork(u->sink_input, true);
> > > > > +    } else
> > > > >            pa_sink_input_cork(u->sink_input, false);
> > > > 
> > > > Is the intention to cork the sink input if the source is suspended for
> > > > a non-idle reason, and uncork it otherwise? This code doesn't do
> > > > anything if the source is suspended due to being idle, so if the sink
> > > > input was corked before, it will stay corked, which doesn't make much
> > > > sense to me, but maybe I'm missing something, because the old code
> > > > didn't uncork the sink input in this situation either.
> > > 
> > > If the source is suspended for idle reason, we know it will start
> > > when it is needed by the module. So there is no need to cork the
> > > sink input.
> > 
> > Ok, makes sense. The old code didn't have this check, however. Did you
> > observe a problem with that?
> 
> Yes, it makes the calculation of the latency during a switch very
> difficult because the stream gets interrupted unnecessarily. Otherwise
> I would not have changed it.

Ok. As I said earlier, the problem description should be in the commit
message.

> > 
> > > If the sink input has been corked previously for some other reason,
> > > we should not uncork it here (for example module-role-cork might
> > > have done so).
> > 
> > module-role-cork doesn't actually cork the stream, it just sends a cork
> > request (which module-loopback ignores). The stream can only be corked
> > if module-loopback corked it by itself earlier. Corking is currently
> > only allowed for the stream owner, because we don't have the means to
> > track multiple simultaneous cork requests. If multiple things tried to
> > control the cork state, there would be conflicts and confusion.
> > 
> > If the loopback was connected to a suspended source before, and
> > therefore corked the sink input, shouldn't we uncork the sink input if
> > the loopback moves to a source that is idle-suspended?
> 
> I can do that as well if you like.

Doing that seems like the right thing to do. You can also try what
happens when moving from a user-suspended source to an idle-suspended
source, and if it somehow happens to work, then add a comment that
explains why it's not necessary to uncork the sink input.

> > > > 
> > > > > +        u->output_thread_info.pop_called = true;
> > > > > +    }
> > > > > +    u->output_thread_info.first_pop_done = true;
> > > > >    
> > > > >        if (pa_memblockq_peek(u->memblockq, chunk) < 0) {
> > > > >            pa_log_info("Could not peek into queue");
> > > > > @@ -478,6 +644,12 @@ static int sink_input_pop_cb(pa_sink_input *i, size_t nbytes, pa_memchunk *chunk
> > > > >        chunk->length = PA_MIN(chunk->length, nbytes);
> > > > >        pa_memblockq_drop(u->memblockq, chunk->length);
> > > > >    
> > > > > +    /* If push has not been called yet, assume that the source will deliver one full latency
> > > > > +     * when it starts pushing. Adjust the memblockq accordingly and ensure that there is
> > > > > +     * enough data in the queue to avoid underruns. */
> > > > > +    if (!u->output_thread_info.push_called)
> > > > > +        memblockq_adjust(u, u->output_thread_info.effective_source_latency, true);
> > > > 
> > > > This seems unnecessary. Why can't we just let the pop calls drain the
> > > > memblockq, and add the necessary amount of silence to the memblockq
> > > > only when we receive the first chunk from the source?
> > > 
> > > The whole point is to keep on delivering silence even in the case that
> > > the source takes a long time to start. So we avoid draining the
> > > memblockq completely, which would lead to underruns.
> > 
> > Why do you want to avoid underruns during the startup? Just to avoid
> > unnecessary noise in the log? (That would be a good enough reason for
> > me.)
> > 
> > If the memblockq becomes empty, silence can still be easily generated.
> > The silence doesn't have to come from the memblockq.
> 
> The reason are the ugly underrun messages as already stated
> above. And why should I generate silence any other way? The
> memblockq is there and I just push some silence into it.

Because we might get rid of the effective_source_latency variable, and
the code would become a bit easier to understand when there's no need
to think about whether the memblockq has the right amount of silence.

> > 
> > > > > +                time_delta += pa_sink_get_latency_within_thread(u->sink_input->sink);
> > > > > +
> > > > > +                /* If the source has overrun, assume that the maximum it should have pushed is
> > > > > +                 * one full source latency. It may still be possible that the next push also
> > > > > +                 * contains too much data, then the resulting latency will be wrong. */
> > > > > +                if (pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec) > u->output_thread_info.effective_source_latency)
> > > > > +                    time_delta = PA_CLIP_SUB(time_delta, u->output_thread_info.effective_source_latency);
> > > > > +                else
> > > > > +                    time_delta = PA_CLIP_SUB(time_delta, pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec));
> > > > 
> > > > It's unclear to me what's happening here. I guess you're substracting
> > > > chunk->length from time_delta, because when push_cb was called, the
> > > > chunk was still included in the source latency report (I'm a bit
> > > > worried that some sources might not do this), but the chunk has also
> > > > been pushed to the memblockq, so if we don't adjust time_delta, the
> > > > chunk will be counted twice, and the memblockq adjustment will be
> > > > wrong.
> > > 
> > > Yes, exactly.
> > > > But I don't really get why you are concerned with "overruns", and how
> > > > your code mitigates any problems. If you substract only a part of
> > > > chunk->length, it means that a part of the chunk will be counted twice,
> > > > resulting in wrong latency.
> > > 
> > > No, it means, that the memblockq will be adjusted to the correct length
> > > because it is adjusted after the data has been pushed.
> > 
> > The correct memblockq length is u->latency - sink latency - source
> > latency. To me it seems that your adjustment makes us overestimate the
> > source latency.
> > 
> > The latency report that the source gives is (1) the current chunk
> > length, plus (2) any buffered data that isn't included in the current
> > chunk. In adjust_memblockq() we're only interested in (2), which is why
> > we subtract the chunk length from the reported source latency. If the
> > chunk was huge for some reason, adjust_memblockq() will in any case
> > drop excess data.
> > 
> > > If the source
> > > has overrun and delivered much more data than expected, the excess
> > > data will be dropped.
> > 
> > If the source latency is overestimated, too much data will be dropped.
> 
> You are talking about two different things. The source latency is
> very different from the first chunk of data that is pushed.

Yes, they are very different things. But adjust_memblockq() needs to
know the real source latency, it doesn't care about the size of the
first chunk. It doesn't matter how big the first chunk is, because
adjust_memblockq() will in any case adjust the memblockq length so that
the end-to-end latency will match what is configured.

> When switching quickly back and forth between two sources, it
> happens, that a source configured to a few ms of latency
> suddenly pushes 500 ms of data on the first push. I think this
> is due to the fact that the source is re-configured to its maximum
> latency when it goes to idle and when switching back to low latency
> it somehow does not get rid of the excess data soon enough.
> Remember, the adjustment is only done on the first push, so it is
> a one-shot thing just to get the queue right. If I don't drop the
> data, I end up with those 500 ms too much in the memblockq.

The memblockq length should always be

    u->latency - real sink latency - real source latency

and if that is less than the 500 ms that was just pushed to the queue,
the extra data will be dropped. I don't see how we can end up with too
long memblockq if adjust_memblockq() gets accurate sink and source
latency information.

> > 
> > While reading the code again, I noticed some more issues:
> > 
> > >           case SINK_INPUT_MESSAGE_POST:
> > >   
> > > -            pa_sink_input_assert_io_context(u->sink_input);
> > > +            pa_memblockq_push_align(u->memblockq, chunk);
> > > +
> > > +            /* If push has not been called yet, latency adjustments in sink_input_pop_cb()
> > > +             * are enabled. Disable them on first push and correct the memblockq. Do the
> > > +             * same if the pop_cb() requested the adjustment */
> > > +            if (!u->output_thread_info.push_called || u->output_thread_info.pop_adjust) {
> > > +                pa_usec_t time_delta;
> > >   
> > > -            if (PA_SINK_IS_OPENED(u->sink_input->sink->thread_info.state))
> > > -                pa_memblockq_push_align(u->memblockq, chunk);
> > > -            else
> > > -                pa_memblockq_flush_write(u->memblockq, true);
> > > +                time_delta = PA_PTR_TO_UINT(data);
> > > +                time_delta += pa_rtclock_now() - offset;
> > > +                time_delta += pa_sink_get_latency_within_thread(u->sink_input->sink);
> > > +
> > > +                /* If the source has overrun, assume that the maximum it should have pushed is
> > > +                 * one full source latency. It may still be possible that the next push also
> > > +                 * contains too much data, then the resulting latency will be wrong. */
> > > +                if (pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec) > u->output_thread_info.effective_source_latency)
> > > +                    time_delta = PA_CLIP_SUB(time_delta, u->output_thread_info.effective_source_latency);
> > > +                else
> > > +                    time_delta = PA_CLIP_SUB(time_delta, pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec));
> > > +
> > > +                memblockq_adjust(u, time_delta, true);
> > > +
> > > +                u->output_thread_info.pop_adjust = false;
> > > +                u->output_thread_info.push_called = true;
> > > +            }
> > 
> > This adjustment needs to be postponed until pop has been called. If the
> > sink is not yet running, the sink latency is undefined.
> 
> No, it does not. "Undefined" means, that the pa_sink_get_latency_in_thread()
> call just returns 0. Where is the problem with that?

0 doesn't correspond to the real world, that's the problem. It's
pointless to try to make calculations based on incorrect input.

> And in fact, this adjustment is done twice if sink and source are
> both starting, once for the source and once for the sink. Then the
> second adjustment does the right thing regardless if sink or source
> started first.

The second adjustment does the right thing regardless of whether the
first adjustment was done or not, so why not skip the first adjustment
altogether?

> > 
> > Also, if the memblockq is too short, memblockq_adjust() will add
> > silence, but it's written after the valid data, so there will be a gap
> > in playback. The silence should be added by using
> > pa_memblockq_rewind(), but it has to be ensured that the memblockq
> > doesn't still contain some old non-silence data that gets replayed. I'm
> > not sure how to best do that.
> 
> This is the only situation where I allow pushing silence after having
> valid data in the queue. Since this is a one (or two) time event, I think
> it is acceptable to have a small gap in the stream to fix up the latency.
> Because it is done in a switching situation, I believe nobody will ever
> notice the gap.

Ok, if it's a deliberate decision to have simpler code at the risk of
glitches, then a FIXME comment would be appropriate.

> > 
> > > +
> > > +            /* If pop has not been called yet, make sure the latency does not grow too much.
> > > +             * Don't push any silence here, because we already have new data in the queue */
> > > +            if (!u->output_thread_info.pop_called)
> > > +                 memblockq_adjust(u, pa_sink_get_latency_within_thread(->sink_input->sink), false);
> > 
> > As mentioned above, the sink latency is undefined if the sink is not
> > running yet. Luckily there's no need need to know the sink latency
> > here: before pop has been called, the only reason why the memblockq
> > needs to be adjusted in the POST handler is to avoid it becoming huge
> > and thus consuming a lot of memory. memblockq_adjust() can be called
> > here with simply zero as the offset parameter.
> > 
> 
> This will keep too much data in the queue if the sink is running.
> This would not be a big problem, but why not adjust as good as possible in this 
> situation?

Because there's no benefit in doing that. Using 0 is simpler and
doesn't involve invoking kernel code to ask for information that might
be bogus anyway.

-- 
Tanu

https://www.patreon.com/tanuk