[pulseaudio-discuss] [PATCH 02/21 v2] loopback: Initialize latency at startup and during source/sink changes

Sat Feb 25 11:13:07 UTC 2017

On 24.02.2017 19:19, Tanu Kaskinen wrote:
> On Thu, 2017-02-23 at 17:55 +0100, Georg Chini wrote:
>> On 23.02.2017 12:17, Tanu Kaskinen wrote:
>>> On Sun, 2017-02-19 at 17:15 +0100, Georg Chini wrote:
>>>
>>> For alsa devices with timer based scheduling, the source will wait
>>> one full source latency before it pushes any data (at least this is
>>> what I conclude from the numbers I have seen during the first
>>> push). So at startup, you need at least enough data in the memblockq
>>> to survive this period.
> If there's any "data" at startup, it's just silence. The memblockq can
> as well be empty at startup.

Yes, it could. But that produces tons of underrun messages at startup
that are more than ugly. I wanted to avoid those.

>
>>> I believe minimizing the memblockq length would be a sensible strategy,
>>> since it would minimize the CPU usage. I'm fine with the "configure
>>> sink and source with a third of u->latency" approach, however. I'd just
>>> like the comment to describe the options accurately.
>> I believe the current approach is the best option we have.
> Yes. If maximizing the memblockq length as I described leads to
> underruns, I don't have a better scheme to offer. Your approach is
> probably not "optimal", but designing an optimal scheme would require
> understanding why those underruns happen.
>
>>>> +static void set_source_output_latency(struct userdata *u, pa_source *source) {
>>>> +    pa_usec_t latency, requested_latency;
>>>> +
>>>> +    requested_latency = u->latency / 3;
>>>> +
>>>> +    latency = PA_CLAMP(requested_latency , u->min_source_latency, u->max_source_latency);
>>>> +    u->configured_source_latency = pa_source_output_set_requested_latency(u->source_output, latency);
>>>> +    if (u->configured_source_latency != requested_latency)
>>>> +        pa_log_warn("Cannot set requested source latency of %0.2f ms, adjusting to %0.2f ms", (double)requested_latency / PA_USEC_PER_MSEC, (double)u->configured_source_latency / PA_USEC_PER_MSEC);
>>>> +}
>>>> +
>>>>    /* Called from input thread context */
>>>>    static void source_output_attach_cb(pa_source_output *o) {
>>>>        struct userdata *u;
>>>> @@ -435,12 +563,26 @@ static void source_output_moving_cb(pa_source_output *o, pa_source *dest) {
>>>>        if ((n = pa_proplist_gets(dest->proplist, PA_PROP_DEVICE_ICON_NAME)))
>>>>            pa_sink_input_set_property(u->sink_input, PA_PROP_DEVICE_ICON_NAME, n);
>>>>    
>>>> -    if (pa_source_get_state(dest) == PA_SOURCE_SUSPENDED)
>>>> -        pa_sink_input_cork(u->sink_input, true);
>>>> -    else
>>>> +    /* Set latency and calculate latency limits */
>>>> +    update_latency_boundaries(u, dest, NULL);
>>>> +    set_source_output_latency(u, dest);
>>>> +    get_effective_source_latency(u, dest, u->sink_input->sink);
>>>> +
>>>> +    if (pa_source_get_state(dest) == PA_SOURCE_SUSPENDED) {
>>>> +        if (dest->suspend_cause != PA_SUSPEND_IDLE)
>>>> +            pa_sink_input_cork(u->sink_input, true);
>>>> +    } else
>>>>            pa_sink_input_cork(u->sink_input, false);
>>> Is the intention to cork the sink input if the source is suspended for
>>> a non-idle reason, and uncork it otherwise? This code doesn't do
>>> anything if the source is suspended due to being idle, so if the sink
>>> input was corked before, it will stay corked, which doesn't make much
>>> sense to me, but maybe I'm missing something, because the old code
>>> didn't uncork the sink input in this situation either.
>> If the source is suspended for idle reason, we know it will start
>> when it is needed by the module. So there is no need to cork the
>> sink input.
> Ok, makes sense. The old code didn't have this check, however. Did you
> observe a problem with that?

Yes, it makes the calculation of the latency during a switch very
difficult because the stream gets interrupted unnecessarily. Otherwise
I would not have changed it.

>
>> If the sink input has been corked previously for some other reason,
>> we should not uncork it here (for example module-role-cork might
>> have done so).
> module-role-cork doesn't actually cork the stream, it just sends a cork
> request (which module-loopback ignores). The stream can only be corked
> if module-loopback corked it by itself earlier. Corking is currently
> only allowed for the stream owner, because we don't have the means to
> track multiple simultaneous cork requests. If multiple things tried to
> control the cork state, there would be conflicts and confusion.
>
> If the loopback was connected to a suspended source before, and
> therefore corked the sink input, shouldn't we uncork the sink input if
> the loopback moves to a source that is idle-suspended?

I can do that as well if you like.

>
>>>
>>>> +        u->output_thread_info.pop_called = true;
>>>> +    }
>>>> +    u->output_thread_info.first_pop_done = true;
>>>>    
>>>>        if (pa_memblockq_peek(u->memblockq, chunk) < 0) {
>>>>            pa_log_info("Could not peek into queue");
>>>> @@ -478,6 +644,12 @@ static int sink_input_pop_cb(pa_sink_input *i, size_t nbytes, pa_memchunk *chunk
>>>>        chunk->length = PA_MIN(chunk->length, nbytes);
>>>>        pa_memblockq_drop(u->memblockq, chunk->length);
>>>>    
>>>> +    /* If push has not been called yet, assume that the source will deliver one full latency
>>>> +     * when it starts pushing. Adjust the memblockq accordingly and ensure that there is
>>>> +     * enough data in the queue to avoid underruns. */
>>>> +    if (!u->output_thread_info.push_called)
>>>> +        memblockq_adjust(u, u->output_thread_info.effective_source_latency, true);
>>> This seems unnecessary. Why can't we just let the pop calls drain the
>>> memblockq, and add the necessary amount of silence to the memblockq
>>> only when we receive the first chunk from the source?
>> The whole point is to keep on delivering silence even in the case that
>> the source takes a long time to start. So we avoid draining the
>> memblockq completely, which would lead to underruns.
> Why do you want to avoid underruns during the startup? Just to avoid
> unnecessary noise in the log? (That would be a good enough reason for
> me.)
>
> If the memblockq becomes empty, silence can still be easily generated.
> The silence doesn't have to come from the memblockq.

The reason are the ugly underrun messages as already stated
above. And why should I generate silence any other way? The
memblockq is there and I just push some silence into it.

>
>>>> +                time_delta += pa_sink_get_latency_within_thread(u->sink_input->sink);
>>>> +
>>>> +                /* If the source has overrun, assume that the maximum it should have pushed is
>>>> +                 * one full source latency. It may still be possible that the next push also
>>>> +                 * contains too much data, then the resulting latency will be wrong. */
>>>> +                if (pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec) > u->output_thread_info.effective_source_latency)
>>>> +                    time_delta = PA_CLIP_SUB(time_delta, u->output_thread_info.effective_source_latency);
>>>> +                else
>>>> +                    time_delta = PA_CLIP_SUB(time_delta, pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec));
>>> It's unclear to me what's happening here. I guess you're substracting
>>> chunk->length from time_delta, because when push_cb was called, the
>>> chunk was still included in the source latency report (I'm a bit
>>> worried that some sources might not do this), but the chunk has also
>>> been pushed to the memblockq, so if we don't adjust time_delta, the
>>> chunk will be counted twice, and the memblockq adjustment will be
>>> wrong.
>> Yes, exactly.
>>> But I don't really get why you are concerned with "overruns", and how
>>> your code mitigates any problems. If you substract only a part of
>>> chunk->length, it means that a part of the chunk will be counted twice,
>>> resulting in wrong latency.
>> No, it means, that the memblockq will be adjusted to the correct length
>> because it is adjusted after the data has been pushed.
> The correct memblockq length is u->latency - sink latency - source
> latency. To me it seems that your adjustment makes us overestimate the
> source latency.
>
> The latency report that the source gives is (1) the current chunk
> length, plus (2) any buffered data that isn't included in the current
> chunk. In adjust_memblockq() we're only interested in (2), which is why
> we subtract the chunk length from the reported source latency. If the
> chunk was huge for some reason, adjust_memblockq() will in any case
> drop excess data.
>
>> If the source
>> has overrun and delivered much more data than expected, the excess
>> data will be dropped.
> If the source latency is overestimated, too much data will be dropped.

You are talking about two different things. The source latency is
very different from the first chunk of data that is pushed.
When switching quickly back and forth between two sources, it
happens, that a source configured to a few ms of latency
suddenly pushes 500 ms of data on the first push. I think this
is due to the fact that the source is re-configured to its maximum
latency when it goes to idle and when switching back to low latency
it somehow does not get rid of the excess data soon enough.
Remember, the adjustment is only done on the first push, so it is
a one-shot thing just to get the queue right. If I don't drop the
data, I end up with those 500 ms too much in the memblockq.

>
> While reading the code again, I noticed some more issues:
>
>>           case SINK_INPUT_MESSAGE_POST:
>>   
>> -            pa_sink_input_assert_io_context(u->sink_input);
>> +            pa_memblockq_push_align(u->memblockq, chunk);
>> +
>> +            /* If push has not been called yet, latency adjustments in sink_input_pop_cb()
>> +             * are enabled. Disable them on first push and correct the memblockq. Do the
>> +             * same if the pop_cb() requested the adjustment */
>> +            if (!u->output_thread_info.push_called || u->output_thread_info.pop_adjust) {
>> +                pa_usec_t time_delta;
>>   
>> -            if (PA_SINK_IS_OPENED(u->sink_input->sink->thread_info.state))
>> -                pa_memblockq_push_align(u->memblockq, chunk);
>> -            else
>> -                pa_memblockq_flush_write(u->memblockq, true);
>> +                time_delta = PA_PTR_TO_UINT(data);
>> +                time_delta += pa_rtclock_now() - offset;
>> +                time_delta += pa_sink_get_latency_within_thread(u->sink_input->sink);
>> +
>> +                /* If the source has overrun, assume that the maximum it should have pushed is
>> +                 * one full source latency. It may still be possible that the next push also
>> +                 * contains too much data, then the resulting latency will be wrong. */
>> +                if (pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec) > u->output_thread_info.effective_source_latency)
>> +                    time_delta = PA_CLIP_SUB(time_delta, u->output_thread_info.effective_source_latency);
>> +                else
>> +                    time_delta = PA_CLIP_SUB(time_delta, pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec));
>> +
>> +                memblockq_adjust(u, time_delta, true);
>> +
>> +                u->output_thread_info.pop_adjust = false;
>> +                u->output_thread_info.push_called = true;
>> +            }
> This adjustment needs to be postponed until pop has been called. If the
> sink is not yet running, the sink latency is undefined.

No, it does not. "Undefined" means, that the pa_sink_get_latency_in_thread()
call just returns 0. Where is the problem with that? And in fact, this 
adjustment
is done twice if sink and source are both starting, once for the source 
and once
for the sink. Then the second adjustment does the right thing regardless 
if sink
or source started first.

>
> Also, if the memblockq is too short, memblockq_adjust() will add
> silence, but it's written after the valid data, so there will be a gap
> in playback. The silence should be added by using
> pa_memblockq_rewind(), but it has to be ensured that the memblockq
> doesn't still contain some old non-silence data that gets replayed. I'm
> not sure how to best do that.

This is the only situation where I allow pushing silence after having
valid data in the queue. Since this is a one (or two) time event, I think
it is acceptable to have a small gap in the stream to fix up the latency.
Because it is done in a switching situation, I believe nobody will ever
notice the gap.

>
>> +
>> +            /* If pop has not been called yet, make sure the latency does not grow too much.
>> +             * Don't push any silence here, because we already have new data in the queue */
>> +            if (!u->output_thread_info.pop_called)
>> +                 memblockq_adjust(u, pa_sink_get_latency_within_thread(->sink_input->sink), false);
> As mentioned above, the sink latency is undefined if the sink is not
> running yet. Luckily there's no need need to know the sink latency
> here: before pop has been called, the only reason why the memblockq
> needs to be adjusted in the POST handler is to avoid it becoming huge
> and thus consuming a lot of memory. memblockq_adjust() can be called
> here with simply zero as the offset parameter.
>
This will keep too much data in the queue if the sink is running. This would
not be a big problem, but why not adjust as good as possible in this 
situation?
Also I see no issue with undefined latency. As stated above, the call 
will return 0,
which is what you want to use anyway ...