[pulseaudio-discuss] [PATCH 02/21 v2] loopback: Initialize latency at startup and during source/sink changes

Thu Feb 23 16:55:18 UTC 2017

On 23.02.2017 12:17, Tanu Kaskinen wrote:
> On Sun, 2017-02-19 at 17:15 +0100, Georg Chini wrote:
>> The current code does not make any attempt to initialize the end-to-end latency
>> to a value near the desired latency. This leads to underruns at startup because
>> the memblockq is initially empty and to very long adjustment times for long
>> latencies because the end-to-end latency at startup is significantly shorter
>> than the desired value.
>> This patch initializes the memblockq at startup and during source or sink changes
>> so that the end-to-end latency will be near the configured value. It also ensures
>> that there are no underruns if the source is slow to start and that the latency
>> does not grow too much when the sink is slow to start by adjusting the length of
>> the memblockq until the source has called push for the first time and the sink
>> has called pop for the second time. Waiting for the second pop is necessary
>> because the sink has not been started when the first pop is called.
>
>> +    pa_usec_t effective_source_latency;
>> +
>> +    effective_source_latency = u->configured_source_latency;
>> +
>> +    if (source) {
>> +        effective_source_latency = pa_source_get_requested_latency(source);
>> +        if (effective_source_latency == 0 || effective_source_latency > u->configured_source_latency)
>> +            effective_source_latency = u->configured_source_latency;
>> +    }
> It looks like this code should take into account also the minimum
> latency of the source. Or is it ok to set effective_source_latency to a
> lower value than the minimum latency of the source?
pa_source_get_requested_latency() can't return anything less than the
minimum latency and u->configured_source latency is also greater or equal
to the minimum source latency, so I see no way how this function could set
effective_source_latency to something smaller.
>
>> +
>> +    /* If the sink is valid, send a message to the output thread, else set the variable directly */
>> +    if (sink)
>> +        pa_asyncmsgq_send(sink->asyncmsgq, PA_MSGOBJECT(u->sink_input), SINK_INPUT_MESSAGE_SET_EFFECTIVE_SOURCE_LATENCY, NULL, (int64_t)effective_source_latency, NULL);
>> +    else
>> +       u->output_thread_info.effective_source_latency = effective_source_latency;
>> +}
>> +
>> +/* Called from main thread.
>> + * Set source output latency to one third of the overall latency if possible.
>> + * The choice of one third is rather arbitrary somewhere between the minimum
>> + * possible latency (which would cause a lot of CPU load) and half the configured
>> + * latency (which would lead to an empty memblockq if the sink is configured
>> + * likewise). */
> The comment is a bit inaccurate: the other end of the possible scale
> isn't half the configured latency, and using half the configured
> latency wouldn't necessarily lead to an empty memblockq (and the
> memblockq can never be empty all the time).
>
> Minimizing the memblockq length could be achieved with the following
> algorithm:
>
> 1) Configure both sink and source latency to half of u->latency.
This will fail and lead to underruns. I have been testing a lot and
it definitely does not work, what ever might be the theory.
>
> 2) If that fails, because the max latency limit of the sink or source
> doesn't allow setting that high latency, set the latency to maximum
> supported for the sink or source (depending on which of those reached
> the max latency limit).
>
> 3) If the sink max latency was the limiting factor in step 2, increase
> the source latency until u->latency is reached, or until the source's
> max latency limit is hit too. Conversely, if the source max latency was
> the limiting factor in step 2, increase the sink latency.
>
> If u->latency is fully allocated to the sink and source latencies,
> leaving no slack to be allocated for the memblockq, that doesn't mean
> that the memblockq will be empty, except possibly for very short
> durations.
>
> The sum of the sink latency, the memblockq length and the source
> latency at any given moment is constant (assuming perfect clock drift
> compensation). The configured sink and source latencies together never
> exceed the target end-to-end latency. Therefore, the only situation
> where the memblockq can get empty is when the sink latency is at its
> peak (i.e. right after reading from the memblockq) and when the source
> latency is at its peak (i.e. right before writing to the memblockq).
> Since the reads and writes aren't synchronized, it's a relatively rare
> event that we end up in a situation where the sink has just read data
> and the source is just about to write data.
For alsa devices with timer based scheduling, the source will wait
one full source latency before it pushes any data (at least this is
what I conclude from the numbers I have seen during the first
push). So at startup, you need at least enough data in the memblockq
to survive this period.

>
> I believe minimizing the memblockq length would be a sensible strategy,
> since it would minimize the CPU usage. I'm fine with the "configure
> sink and source with a third of u->latency" approach, however. I'd just
> like the comment to describe the options accurately.
I believe the current approach is the best option we have.
>
>> +static void set_source_output_latency(struct userdata *u, pa_source *source) {
>> +    pa_usec_t latency, requested_latency;
>> +
>> +    requested_latency = u->latency / 3;
>> +
>> +    latency = PA_CLAMP(requested_latency , u->min_source_latency, u->max_source_latency);
>> +    u->configured_source_latency = pa_source_output_set_requested_latency(u->source_output, latency);
>> +    if (u->configured_source_latency != requested_latency)
>> +        pa_log_warn("Cannot set requested source latency of %0.2f ms, adjusting to %0.2f ms", (double)requested_latency / PA_USEC_PER_MSEC, (double)u->configured_source_latency / PA_USEC_PER_MSEC);
>> +}
>> +
>>   /* Called from input thread context */
>>   static void source_output_attach_cb(pa_source_output *o) {
>>       struct userdata *u;
>> @@ -435,12 +563,26 @@ static void source_output_moving_cb(pa_source_output *o, pa_source *dest) {
>>       if ((n = pa_proplist_gets(dest->proplist, PA_PROP_DEVICE_ICON_NAME)))
>>           pa_sink_input_set_property(u->sink_input, PA_PROP_DEVICE_ICON_NAME, n);
>>   
>> -    if (pa_source_get_state(dest) == PA_SOURCE_SUSPENDED)
>> -        pa_sink_input_cork(u->sink_input, true);
>> -    else
>> +    /* Set latency and calculate latency limits */
>> +    update_latency_boundaries(u, dest, NULL);
>> +    set_source_output_latency(u, dest);
>> +    get_effective_source_latency(u, dest, u->sink_input->sink);
>> +
>> +    if (pa_source_get_state(dest) == PA_SOURCE_SUSPENDED) {
>> +        if (dest->suspend_cause != PA_SUSPEND_IDLE)
>> +            pa_sink_input_cork(u->sink_input, true);
>> +    } else
>>           pa_sink_input_cork(u->sink_input, false);
> Is the intention to cork the sink input if the source is suspended for
> a non-idle reason, and uncork it otherwise? This code doesn't do
> anything if the source is suspended due to being idle, so if the sink
> input was corked before, it will stay corked, which doesn't make much
> sense to me, but maybe I'm missing something, because the old code
> didn't uncork the sink input in this situation either.
If the source is suspended for idle reason, we know it will start
when it is needed by the module. So there is no need to cork the
sink input.
If the sink input has been corked previously for some other reason,
we should not uncork it here (for example module-role-cork might
have done so).

>
>
>> +        u->output_thread_info.pop_called = true;
>> +    }
>> +    u->output_thread_info.first_pop_done = true;
>>   
>>       if (pa_memblockq_peek(u->memblockq, chunk) < 0) {
>>           pa_log_info("Could not peek into queue");
>> @@ -478,6 +644,12 @@ static int sink_input_pop_cb(pa_sink_input *i, size_t nbytes, pa_memchunk *chunk
>>       chunk->length = PA_MIN(chunk->length, nbytes);
>>       pa_memblockq_drop(u->memblockq, chunk->length);
>>   
>> +    /* If push has not been called yet, assume that the source will deliver one full latency
>> +     * when it starts pushing. Adjust the memblockq accordingly and ensure that there is
>> +     * enough data in the queue to avoid underruns. */
>> +    if (!u->output_thread_info.push_called)
>> +        memblockq_adjust(u, u->output_thread_info.effective_source_latency, true);
> This seems unnecessary. Why can't we just let the pop calls drain the
> memblockq, and add the necessary amount of silence to the memblockq
> only when we receive the first chunk from the source?
The whole point is to keep on delivering silence even in the case that
the source takes a long time to start. So we avoid draining the
memblockq completely, which would lead to underruns.
>
>> +                time_delta += pa_sink_get_latency_within_thread(u->sink_input->sink);
>> +
>> +                /* If the source has overrun, assume that the maximum it should have pushed is
>> +                 * one full source latency. It may still be possible that the next push also
>> +                 * contains too much data, then the resulting latency will be wrong. */
>> +                if (pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec) > u->output_thread_info.effective_source_latency)
>> +                    time_delta = PA_CLIP_SUB(time_delta, u->output_thread_info.effective_source_latency);
>> +                else
>> +                    time_delta = PA_CLIP_SUB(time_delta, pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec));
> It's unclear to me what's happening here. I guess you're substracting
> chunk->length from time_delta, because when push_cb was called, the
> chunk was still included in the source latency report (I'm a bit
> worried that some sources might not do this), but the chunk has also
> been pushed to the memblockq, so if we don't adjust time_delta, the
> chunk will be counted twice, and the memblockq adjustment will be
> wrong.
Yes, exactly.
>
> But I don't really get why you are concerned with "overruns", and how
> your code mitigates any problems. If you substract only a part of
> chunk->length, it means that a part of the chunk will be counted twice,
> resulting in wrong latency.
No, it means, that the memblockq will be adjusted to the correct length
because it is adjusted after the data has been pushed. If the source
has overrun and delivered much more data than expected, the excess
data will be dropped.
>
> If we can avoid using effective_source_latency here, the variable can
> be removed (it's also used in pop_cb, but that code can be removed
> too).
See above, it can't be removed because it prevents underruns
when the source is starting slow.
>
>>   
>>       pa_sink_input_get_silence(u->sink_input, &silence);
>>       u->memblockq = pa_memblockq_new(
>> @@ -941,6 +1192,8 @@ int pa__init(pa_module *m) {
>>               0,                      /* maxrewind */
>>               &silence);              /* silence frame */
>>       pa_memblock_unref(silence.memblock);
>> +    /* Fill the memblockq with silence */
>> +    pa_memblockq_seek(u->memblockq, pa_usec_to_bytes(u->latency, &u->sink_input->sample_spec), PA_SEEK_RELATIVE, true);
> This seems unnecessary.
>
No, it isn't. You have to give the sink at least some data to start with.
Since the exact amount is not important, I just fill up the queue.