[pulseaudio-discuss] [PATCH 02/21 v2] loopback: Initialize latency at startup and during source/sink changes

Mon Feb 27 07:44:51 UTC 2017

>>>>>>>> +                /* If the source has overrun, assume that the maximum it should have pushed is
>>>>>>>> +                 * one full source latency. It may still be possible that the next push also
>>>>>>>> +                 * contains too much data, then the resulting latency will be wrong. */
>>>>>>>> +                if (pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec) > u->output_thread_info.effective_source_latency)
>>>>>>>> +                    time_delta = PA_CLIP_SUB(time_delta, u->output_thread_info.effective_source_latency);
>>>>>>>> +                else
>>>>>>>> +                    time_delta = PA_CLIP_SUB(time_delta, pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec));
>>>>>>> It's unclear to me what's happening here. I guess you're substracting
>>>>>>> chunk->length from time_delta, because when push_cb was called, the
>>>>>>> chunk was still included in the source latency report (I'm a bit
>>>>>>> worried that some sources might not do this), but the chunk has also
>>>>>>> been pushed to the memblockq, so if we don't adjust time_delta, the
>>>>>>> chunk will be counted twice, and the memblockq adjustment will be
>>>>>>> wrong.
>>>>>> Yes, exactly.
>>>>>>> But I don't really get why you are concerned with "overruns", and how
>>>>>>> your code mitigates any problems. If you substract only a part of
>>>>>>> chunk->length, it means that a part of the chunk will be counted twice,
>>>>>>> resulting in wrong latency.
>>>>>> No, it means, that the memblockq will be adjusted to the correct length
>>>>>> because it is adjusted after the data has been pushed.
>>>>> The correct memblockq length is u->latency - sink latency - source
>>>>> latency. To me it seems that your adjustment makes us overestimate the
>>>>> source latency.
>>>>>
>>>>> The latency report that the source gives is (1) the current chunk
>>>>> length, plus (2) any buffered data that isn't included in the current
>>>>> chunk. In adjust_memblockq() we're only interested in (2), which is why
>>>>> we subtract the chunk length from the reported source latency. If the
>>>>> chunk was huge for some reason, adjust_memblockq() will in any case
>>>>> drop excess data.
>>>>>
>>>>>> If the source
>>>>>> has overrun and delivered much more data than expected, the excess
>>>>>> data will be dropped.
>>>>> If the source latency is overestimated, too much data will be dropped.
>>>> You are talking about two different things. The source latency is
>>>> very different from the first chunk of data that is pushed.
>>> Yes, they are very different things. But adjust_memblockq() needs to
>>> know the real source latency, it doesn't care about the size of the
>>> first chunk. It doesn't matter how big the first chunk is, because
>>> adjust_memblockq() will in any case adjust the memblockq length so that
>>> the end-to-end latency will match what is configured.
>>>
>>>> When switching quickly back and forth between two sources, it
>>>> happens, that a source configured to a few ms of latency
>>>> suddenly pushes 500 ms of data on the first push. I think this
>>>> is due to the fact that the source is re-configured to its maximum
>>>> latency when it goes to idle and when switching back to low latency
>>>> it somehow does not get rid of the excess data soon enough.
>>>> Remember, the adjustment is only done on the first push, so it is
>>>> a one-shot thing just to get the queue right. If I don't drop the
>>>> data, I end up with those 500 ms too much in the memblockq.
>>> The memblockq length should always be
>>>
>>>       u->latency - real sink latency - real source latency
>>>
>>> and if that is less than the 500 ms that was just pushed to the queue,
>>> the extra data will be dropped. I don't see how we can end up with too
>>> long memblockq if adjust_memblockq() gets accurate sink and source
>>> latency information.
>> Sorry, but take a look at the code. We CALCULATE how long the memblockq
>> should be and pass that as a parameter to adjust_memblockq() AND we have
>> to subtract the first chunk length to get things right.
> I don't think I've said anything that conflicts with what you say here.
>
>> So if the first chunk is too big, the memblockq will be too long, as
>> simple as that.
> I don't understand how you jump to this conclusion.
>
> I'll try to explain how I understand the system:
>
> Invariant 1: memblockq length should always be this when the pipeline
> is fully up and running:
>
>      u->latency - real sink latency - real source latency
>
> When doing the final adjustment, the offset that is passed to
> adjust_memblockq() should be
>
>      real sink latency + real source latency
>
> Invariant 2: the real source latency at the time of processing the POST
> message can always be calculated like this:
>
>      latency report - size of the received chunk + time between sending and receiving the POST message
>
> You are saying that the size of the received chunk must be adjusted so
> that it won't be bigger than effective_source_latency. Do you think
> invariant 2 is wrong (that is, the formula doesn't always give an
> accurate measure of the real source latency), or that the invariant 1
> is wrong (that is, sometimes it makes sense to have a shorter memblockq
> than given by the formula)?

Invariant 2 is wrong when the source overruns. The received chunk can be 
much
bigger than the latency report and your formula above will produce 
negative values.
It is even possible that the source is sending more than one "much too 
big" chunk,
so the situation that you end up with far too much data in the queue cannot
completely be avoided. I did not put the code there for some theoretical 
reason,
you can easily reproduce the problem by switching rapidly between two 
sources
at low latency.

>
>>>>> While reading the code again, I noticed some more issues:
>>>>>
>>>>>>             case SINK_INPUT_MESSAGE_POST:
>>>>>>     
>>>>>> -            pa_sink_input_assert_io_context(u->sink_input);
>>>>>> +            pa_memblockq_push_align(u->memblockq, chunk);
>>>>>> +
>>>>>> +            /* If push has not been called yet, latency adjustments in sink_input_pop_cb()
>>>>>> +             * are enabled. Disable them on first push and correct the memblockq. Do the
>>>>>> +             * same if the pop_cb() requested the adjustment */
>>>>>> +            if (!u->output_thread_info.push_called || u->output_thread_info.pop_adjust) {
>>>>>> +                pa_usec_t time_delta;
>>>>>>     
>>>>>> -            if (PA_SINK_IS_OPENED(u->sink_input->sink->thread_info.state))
>>>>>> -                pa_memblockq_push_align(u->memblockq, chunk);
>>>>>> -            else
>>>>>> -                pa_memblockq_flush_write(u->memblockq, true);
>>>>>> +                time_delta = PA_PTR_TO_UINT(data);
>>>>>> +                time_delta += pa_rtclock_now() - offset;
>>>>>> +                time_delta += pa_sink_get_latency_within_thread(u->sink_input->sink);
>>>>>> +
>>>>>> +                /* If the source has overrun, assume that the maximum it should have pushed is
>>>>>> +                 * one full source latency. It may still be possible that the next push also
>>>>>> +                 * contains too much data, then the resulting latency will be wrong. */
>>>>>> +                if (pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec) > u->output_thread_info.effective_source_latency)
>>>>>> +                    time_delta = PA_CLIP_SUB(time_delta, u->output_thread_info.effective_source_latency);
>>>>>> +                else
>>>>>> +                    time_delta = PA_CLIP_SUB(time_delta, pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec));
>>>>>> +
>>>>>> +                memblockq_adjust(u, time_delta, true);
>>>>>> +
>>>>>> +                u->output_thread_info.pop_adjust = false;
>>>>>> +                u->output_thread_info.push_called = true;
>>>>>> +            }
>>>>> This adjustment needs to be postponed until pop has been called. If the
>>>>> sink is not yet running, the sink latency is undefined.
>>>> No, it does not. "Undefined" means, that the pa_sink_get_latency_in_thread()
>>>> call just returns 0. Where is the problem with that?
>>> 0 doesn't correspond to the real world, that's the problem. It's
>>> pointless to try to make calculations based on incorrect input.
>> You are yourself saying I should just use 0 at another place, which does not
>> correspond to real world either.
> There are two kinds of situations where the memblockq is adjusted:
> situations where it's important that the memblockq is adjusted exactly
> to the target latency, using accurate sink and source latency
> information, and situations where the only goal is to avoid the
> memblockq growing so large that the memory usage becomes a concern (in
> this latter situation there's no requirement to adjust the memblockq to
> any precise value).
>
> I assumed that you believed that this piece code was supposed to be
> executed only in those situations where high precision was required.
> That seems to have been an incorrect assumption.

Well, I will use 0 now in the second situation, although I do not
really like it. It leads to much too long latencies until the final
adjust is done, but since it is an intermediate state, it does not
really matter. I still think that the goal should be to keep the
queue as near to the desired latency as possible in all situations.