[pulseaudio-discuss] [PATCH 02/21 v2] loopback: Initialize latency at startup and during source/sink changes

Tue Feb 28 06:54:38 UTC 2017

On 27.02.2017 11:38, Tanu Kaskinen wrote:
> On Mon, 2017-02-27 at 08:44 +0100, Georg Chini wrote:
>>>>>>>>>> +                /* If the source has overrun, assume that the maximum it should have pushed is
>>>>>>>>>> +                 * one full source latency. It may still be possible that the next push also
>>>>>>>>>> +                 * contains too much data, then the resulting latency will be wrong. */
>>>>>>>>>> +                if (pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec) > u->output_thread_info.effective_source_latency)
>>>>>>>>>> +                    time_delta = PA_CLIP_SUB(time_delta, u->output_thread_info.effective_source_latency);
>>>>>>>>>> +                else
>>>>>>>>>> +                    time_delta = PA_CLIP_SUB(time_delta, pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec));
>>>>>>>>> It's unclear to me what's happening here. I guess you're substracting
>>>>>>>>> chunk->length from time_delta, because when push_cb was called, the
>>>>>>>>> chunk was still included in the source latency report (I'm a bit
>>>>>>>>> worried that some sources might not do this), but the chunk has also
>>>>>>>>> been pushed to the memblockq, so if we don't adjust time_delta, the
>>>>>>>>> chunk will be counted twice, and the memblockq adjustment will be
>>>>>>>>> wrong.
>>>>>>>> Yes, exactly.
>>>>>>>>> But I don't really get why you are concerned with "overruns", and how
>>>>>>>>> your code mitigates any problems. If you substract only a part of
>>>>>>>>> chunk->length, it means that a part of the chunk will be counted twice,
>>>>>>>>> resulting in wrong latency.
>>>>>>>> No, it means, that the memblockq will be adjusted to the correct length
>>>>>>>> because it is adjusted after the data has been pushed.
>>>>>>> The correct memblockq length is u->latency - sink latency - source
>>>>>>> latency. To me it seems that your adjustment makes us overestimate the
>>>>>>> source latency.
>>>>>>>
>>>>>>> The latency report that the source gives is (1) the current chunk
>>>>>>> length, plus (2) any buffered data that isn't included in the current
>>>>>>> chunk. In adjust_memblockq() we're only interested in (2), which is why
>>>>>>> we subtract the chunk length from the reported source latency. If the
>>>>>>> chunk was huge for some reason, adjust_memblockq() will in any case
>>>>>>> drop excess data.
>>>>>>>
>>>>>>>> If the source
>>>>>>>> has overrun and delivered much more data than expected, the excess
>>>>>>>> data will be dropped.
>>>>>>> If the source latency is overestimated, too much data will be dropped.
>>>>>> You are talking about two different things. The source latency is
>>>>>> very different from the first chunk of data that is pushed.
>>>>> Yes, they are very different things. But adjust_memblockq() needs to
>>>>> know the real source latency, it doesn't care about the size of the
>>>>> first chunk. It doesn't matter how big the first chunk is, because
>>>>> adjust_memblockq() will in any case adjust the memblockq length so that
>>>>> the end-to-end latency will match what is configured.
>>>>>
>>>>>> When switching quickly back and forth between two sources, it
>>>>>> happens, that a source configured to a few ms of latency
>>>>>> suddenly pushes 500 ms of data on the first push. I think this
>>>>>> is due to the fact that the source is re-configured to its maximum
>>>>>> latency when it goes to idle and when switching back to low latency
>>>>>> it somehow does not get rid of the excess data soon enough.
>>>>>> Remember, the adjustment is only done on the first push, so it is
>>>>>> a one-shot thing just to get the queue right. If I don't drop the
>>>>>> data, I end up with those 500 ms too much in the memblockq.
>>>>> The memblockq length should always be
>>>>>
>>>>>        u->latency - real sink latency - real source latency
>>>>>
>>>>> and if that is less than the 500 ms that was just pushed to the queue,
>>>>> the extra data will be dropped. I don't see how we can end up with too
>>>>> long memblockq if adjust_memblockq() gets accurate sink and source
>>>>> latency information.
>>>> Sorry, but take a look at the code. We CALCULATE how long the memblockq
>>>> should be and pass that as a parameter to adjust_memblockq() AND we have
>>>> to subtract the first chunk length to get things right.
>>> I don't think I've said anything that conflicts with what you say here.
>>>
>>>> So if the first chunk is too big, the memblockq will be too long, as
>>>> simple as that.
>>> I don't understand how you jump to this conclusion.
>>>
>>> I'll try to explain how I understand the system:
>>>
>>> Invariant 1: memblockq length should always be this when the pipeline
>>> is fully up and running:
>>>
>>>       u->latency - real sink latency - real source latency
>>>
>>> When doing the final adjustment, the offset that is passed to
>>> adjust_memblockq() should be
>>>
>>>       real sink latency + real source latency
>>>
>>> Invariant 2: the real source latency at the time of processing the POST
>>> message can always be calculated like this:
>>>
>>>       latency report - size of the received chunk + time between sending and receiving the POST message
>>>
>>> You are saying that the size of the received chunk must be adjusted so
>>> that it won't be bigger than effective_source_latency. Do you think
>>> invariant 2 is wrong (that is, the formula doesn't always give an
>>> accurate measure of the real source latency), or that the invariant 1
>>> is wrong (that is, sometimes it makes sense to have a shorter memblockq
>>> than given by the formula)?
>> Invariant 2 is wrong when the source overruns. The received chunk can be
>> much
>> bigger than the latency report and your formula above will produce
>> negative values.
>> It is even possible that the source is sending more than one "much too
>> big" chunk,
>> so the situation that you end up with far too much data in the queue cannot
>> completely be avoided. I did not put the code there for some theoretical
>> reason,
>> you can easily reproduce the problem by switching rapidly between two
>> sources
>> at low latency.
> Ah, now I finally see the problem. To make the code more clear about
> this, I would suggest adding a separate variable for the source
> latency, and comment the calculation like this, for example:
>
> /* data contains the reported source latency at the time the
>   * message was sent. *
> source_latency = PA_PTR_TO_UINT(data);
>
> /* The source latency report includes the audio in the chunk,
>   * but since we already pushed the chunk to the memblockq, we need
>   * to subtract the chunk size from the source latency so that it
>   * won't be counted towards both the memblockq latency and the
>   * source latency.
>   *
>   * Sometimes the alsa source reports too small latencies for some
>   * reason. We can't reliably detect when that happens, but one
>   * obvious indicator is when the reported latency is smaller than
>   * the size of the chunk that the source generated. In those case
>   * we estimate the source latency to be zero, which may not be
>   * entirely accurate, but it's in any case more accurate than
>   * the reported latency. */
> if (chunk->length <= source_latency)
>      source_latency -= chunk->length;
> else
>      source_latency = 0;
>
> /* offset is a timestamp of the moment when the POST message was
>   * sent. It might have taken some time before we got to processing
>   * the message, and during that time the source latency has grown. */
> source_latency += pa_rtclock_now() - offset;
>
> I thought that the alsa source latency reports would always be directly
> based on snd_pcm_delay(), but now that I checked, we seem to be using a
> smoother instead. Maybe there's some smoother adjustment missing when
> the source's configured latency changes. Or maybe the smoother just
> behaves badly.
>
One more point: The current source latency may have any value
between 0 and the requested source latency, so using the current
value for a check does not seem to be correct.You need the
requested source latency here, which may be very different from
the configured latency or the reported latency. That was the reason,
why I introduced the effective_source_latency variable.