[pulseaudio-discuss] [PATCH 02/21 v2] loopback: Initialize latency at startup and during source/sink changes

Mon Feb 27 10:53:04 UTC 2017

On 27.02.2017 11:38, Tanu Kaskinen wrote:
> On Mon, 2017-02-27 at 08:44 +0100, Georg Chini wrote:
>>>>>>>>>> +                /* If the source has overrun, assume that the maximum it should have pushed is
>>>>>>>>>> +                 * one full source latency. It may still be possible that the next push also
>>>>>>>>>> +                 * contains too much data, then the resulting latency will be wrong. */
>>>>>>>>>> +                if (pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec) > u->output_thread_info.effective_source_latency)
>>>>>>>>>> +                    time_delta = PA_CLIP_SUB(time_delta, u->output_thread_info.effective_source_latency);
>>>>>>>>>> +                else
>>>>>>>>>> +                    time_delta = PA_CLIP_SUB(time_delta, pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec));
>>>>>>>>> It's unclear to me what's happening here. I guess you're substracting
>>>>>>>>> chunk->length from time_delta, because when push_cb was called, the
>>>>>>>>> chunk was still included in the source latency report (I'm a bit
>>>>>>>>> worried that some sources might not do this), but the chunk has also
>>>>>>>>> been pushed to the memblockq, so if we don't adjust time_delta, the
>>>>>>>>> chunk will be counted twice, and the memblockq adjustment will be
>>>>>>>>> wrong.
>>>>>>>> Yes, exactly.
>>>>>>>>> But I don't really get why you are concerned with "overruns", and how
>>>>>>>>> your code mitigates any problems. If you substract only a part of
>>>>>>>>> chunk->length, it means that a part of the chunk will be counted twice,
>>>>>>>>> resulting in wrong latency.
>>>>>>>> No, it means, that the memblockq will be adjusted to the correct length
>>>>>>>> because it is adjusted after the data has been pushed.
>>>>>>> The correct memblockq length is u->latency - sink latency - source
>>>>>>> latency. To me it seems that your adjustment makes us overestimate the
>>>>>>> source latency.
>>>>>>>
>>>>>>> The latency report that the source gives is (1) the current chunk
>>>>>>> length, plus (2) any buffered data that isn't included in the current
>>>>>>> chunk. In adjust_memblockq() we're only interested in (2), which is why
>>>>>>> we subtract the chunk length from the reported source latency. If the
>>>>>>> chunk was huge for some reason, adjust_memblockq() will in any case
>>>>>>> drop excess data.
>>>>>>>
>>>>>>>> If the source
>>>>>>>> has overrun and delivered much more data than expected, the excess
>>>>>>>> data will be dropped.
>>>>>>> If the source latency is overestimated, too much data will be dropped.
>>>>>> You are talking about two different things. The source latency is
>>>>>> very different from the first chunk of data that is pushed.
>>>>> Yes, they are very different things. But adjust_memblockq() needs to
>>>>> know the real source latency, it doesn't care about the size of the
>>>>> first chunk. It doesn't matter how big the first chunk is, because
>>>>> adjust_memblockq() will in any case adjust the memblockq length so that
>>>>> the end-to-end latency will match what is configured.
>>>>>
>>>>>> When switching quickly back and forth between two sources, it
>>>>>> happens, that a source configured to a few ms of latency
>>>>>> suddenly pushes 500 ms of data on the first push. I think this
>>>>>> is due to the fact that the source is re-configured to its maximum
>>>>>> latency when it goes to idle and when switching back to low latency
>>>>>> it somehow does not get rid of the excess data soon enough.
>>>>>> Remember, the adjustment is only done on the first push, so it is
>>>>>> a one-shot thing just to get the queue right. If I don't drop the
>>>>>> data, I end up with those 500 ms too much in the memblockq.
>>>>> The memblockq length should always be
>>>>>
>>>>>        u->latency - real sink latency - real source latency
>>>>>
>>>>> and if that is less than the 500 ms that was just pushed to the queue,
>>>>> the extra data will be dropped. I don't see how we can end up with too
>>>>> long memblockq if adjust_memblockq() gets accurate sink and source
>>>>> latency information.
>>>> Sorry, but take a look at the code. We CALCULATE how long the memblockq
>>>> should be and pass that as a parameter to adjust_memblockq() AND we have
>>>> to subtract the first chunk length to get things right.
>>> I don't think I've said anything that conflicts with what you say here.
>>>
>>>> So if the first chunk is too big, the memblockq will be too long, as
>>>> simple as that.
>>> I don't understand how you jump to this conclusion.
>>>
>>> I'll try to explain how I understand the system:
>>>
>>> Invariant 1: memblockq length should always be this when the pipeline
>>> is fully up and running:
>>>
>>>       u->latency - real sink latency - real source latency
>>>
>>> When doing the final adjustment, the offset that is passed to
>>> adjust_memblockq() should be
>>>
>>>       real sink latency + real source latency
>>>
>>> Invariant 2: the real source latency at the time of processing the POST
>>> message can always be calculated like this:
>>>
>>>       latency report - size of the received chunk + time between sending and receiving the POST message
>>>
>>> You are saying that the size of the received chunk must be adjusted so
>>> that it won't be bigger than effective_source_latency. Do you think
>>> invariant 2 is wrong (that is, the formula doesn't always give an
>>> accurate measure of the real source latency), or that the invariant 1
>>> is wrong (that is, sometimes it makes sense to have a shorter memblockq
>>> than given by the formula)?
>> Invariant 2 is wrong when the source overruns. The received chunk can be
>> much
>> bigger than the latency report and your formula above will produce
>> negative values.
>> It is even possible that the source is sending more than one "much too
>> big" chunk,
>> so the situation that you end up with far too much data in the queue cannot
>> completely be avoided. I did not put the code there for some theoretical
>> reason,
>> you can easily reproduce the problem by switching rapidly between two
>> sources
>> at low latency.
> Ah, now I finally see the problem. To make the code more clear about
> this, I would suggest adding a separate variable for the source
> latency, and comment the calculation like this, for example:
>
> /* data contains the reported source latency at the time the
>   * message was sent. *
> source_latency = PA_PTR_TO_UINT(data);
>
> /* The source latency report includes the audio in the chunk,
>   * but since we already pushed the chunk to the memblockq, we need
>   * to subtract the chunk size from the source latency so that it
>   * won't be counted towards both the memblockq latency and the
>   * source latency.
>   *
>   * Sometimes the alsa source reports too small latencies for some
>   * reason. We can't reliably detect when that happens, but one
>   * obvious indicator is when the reported latency is smaller than
>   * the size of the chunk that the source generated. In those case
>   * we estimate the source latency to be zero, which may not be
>   * entirely accurate, but it's in any case more accurate than
>   * the reported latency. */
> if (chunk->length <= source_latency)
>      source_latency -= chunk->length;
> else
>      source_latency = 0;
>
> /* offset is a timestamp of the moment when the POST message was
>   * sent. It might have taken some time before we got to processing
>   * the message, and during that time the source latency has grown. */
> source_latency += pa_rtclock_now() - offset;
>
> I thought that the alsa source latency reports would always be directly
> based on snd_pcm_delay(), but now that I checked, we seem to be using a
> smoother instead. Maybe there's some smoother adjustment missing when
> the source's configured latency changes. Or maybe the smoother just
> behaves badly.
>
Setting the source latency to 0 is not what I am doing. I am assuming that
the source has the effective source latency and that is what seems
to be correct. Another reason why we cannot get rid of the variable.
I think when the source overruns, it will have the "effective source 
latency"
and just pushes the excess data. You can also see a overrun log message
whenever it happens.