[pulseaudio-discuss] [PATCH 02/21 v2] loopback: Initialize latency at startup and during source/sink changes

Sat Feb 25 17:21:16 UTC 2017

On Sat, 2017-02-25 at 14:01 +0100, Georg Chini wrote:
> On 25.02.2017 13:51, Tanu Kaskinen wrote:
> > On Sat, 2017-02-25 at 12:13 +0100, Georg Chini wrote:
> > > On 24.02.2017 19:19, Tanu Kaskinen wrote:
> > > > On Thu, 2017-02-23 at 17:55 +0100, Georg Chini wrote:
> > > > > On 23.02.2017 12:17, Tanu Kaskinen wrote:
> > > > > > On Sun, 2017-02-19 at 17:15 +0100, Georg Chini wrote:
> > > > > > > +        u->output_thread_info.pop_called = true;
> > > > > > > +    }
> > > > > > > +    u->output_thread_info.first_pop_done = true;
> > > > > > >     
> > > > > > >         if (pa_memblockq_peek(u->memblockq, chunk) < 0) {
> > > > > > >             pa_log_info("Could not peek into queue");
> > > > > > > @@ -478,6 +644,12 @@ static int sink_input_pop_cb(pa_sink_input *i, size_t nbytes, pa_memchunk *chunk
> > > > > > >         chunk->length = PA_MIN(chunk->length, nbytes);
> > > > > > >         pa_memblockq_drop(u->memblockq, chunk->length);
> > > > > > >     
> > > > > > > +    /* If push has not been called yet, assume that the source will deliver one full latency
> > > > > > > +     * when it starts pushing. Adjust the memblockq accordingly and ensure that there is
> > > > > > > +     * enough data in the queue to avoid underruns. */
> > > > > > > +    if (!u->output_thread_info.push_called)
> > > > > > > +        memblockq_adjust(u, u->output_thread_info.effective_source_latency, true);
> > > > > > 
> > > > > > This seems unnecessary. Why can't we just let the pop calls drain the
> > > > > > memblockq, and add the necessary amount of silence to the memblockq
> > > > > > only when we receive the first chunk from the source?
> > > > > 
> > > > > The whole point is to keep on delivering silence even in the case that
> > > > > the source takes a long time to start. So we avoid draining the
> > > > > memblockq completely, which would lead to underruns.
> > > > 
> > > > Why do you want to avoid underruns during the startup? Just to avoid
> > > > unnecessary noise in the log? (That would be a good enough reason for
> > > > me.)
> > > > 
> > > > If the memblockq becomes empty, silence can still be easily generated.
> > > > The silence doesn't have to come from the memblockq.
> > > 
> > > The reason are the ugly underrun messages as already stated
> > > above. And why should I generate silence any other way? The
> > > memblockq is there and I just push some silence into it.
> > 
> > Because we might get rid of the effective_source_latency variable, and
> > the code would become a bit easier to understand when there's no need
> > to think about whether the memblockq has the right amount of silence.
> > 
> > > > > > > +                time_delta += pa_sink_get_latency_within_thread(u->sink_input->sink);
> > > > > > > +
> > > > > > > +                /* If the source has overrun, assume that the maximum it should have pushed is
> > > > > > > +                 * one full source latency. It may still be possible that the next push also
> > > > > > > +                 * contains too much data, then the resulting latency will be wrong. */
> > > > > > > +                if (pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec) > u->output_thread_info.effective_source_latency)
> > > > > > > +                    time_delta = PA_CLIP_SUB(time_delta, u->output_thread_info.effective_source_latency);
> > > > > > > +                else
> > > > > > > +                    time_delta = PA_CLIP_SUB(time_delta, pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec));
> > > > > > 
> > > > > > It's unclear to me what's happening here. I guess you're substracting
> > > > > > chunk->length from time_delta, because when push_cb was called, the
> > > > > > chunk was still included in the source latency report (I'm a bit
> > > > > > worried that some sources might not do this), but the chunk has also
> > > > > > been pushed to the memblockq, so if we don't adjust time_delta, the
> > > > > > chunk will be counted twice, and the memblockq adjustment will be
> > > > > > wrong.
> > > > > 
> > > > > Yes, exactly.
> > > > > > But I don't really get why you are concerned with "overruns", and how
> > > > > > your code mitigates any problems. If you substract only a part of
> > > > > > chunk->length, it means that a part of the chunk will be counted twice,
> > > > > > resulting in wrong latency.
> > > > > 
> > > > > No, it means, that the memblockq will be adjusted to the correct length
> > > > > because it is adjusted after the data has been pushed.
> > > > 
> > > > The correct memblockq length is u->latency - sink latency - source
> > > > latency. To me it seems that your adjustment makes us overestimate the
> > > > source latency.
> > > > 
> > > > The latency report that the source gives is (1) the current chunk
> > > > length, plus (2) any buffered data that isn't included in the current
> > > > chunk. In adjust_memblockq() we're only interested in (2), which is why
> > > > we subtract the chunk length from the reported source latency. If the
> > > > chunk was huge for some reason, adjust_memblockq() will in any case
> > > > drop excess data.
> > > > 
> > > > > If the source
> > > > > has overrun and delivered much more data than expected, the excess
> > > > > data will be dropped.
> > > > 
> > > > If the source latency is overestimated, too much data will be dropped.
> > > 
> > > You are talking about two different things. The source latency is
> > > very different from the first chunk of data that is pushed.
> > 
> > Yes, they are very different things. But adjust_memblockq() needs to
> > know the real source latency, it doesn't care about the size of the
> > first chunk. It doesn't matter how big the first chunk is, because
> > adjust_memblockq() will in any case adjust the memblockq length so that
> > the end-to-end latency will match what is configured.
> > 
> > > When switching quickly back and forth between two sources, it
> > > happens, that a source configured to a few ms of latency
> > > suddenly pushes 500 ms of data on the first push. I think this
> > > is due to the fact that the source is re-configured to its maximum
> > > latency when it goes to idle and when switching back to low latency
> > > it somehow does not get rid of the excess data soon enough.
> > > Remember, the adjustment is only done on the first push, so it is
> > > a one-shot thing just to get the queue right. If I don't drop the
> > > data, I end up with those 500 ms too much in the memblockq.
> > 
> > The memblockq length should always be
> > 
> >      u->latency - real sink latency - real source latency
> > 
> > and if that is less than the 500 ms that was just pushed to the queue,
> > the extra data will be dropped. I don't see how we can end up with too
> > long memblockq if adjust_memblockq() gets accurate sink and source
> > latency information.
> 
> Sorry, but take a look at the code. We CALCULATE how long the memblockq
> should be and pass that as a parameter to adjust_memblockq() AND we have
> to subtract the first chunk length to get things right.

I don't think I've said anything that conflicts with what you say here.

> So if the first chunk is too big, the memblockq will be too long, as
> simple as that.

I don't understand how you jump to this conclusion.

I'll try to explain how I understand the system:

Invariant 1: memblockq length should always be this when the pipeline
is fully up and running:

    u->latency - real sink latency - real source latency

When doing the final adjustment, the offset that is passed to
adjust_memblockq() should be

    real sink latency + real source latency

Invariant 2: the real source latency at the time of processing the POST
message can always be calculated like this:

    latency report - size of the received chunk + time between sending and receiving the POST message

You are saying that the size of the received chunk must be adjusted so
that it won't be bigger than effective_source_latency. Do you think
invariant 2 is wrong (that is, the formula doesn't always give an
accurate measure of the real source latency), or that the invariant 1
is wrong (that is, sometimes it makes sense to have a shorter memblockq
than given by the formula)?

> > 
> > > > While reading the code again, I noticed some more issues:
> > > > 
> > > > >            case SINK_INPUT_MESSAGE_POST:
> > > > >    
> > > > > -            pa_sink_input_assert_io_context(u->sink_input);
> > > > > +            pa_memblockq_push_align(u->memblockq, chunk);
> > > > > +
> > > > > +            /* If push has not been called yet, latency adjustments in sink_input_pop_cb()
> > > > > +             * are enabled. Disable them on first push and correct the memblockq. Do the
> > > > > +             * same if the pop_cb() requested the adjustment */
> > > > > +            if (!u->output_thread_info.push_called || u->output_thread_info.pop_adjust) {
> > > > > +                pa_usec_t time_delta;
> > > > >    
> > > > > -            if (PA_SINK_IS_OPENED(u->sink_input->sink->thread_info.state))
> > > > > -                pa_memblockq_push_align(u->memblockq, chunk);
> > > > > -            else
> > > > > -                pa_memblockq_flush_write(u->memblockq, true);
> > > > > +                time_delta = PA_PTR_TO_UINT(data);
> > > > > +                time_delta += pa_rtclock_now() - offset;
> > > > > +                time_delta += pa_sink_get_latency_within_thread(u->sink_input->sink);
> > > > > +
> > > > > +                /* If the source has overrun, assume that the maximum it should have pushed is
> > > > > +                 * one full source latency. It may still be possible that the next push also
> > > > > +                 * contains too much data, then the resulting latency will be wrong. */
> > > > > +                if (pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec) > u->output_thread_info.effective_source_latency)
> > > > > +                    time_delta = PA_CLIP_SUB(time_delta, u->output_thread_info.effective_source_latency);
> > > > > +                else
> > > > > +                    time_delta = PA_CLIP_SUB(time_delta, pa_bytes_to_usec(chunk->length, &u->sink_input->sample_spec));
> > > > > +
> > > > > +                memblockq_adjust(u, time_delta, true);
> > > > > +
> > > > > +                u->output_thread_info.pop_adjust = false;
> > > > > +                u->output_thread_info.push_called = true;
> > > > > +            }
> > > > 
> > > > This adjustment needs to be postponed until pop has been called. If the
> > > > sink is not yet running, the sink latency is undefined.
> > > 
> > > No, it does not. "Undefined" means, that the pa_sink_get_latency_in_thread()
> > > call just returns 0. Where is the problem with that?
> > 
> > 0 doesn't correspond to the real world, that's the problem. It's
> > pointless to try to make calculations based on incorrect input.
> 
> You are yourself saying I should just use 0 at another place, which does not
> correspond to real world either.

There are two kinds of situations where the memblockq is adjusted:
situations where it's important that the memblockq is adjusted exactly
to the target latency, using accurate sink and source latency
information, and situations where the only goal is to avoid the
memblockq growing so large that the memory usage becomes a concern (in
this latter situation there's no requirement to adjust the memblockq to
any precise value).

I assumed that you believed that this piece code was supposed to be
executed only in those situations where high precision was required.
That seems to have been an incorrect assumption.

> > > And in fact, this adjustment is done twice if sink and source are
> > > both starting, once for the source and once for the sink. Then the
> > > second adjustment does the right thing regardless if sink or source
> > > started first.
> > 
> > The second adjustment does the right thing regardless of whether the
> > first adjustment was done or not, so why not skip the first adjustment
> > altogether?
> 
> Because it might be the case that we are only switching the source OR the
> sink and if we are changing both or starting up, we don't know which will
> come first.

So you're saying that skipping the first adjustment as I suggested
can't be done? I think this should do the trick:

if (pop_adjust) {
    /* Final adjustment. */
    offset = sink_latency + source_latency;
    memblockq_adjust(offset, true);
    pop_adjust = false;
} else if (!pop_called) {
    /* Waiting for the sink to start. */
    offset = 0;
    memblockq_adjust(offset, false);
}
push_called = true;

> > 
> > > > Also, if the memblockq is too short, memblockq_adjust() will add
> > > > silence, but it's written after the valid data, so there will be a gap
> > > > in playback. The silence should be added by using
> > > > pa_memblockq_rewind(), but it has to be ensured that the memblockq
> > > > doesn't still contain some old non-silence data that gets replayed. I'm
> > > > not sure how to best do that.
> > > 
> > > This is the only situation where I allow pushing silence after having
> > > valid data in the queue. Since this is a one (or two) time event, I think
> > > it is acceptable to have a small gap in the stream to fix up the latency.
> > > Because it is done in a switching situation, I believe nobody will ever
> > > notice the gap.
> > 
> > Ok, if it's a deliberate decision to have simpler code at the risk of
> > glitches, then a FIXME comment would be appropriate.
> > 
> > > > > +
> > > > > +            /* If pop has not been called yet, make sure the latency does not grow too much.
> > > > > +             * Don't push any silence here, because we already have new data in the queue */
> > > > > +            if (!u->output_thread_info.pop_called)
> > > > > +                 memblockq_adjust(u, pa_sink_get_latency_within_thread(->sink_input->sink), false);
> > > > 
> > > > As mentioned above, the sink latency is undefined if the sink is not
> > > > running yet. Luckily there's no need need to know the sink latency
> > > > here: before pop has been called, the only reason why the memblockq
> > > > needs to be adjusted in the POST handler is to avoid it becoming huge
> > > > and thus consuming a lot of memory. memblockq_adjust() can be called
> > > > here with simply zero as the offset parameter.
> > > > 
> > > 
> > > This will keep too much data in the queue if the sink is running.
> > > This would not be a big problem, but why not adjust as good as possible in this
> > > situation?
> > 
> > Because there's no benefit in doing that. Using 0 is simpler and
> > doesn't involve invoking kernel code to ask for information that might
> > be bogus anyway.
> > 
> 
> Sorry, slowly I am thinking that you don't want the patches at all and 
> are nitpicking to discourage me. If you don't want the patches, just say so.

I assure you, I want these patches, especially this one, because this
patch should fix two bugs that I know have caused people trouble (slow
adjustment to target latency and accumulating a lot of latency when the
sink is slow to start).

I fully understand that the process is frustrating when we have trouble
understanding each other. It's frustrating to me too. I wish I knew how
to make it easier for us to reach common understanding.

If you think that some criticism is excessive nitpicking, and you don't
want to do the suggested change, you can say "I prefer doing it this
way. If you want to change it, you can write a patch later yourself,
ok?" That has been known to work sometimes.

> Thinking twice, there is not even a way you could get to that call
> while the sink is suspended, so pa_sink_get_latency_in_thread() will
> always return some value.

Are you still referring to this code:

+            /* If pop has not been called yet, make sure the latency does not grow too much.
+             * Don't push any silence here, because we already have new data in the queue */
+            if (!u->output_thread_info.pop_called)
+                 memblockq_adjust(u, pa_sink_get_latency_within_thread(u->sink_input->sink), false);

The only condition is that pop hasn't been called, and from that you
certainly can't draw the conclusion that the sink is not suspended. I
am again very confused.

-- 
Tanu

https://www.patreon.com/tanuk