[pulseaudio-discuss] [PATCH v6 09/25] loopback: Refactor latency initialization

Wed Jul 13 22:02:09 UTC 2016

"Refactoring" usually means changing the code without changing the
behaviour. A better commit title for this patch would be "redo latency
initialization logic".

On Sun, 2016-06-05 at 21:05 +0200, Georg Chini wrote:
> Initialize the latency to sane values at startup or after source or sink switch.
> This ensures that there are no underruns during the initial phase and also that
> the overall latency starts near to the configured latency.

It would be nice to explain in more detail what problem this patch
solves - what was wrong with the previous code? Apparently there were
underruns during the initial phase and the overall latency didn't start
near to the configured latency, but what caused those issues?

> Additionally audio will be dropped until pop() is called for the first time to
> prevent excessive delays.
> Sink or source latencies smaller than 2.5 ms proved to produce a lot of errors,
> so the minimum source or sink latency is limited to 2.5 ms.
> Current boundary values are saved for the calculation of the minimum possible
> latency in one of the following patches.
> 
> ---
>  src/modules/module-loopback.c | 207 +++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 196 insertions(+), 11 deletions(-)
> 
> diff --git a/src/modules/module-loopback.c b/src/modules/module-loopback.c
> index 246d622..aafe41b 100644
> --- a/src/modules/module-loopback.c
> +++ b/src/modules/module-loopback.c
> @@ -61,6 +61,8 @@ PA_MODULE_USAGE(
>  
>  #define MEMBLOCKQ_MAXLENGTH (1024*1024*32)
>  
> +#define MIN_DEVICE_LATENCY (2.5*PA_USEC_PER_MSEC)
> +
>  #define DEFAULT_ADJUST_TIME_USEC (10*PA_USEC_PER_SEC)
>  
>  struct userdata {
> @@ -76,15 +78,30 @@ struct userdata {
>      pa_rtpoll_item *rtpoll_item_read, *rtpoll_item_write;
>  
>      pa_time_event *time_event;
> -    pa_usec_t adjust_time;
>  
>      int64_t recv_counter;
>      int64_t send_counter;
>  
>      size_t skip;
> +
> +    /* Values from command line configuration */
>      pa_usec_t latency;
> +    pa_usec_t adjust_time;
>  
> +    /* Latency boundaries and current values */
> +    pa_usec_t min_source_latency;
> +    pa_usec_t max_source_latency;
> +    int64_t source_offset;
> +    pa_usec_t min_sink_latency;
> +    pa_usec_t max_sink_latency;
> +    int64_t sink_offset;

This patch doesn't use source_offset or sink_offset for anything, so it
doesn't look like a good idea to add them in this patch.

"source_latency_offset" and "sink_latency_offset" would be clearer
names.

Also, the user may reconfigure the latency offsets at any time, so if
you're going to use the offsets for something, you'll need to monitor
them for changes.

> +    pa_usec_t configured_sink_latency;
> +    pa_usec_t configured_source_latency;
> +
> +    /* Various booleans */
>      bool in_pop;
> +    bool pop_called;
> +    bool fixed_alsa_source;

fixed_alsa_source is not used anywhere, so it's better to not add it in
this patch.

>  
>      struct {
>          int64_t send_counter;
> @@ -279,6 +296,46 @@ static void update_adjust_timer(struct userdata *u) {
>          enable_adjust_timer(u, true);
>  }
>  
> +/* Called from all contexts

Well, that's not a safe thing to do.

> + * Sets the memblockq to the requested latency corrected by offset */
> +static void memblockq_adjust(struct userdata *u, int32_t offset, bool allow_push) {
> +    size_t memblock_bytes, requested_buffer_bytes;
> +    pa_usec_t requested_buffer_latency;
> +    size_t buffer_offset;

There's already a variable named "offset". I think naming this
"bytes_to_drop" would make things a bit more readable.

> +    pa_memchunk silence;
> +
> +    requested_buffer_latency = u->latency;
> +    if (offset > 0)
> +        requested_buffer_latency = PA_CLIP_SUB(requested_buffer_latency, (pa_usec_t)offset);
> +    else
> +        requested_buffer_latency = requested_buffer_latency - offset;
> +
> +    requested_buffer_bytes = pa_usec_to_bytes(requested_buffer_latency, &u->sink_input->sample_spec);
> +    memblock_bytes = pa_memblockq_get_length(u->memblockq);

You probably meant "memblockq_bytes" rather than "memblock_bytes".
Anyway, I'd prefer "memblockq_length", since this variable seems to be
just a shorthand for pa_memblockq_get_length().

> +
> +    if ((int32_t)(memblock_bytes - requested_buffer_bytes) > 0) {

This cast seems dubious. Both variables are unsigned, so the
substraction result will be unsigned too. Perhaps this works fine in
practice (at least if size_t is a 32-bit type), since the result will
be larger than INT32_MAX, and will be reinterpreted as the correct
negative value.

Casting memblock_bytes to ssize_t before the substraction would look
less wrong, at least to me.

> +        /* Drop audio from queue */
> +        buffer_offset = memblock_bytes - requested_buffer_bytes;
> +        pa_log_info("Dropping %zd bytes from queue", buffer_offset);
> +        pa_memblockq_drop(u->memblockq, buffer_offset);
> +
> +    } else if ((int32_t)(memblock_bytes - requested_buffer_bytes) < 0 && allow_push) {

Same note about the casting as above.

> +        /* Add silence to queue, will never happen from IO-thread */
> +        requested_buffer_bytes = requested_buffer_bytes - memblock_bytes;

Seems like a good place to use the "-=" operator. Hmm... even better
might be to use a new variable, because this seems to redefine the
meaning of requested_buffer_bytes (after this the number does not any
more reflect what was requested).

> +        pa_log_info("Adding %zd bytes of silence to queue", requested_buffer_bytes);
> +        pa_sink_input_get_silence(u->sink_input, &silence);
> +        while (requested_buffer_bytes >= silence.length) {
> +            pa_memblockq_push_align(u->memblockq, &silence);
> +            requested_buffer_bytes -= silence.length;
> +        }
> +        if (requested_buffer_bytes > 0) {
> +            silence.length = requested_buffer_bytes;
> +            pa_memblockq_push_align(u->memblockq, &silence);
> +        }

No need to use pa_memblockq_push_align(). The memchunk length is
already aligned to the sink input sample spec.

> +        pa_memblock_unref(silence.memblock);
> +    }
> +}
> +
>  /* Called from input thread context */
>  static void source_output_push_cb(pa_source_output *o, const pa_memchunk *chunk) {
>      struct userdata *u;
> @@ -342,6 +399,57 @@ static int source_output_process_msg_cb(pa_msgobject *obj, int code, void *data,
>      return pa_source_output_process_msg(obj, code, data, offset, chunk);
>  }
>  
> +/* Calculates minimum and maximum possible latency for source and sink */
> +static void update_latency_boundaries(struct userdata *u, pa_source *source, pa_sink *sink) {
> +
> +    if (source) {
> +        /* Source latencies */
> +        u->fixed_alsa_source = false;
> +        if (source->flags & PA_SOURCE_DYNAMIC_LATENCY)
> +            pa_source_get_latency_range(source, &u->min_source_latency, &u->max_source_latency);
> +        else {
> +            u->min_source_latency = pa_source_get_fixed_latency(source);
> +            u->max_source_latency = u->min_source_latency;
> +            if (source->driver) {
> +                if (pa_streq(source->driver, "module-alsa-card.c"))

An alsa source may also be created by module-alsa-source. It would be
better to check if the "device.api" property of the source is set to
"alsa".

> +                    u->fixed_alsa_source = true;
> +            }
> +        }
> +        /* Source offset */
> +        u->source_offset = source->latency_offset;
> +        /* Latencies below 2.5 ms cause problems, limit source latency if possible */
> +        if (u->max_source_latency >= MIN_DEVICE_LATENCY)
> +            u->min_source_latency = PA_MAX(u->min_source_latency, MIN_DEVICE_LATENCY);

If min_source_latency is 0.5 ms and max_source_latency is 2.4 ms, you
probably still want to increase min_source_latency to 2.4 ms? This code
doesn't do that.

> +    }
> +
> +    if (sink) {
> +        /* Sink latencies */
> +        if (sink->flags & PA_SINK_DYNAMIC_LATENCY)
> +            pa_sink_get_latency_range(sink, &u->min_sink_latency, &u->max_sink_latency);
> +        else {
> +            u->min_sink_latency = pa_sink_get_fixed_latency(sink);
> +            u->max_sink_latency = u->min_sink_latency;
> +        }
> +        /* Sink offset */
> +        u->sink_offset = sink->latency_offset;
> +        /* Latencies below 2.5 ms cause problems, limit sink latency if possible */
> +        if (u->max_sink_latency >= MIN_DEVICE_LATENCY)
> +            u->min_sink_latency = PA_MAX(u->min_sink_latency, MIN_DEVICE_LATENCY);

Same comment as above.

> +    }
> +}
> +
> +/* Set source output latency to one third of the overall latency if possible */

It would be good to add a note that one third is a pretty arbitrary
choice. The overall latency shouldn't be smaller than the source
latency plus the sink latency, but other than that, the buffer size
choices are rather arbitrary, as far as I can tell.

> +static void set_source_output_latency(struct userdata *u, pa_source *source) {
> +    pa_usec_t latency, requested_latency;
> +
> +    requested_latency = u->latency / 3;
> +
> +    latency = PA_CLAMP(requested_latency , u->min_source_latency, u->max_source_latency);

There's an extra space before ",".

> +    u->configured_source_latency = pa_source_output_set_requested_latency(u->source_output, latency);
> +    if (u->configured_source_latency != requested_latency)
> +        pa_log_warn("Cannot set requested source latency of %0.2f ms, adjusting to %0.2f ms", (double)requested_latency / PA_USEC_PER_MSEC, (double)u->configured_source_latency / PA_USEC_PER_MSEC);

If the actual configured latency of either sink or source is higher
than expected, then it may be that the configured source latency plus
the configured sink latency is higher than the configured overall
latency. In this case the target overall latency should be pushed up,
so that we don't try to target an impossible latency.

> +}
> +
>  /* Called from output thread context */
>  static void source_output_attach_cb(pa_source_output *o) {
>      struct userdata *u;
> @@ -419,6 +527,7 @@ static void source_output_moving_cb(pa_source_output *o, pa_source *dest) {
>      struct userdata *u;
>      char *input_description;
>      const char *n;
> +    pa_usec_t sink_latency;
>  
>      if (!dest)
>          return;
> @@ -435,6 +544,26 @@ static void source_output_moving_cb(pa_source_output *o, pa_source *dest) {
>      if ((n = pa_proplist_gets(dest->proplist, PA_PROP_DEVICE_ICON_NAME)))
>          pa_sink_input_set_property(u->sink_input, PA_PROP_DEVICE_ICON_NAME, n);
>  
> +    /* Set latency and calculate necessary buffer length
> +     * In some profile switching situations the sink will be invalid here. If so,
> +     * skip the buffer adjustment, it will be done when the sink becomes valid */
> +
> +    update_latency_boundaries(u, dest, NULL);
> +    set_source_output_latency(u, dest);
> +
> +    if (!u->sink_input->sink)
> +       memblockq_adjust(u, 0, true);

The indentation is a bit off here.

> +    else {
> +        pa_sink_input_get_latency(u->sink_input, &sink_latency);

Why not pa_sink_get_latency()?

> +        if (u->send_counter > u->recv_counter)
> +            sink_latency += pa_bytes_to_usec(u->send_counter - u->recv_counter, &u->sink_input->sample_spec);
> +        if (dest->flags & PA_SOURCE_DYNAMIC_LATENCY)
> +            sink_latency += pa_source_get_latency(dest);
> +        else
> +            sink_latency = PA_CLIP_SUB(sink_latency, pa_source_get_latency(dest));

sink_latency is not the sink latency after all these additions and
substractions... I think the variable should either be renamed to
something more accurate, or use multiple variables.

Why do you substract the source latency from sink_latency if the source
has fixed latency? I would expect you to add the source latency in
either case.

> +        memblockq_adjust(u, sink_latency, true);
> +    }
> +
>      if (pa_source_get_state(dest) == PA_SOURCE_SUSPENDED)
>          pa_sink_input_cork(u->sink_input, true);
>      else
> @@ -465,6 +594,11 @@ static int sink_input_pop_cb(pa_sink_input *i, size_t nbytes, pa_memchunk *chunk
>      pa_assert_se(u = i->userdata);
>      pa_assert(chunk);
>  
> +    u->pop_called = true;
> +
> +    /* It is not clear why the message queue needs to be drained in the pop callback,
> +     * but not doing so causes underruns at low latencies. If the message processed
> +     * here is the end of an underrun, the corresponding rewind will not be executed */
>      u->in_pop = true;
>      while (pa_asyncmsgq_process_one(u->asyncmsgq) > 0)
>          ;
> @@ -514,10 +648,16 @@ static int sink_input_process_msg_cb(pa_msgobject *obj, int code, void *data, in
>  
>              pa_sink_input_assert_io_context(u->sink_input);
>  
> -            if (PA_SINK_IS_OPENED(u->sink_input->sink->thread_info.state))
> -                pa_memblockq_push_align(u->memblockq, chunk);
> -            else
> -                pa_memblockq_flush_write(u->memblockq, true);
> +            pa_memblockq_push_align(u->memblockq, chunk);
> +            u->recv_counter += (int64_t) chunk->length;
> +
> +            /* If the sink is not yet playing, discard audio and adjust the buffer to contain
> +             * somewhat more than the configured latency. Use a quarter of the current sink
> +             * latency as safety margin */
> +            if (!PA_SINK_IS_OPENED(u->sink_input->sink->thread_info.state) || !u->pop_called) {
> +                memblockq_adjust(u, (int32_t)(-u->configured_sink_latency / 4), false);

-u->configured_sink_latency is a very large unsigned number, divide it
by four and it's still huge. It happens that dividing by four somehow
makes the result correct due to the 64-bit -> 32-bit conversion (I
don't exactly understand how that works, but I tried it with a simple
test program), but dividing by e.g. 5 would yield an unexpected result.
This code definitely needs fixing.

> +                return 0;
> +            }

The comment says "discard audio", but the posted chunk was pushed to
the memblockq. I guess you meant discarding old audio in case the
memblockq was already "full". In that case, try to make the comment
sound like you're not discarding the received chunk.

What's the "safety margin" good for? Why would you ever want to buffer
more than what the target latency is?

>  
>              /* Is this the end of an underrun? Then let's start things
>               * right-away */
> @@ -531,18 +671,20 @@ static int sink_input_process_msg_cb(pa_msgobject *obj, int code, void *data, in
>                                               false, true, false);
>              }
>  
> -            u->recv_counter += (int64_t) chunk->length;
> -
>              return 0;
>  
>          case SINK_INPUT_MESSAGE_REWIND:
>  
>              pa_sink_input_assert_io_context(u->sink_input);
>  
> -            if (PA_SINK_IS_OPENED(u->sink_input->sink->thread_info.state))
> +            if (PA_SINK_IS_OPENED(u->sink_input->sink->thread_info.state) && u->pop_called)
>                  pa_memblockq_seek(u->memblockq, -offset, PA_SEEK_RELATIVE, true);
>              else
> -                pa_memblockq_flush_write(u->memblockq, true);
> +                /* If the sink is not yet playing, adjust the buffer to contain
> +                 * somewhat more than the configured latency, a rewind does not
> +                 * make sense in that case. Use a quarter of the current sink
> +                 * latency as safety margin */
> +                memblockq_adjust(u, (int32_t)(-u->configured_sink_latency / 4), false);

Why wouldn't the rewind make sense? (Well, source rewinding in general
is something I'm not sure makes a whole lot sense, but let's ignore
that.) I think the rewinding here means that the source told that the
last bit it sent out should be discarded. Since you're queuing all sent
data in the POST message handler even when the sink isn't running, I
think pa_memblockq_seek() would make sense here to get rid of the
recently received audio.

Since pa_memblockq_seek() will make the memblockq shorter, I don't
think memblockq_adjust() makes sense. I don't know what your original
purpose of calling memblockq_adjust() here was.

As with the earlier code, negating an unsigned 64-bit integer, dividing
it by four and casting it to a signed 32-bit integer works as expected,
but just by chance.

>  
>              u->recv_counter -= offset;
>  
> @@ -567,6 +709,18 @@ static int sink_input_process_msg_cb(pa_msgobject *obj, int code, void *data, in
>      return pa_sink_input_process_msg(obj, code, data, offset, chunk);
>  }
>  
> +/* Set sink input latency to one third of the overall latency if possible */

Again, you could mention that the choice of one third is pretty
arbitrary.

> +static void set_sink_input_latency(struct userdata *u, pa_sink *sink) {
> +     pa_usec_t latency, requested_latency;

The indentation is a bit off here.

> +
> +    requested_latency = u->latency / 3;
> +
> +    latency = PA_CLAMP(requested_latency , u->min_sink_latency, u->max_sink_latency);

There's an extra space before the ",".

> +    u->configured_sink_latency = pa_sink_input_set_requested_latency(u->sink_input, latency);
> +    if (u->configured_sink_latency != requested_latency)
> +        pa_log_warn("Cannot set requested sink latency of %0.2f ms, adjusting to %0.2f ms", (double)requested_latency / PA_USEC_PER_MSEC, (double)u->configured_sink_latency / PA_USEC_PER_MSEC);

I'll copy my earlier comment from set_source_output_latency():

If the actual configured latency of either sink or source is higher
than expected, then it may be that the configured source latency plus
the configured sink latency is higher than the configured overall
latency. In this case the target overall latency should be pushed up,
so that we don't try to target an impossible latency.

> +}
> +
>  /* Called from output thread context */
>  static void sink_input_attach_cb(pa_sink_input *i) {
>      struct userdata *u;
> @@ -596,6 +750,8 @@ static void sink_input_detach_cb(pa_sink_input *i) {
>          pa_rtpoll_item_free(u->rtpoll_item_read);
>          u->rtpoll_item_read = NULL;
>      }
> +
> +    u->pop_called = false;
>  }
>  
>  /* Called from output thread context */
> @@ -649,6 +805,7 @@ static void sink_input_moving_cb(pa_sink_input *i, pa_sink *dest) {
>      struct userdata *u;
>      char *output_description;
>      const char *n;
> +    pa_usec_t source_latency;
>  
>      if (!dest)
>          return;
> @@ -665,6 +822,22 @@ static void sink_input_moving_cb(pa_sink_input *i, pa_sink *dest) {
>      if ((n = pa_proplist_gets(dest->proplist, PA_PROP_DEVICE_ICON_NAME)))
>          pa_source_output_set_property(u->source_output, PA_PROP_MEDIA_ICON_NAME, n);
>  
> +    /* Set latency and calculate necessary buffer length
> +     * In some profile switching situations the source will be invalid here. If so,
> +     * skip the buffer adjustment, it will be done when the source becomes valid */
> +
> +    update_latency_boundaries(u, NULL, dest);
> +    set_sink_input_latency(u, dest);
> +
> +    if (!u->source_output->source)
> +        memblockq_adjust(u, 0, true);
> +    else {
> +        pa_source_output_get_latency(u->source_output, &source_latency);

Why not pa_source_get_latency()?

> +        if (u->send_counter > u->recv_counter)
> +            source_latency += pa_bytes_to_usec(u->send_counter - u->recv_counter, &u->sink_input->sample_spec);
> +        memblockq_adjust(u, source_latency, true);

In source_output_moving_cb() you took both sink and source latencies
into account, shouldn't you do that here too?

> +    }
> +
>      if (pa_sink_get_state(dest) == PA_SINK_SUSPENDED)
>          pa_source_output_cork(u->source_output, true);
>      else
> @@ -798,6 +971,7 @@ int pa__init(pa_module *m) {
>      u->core = m->core;
>      u->module = m;
>      u->latency = (pa_usec_t) latency_msec * PA_USEC_PER_MSEC;
> +    u->pop_called = false;
>  
>      adjust_time_sec = DEFAULT_ADJUST_TIME_USEC / PA_USEC_PER_SEC;
>      if (pa_modargs_get_value_u32(ma, "adjust_time", &adjust_time_sec) < 0) {
> @@ -876,7 +1050,8 @@ int pa__init(pa_module *m) {
>      u->sink_input->suspend = sink_input_suspend_cb;
>      u->sink_input->userdata = u;
>  
> -    pa_sink_input_set_requested_latency(u->sink_input, u->latency/3);
> +    update_latency_boundaries(u, NULL, u->sink_input->sink);
> +    set_sink_input_latency(u, u->sink_input->sink);
>  
>      pa_source_output_new_data_init(&source_output_data);
>      source_output_data.driver = __FILE__;
> @@ -927,7 +1102,8 @@ int pa__init(pa_module *m) {
>      u->source_output->suspend = source_output_suspend_cb;
>      u->source_output->userdata = u;
>  
> -    pa_source_output_set_requested_latency(u->source_output, u->latency/3);
> +    update_latency_boundaries(u, u->source_output->source, u->sink_input->sink);
> +    set_source_output_latency(u, u->source_output->source);
>  
>      pa_sink_input_get_silence(u->sink_input, &silence);
>      u->memblockq = pa_memblockq_new(
> @@ -942,6 +1118,15 @@ int pa__init(pa_module *m) {
>              &silence);              /* silence frame */
>      pa_memblock_unref(silence.memblock);
>  
> +    /* Fill the memblockq with silence to avoid underruns at startup.
> +     * For dynamic latency devices it is enough to use the configured
> +     * loopback latency, else add some safety margin. A quarter of the
> +     * configured sink latency should be sufficient */
> +    if(u->sink_input->sink->flags & PA_SINK_DYNAMIC_LATENCY)
> +        memblockq_adjust(u, 0, true);
> +    else
> +        memblockq_adjust(u, (int32_t)(-u->configured_sink_latency / 4), true);

Here again I don't understand why any safety margin beyond the target
latency would be needed. Is it to handle the case where the source is
initially stopped and it takes a while before it pushes out the first
chunk of audio? I think it would be better to handle this similarly to
how you wait until pop() has been called for the first time. You don't
know how long startup delay the source is going to have.

Also, this is another instance of the accidentally-correct casting-
unsigned-to-signed mess that appeared a couple of times earlier.

-- 
Tanu