[pulseaudio-discuss] [RFC PATCH] First draft implementation of better drain reporting

Mon Mar 4 03:33:08 PST 2013

On 03/02/2013 10:41 PM, Tanu Kaskinen wrote:
> On Fri, 2013-03-01 at 16:05 +0100, David Henningsson wrote:
>> First of all, this is an unfinished patch. At this point, it doesn't add any
>> new "device underrun" events, only responds to drains more timely. It's quite
>> tricky to get right (or I'm just dumb :-D ).
>
> It's also quite tricky to review (or I'm just dumb) :)

Heh, thanks for trying :-)

>> In short, the alsa-sink object first starts to keep track of how much has been
>> played back (since last fill up). The sink is informed about this value, which it
>> uses for two purposes:
>>   1) If a stream is currently underrunning, it returns how much underrun the stream
>>      actually is, so that alsa-sink can wake up at exactly this point in time.
>>   2) If all of a stream has actually been played back, it calls into the stream's
>>      new verify_underrun callback, and the stream's render data is emptied.
>>
>> Things remaining are:
>>   - First, I'm not sure the calculations are entirely correct. I think I shouldn't need
>>     to add process_usec to the wakeup, but if I don't I wake up a few samples before the
>>     stream has finished playback.
>>   - This is implemented in mmap_write but should also be in unix_write
>>   - A new device underrun callback for clients (requires protocol extensions)
>>   - protocol-native's verify_underrun_cb contains copy-pasted code from pop_cb
>>   - remove debug prints and maybe some other cleanup.
>>
>> Signed-off-by: David Henningsson <david.henningsson at canonical.com>
>> ---
>>   src/modules/alsa/alsa-sink.c    |   41 +++++++++++++++++++++++++++++++--------
>>   src/pulsecore/protocol-native.c |   28 ++++++++++++++++++++++++++
>>   src/pulsecore/sink-input.c      |   33 +++++++++++++++++++++++++++++++
>>   src/pulsecore/sink-input.h      |    6 ++++++
>>   src/pulsecore/sink.c            |   25 ++++++++++++++++++++++++
>>   src/pulsecore/sink.h            |    2 ++
>>   6 files changed, 127 insertions(+), 8 deletions(-)
>>
>> diff --git a/src/modules/alsa/alsa-sink.c b/src/modules/alsa/alsa-sink.c
>> index 69d006e..59be22f 100644
>> --- a/src/modules/alsa/alsa-sink.c
>> +++ b/src/modules/alsa/alsa-sink.c
>> @@ -146,6 +146,8 @@ struct userdata {
>>       uint64_t since_start;
>>       pa_usec_t smoother_interval;
>>       pa_usec_t last_smoother_update;
>> +    int64_t since_fill;
>> +    int64_t expected_avail;
>>
>>       pa_idxset *formats;
>>
>> @@ -508,7 +510,7 @@ static size_t check_left_to_play(struct userdata *u, size_t n_bytes, pa_bool_t o
>>   static int mmap_write(struct userdata *u, pa_usec_t *sleep_usec, pa_bool_t polled, pa_bool_t on_timeout) {
>>       pa_bool_t work_done = FALSE;
>>       pa_usec_t max_sleep_usec = 0, process_usec = 0;
>> -    size_t left_to_play;
>> +    size_t left_to_play, input_underrun;
>>       unsigned j = 0;
>>
>>       pa_assert(u);
>> @@ -535,11 +537,14 @@ static int mmap_write(struct userdata *u, pa_usec_t *sleep_usec, pa_bool_t polle
>>           }
>>
>>           n_bytes = (size_t) n * u->frame_size;
>> +        u->since_fill += n_bytes - u->expected_avail;
>
> This appears to be the only place where expected_avail is used, and in
> this context a lot of time has likely been passed since expected_avail
> was last updated, so the actual expected empty space in the hw buffer is
> usually much more than expected_avail. Therefore, the variable name is
> misleading. I don't have any better ideas, though
> (expected_avail_without_taking_the_passing_of_time_into_account is a bit
> too verbose), so never mind...

I'll change it to last_avail, which is probably slightly better.

>
>>
>>   #ifdef DEBUG_TIMING
>> -        pa_log_debug("avail: %lu", (unsigned long) n_bytes);
>> +        pa_log_debug("avail: %lu, since fill: %ld, advance: %lu", (unsigned long) n_bytes,
>> +                     (long) u->since_fill, (unsigned long) n_bytes - u->expected_avail);
>
> n_bytes - u->expected_avail may result in a negative value (at least in
> theory), but the format specifier is %lu. You probably meant to apply
> the cast to the result of the substraction, not only to n_bytes. But
> even then, a bogus value gets printed in case of a negative value. Maybe
> that's OK, though, since a negative value can occur only in case of a
> bug somewhere (in alsa or pulseaudio).

I'll change it to %ld.

>
>>   #endif
>>
>> +        u->expected_avail = n_bytes;
>>           left_to_play = check_left_to_play(u, n_bytes, on_timeout);
>>           on_timeout = FALSE;
>>
>> @@ -600,6 +605,7 @@ static int mmap_write(struct userdata *u, pa_usec_t *sleep_usec, pa_bool_t polle
>>               const snd_pcm_channel_area_t *areas;
>>               snd_pcm_uframes_t offset, frames;
>>               snd_pcm_sframes_t sframes;
>> +            size_t written;
>>
>>               frames = (snd_pcm_uframes_t) (n_bytes / u->frame_size);
>>   /*             pa_log_debug("%lu frames to write", (unsigned long) frames); */
>> @@ -635,12 +641,14 @@ static int mmap_write(struct userdata *u, pa_usec_t *sleep_usec, pa_bool_t polle
>>
>>               p = (uint8_t*) areas[0].addr + (offset * u->frame_size);
>>
>> -            chunk.memblock = pa_memblock_new_fixed(u->core->mempool, p, frames * u->frame_size, TRUE);
>> +            written = frames * u->frame_size;
>> +            chunk.memblock = pa_memblock_new_fixed(u->core->mempool, p, written, TRUE);
>>               chunk.length = pa_memblock_get_length(chunk.memblock);
>>               chunk.index = 0;
>>
>>               pa_sink_render_into_full(u->sink, &chunk);
>>               pa_memblock_unref_fixed(chunk.memblock);
>> +            u->since_fill = 0;
>
> Shouldn't this be done only after a successful snd_pcm_mmap_commit()
> call?

I'm not sure.

pa_sink_render_into_full is what's causing new data to be mixed, so one 
could see it as since filling up render_memblockq, rather than filling 
up the DMA area.

Either case, if mmap_commit fails, we probably have worse things to 
worry about than a wrong since_fill...

>>
>>               if (PA_UNLIKELY((sframes = snd_pcm_mmap_commit(u->pcm_handle, offset, frames)) < 0)) {
>>
>> @@ -655,21 +663,29 @@ static int mmap_write(struct userdata *u, pa_usec_t *sleep_usec, pa_bool_t polle
>>
>>               work_done = TRUE;
>>
>> -            u->write_count += frames * u->frame_size;
>> -            u->since_start += frames * u->frame_size;
>> +            u->write_count += written;
>> +            u->since_start += written;
>> +            u->expected_avail -= written;
>>
>>   #ifdef DEBUG_TIMING
>> -            pa_log_debug("Wrote %lu bytes (of possible %lu bytes)", (unsigned long) (frames * u->frame_size), (unsigned long) n_bytes);
>> +            pa_log_debug("Wrote %lu bytes (of possible %lu bytes)", (unsigned long) written, (unsigned long) n_bytes);
>>   #endif
>>
>> -            if ((size_t) frames * u->frame_size >= n_bytes)
>> +            if (written >= n_bytes)
>>                   break;
>>
>> -            n_bytes -= (size_t) frames * u->frame_size;
>> +            n_bytes -= written;
>>           }
>>       }
>>
>> +    input_underrun = pa_sink_find_underrun(u->sink, u->since_fill, left_to_play);
>
> I believe left_to_play is not the correct variable to use here. It is
> based on snd_pcm_avail(), but we're interested in the total latency, so
> snd_pcm_delay() should be used.
>
> Hmm, on the other hand, the new device underrun notifications should be
> sent as soon as the input underrun "leaves" the alsa ring buffer. So,
> the drain and underrun notifications should happen at different times.

I see your point, but having them both notify at the different times is 
an additional complexity that maybe we shouldn't deal with right now.

To ask a counter-question; is there a use case where it hurts that we 
haven't been including snd_pcm_delay when reporting back our drain? I 
mean, we won't close the sink for an additional five seconds, so there 
shouldn't be any risk that the end of the sound gets chopped.
And for the people who want to play back several wave files in a row, 
the proposed handling should be slightly better, right?

(Btw, the current drain handling doesn't take _delay into account either.)

>> +
>>       if (u->use_tsched) {
>> +        pa_usec_t underrun_sleep = 0;
>> +        if (input_underrun > 0)
>> +             /* Wakeup to notify that stream is drained */
>> +            underrun_sleep = pa_bytes_to_usec(left_to_play - input_underrun, &u->sink->sample_spec);
>> +
>>           *sleep_usec = pa_bytes_to_usec(left_to_play, &u->sink->sample_spec);
>>           process_usec = pa_bytes_to_usec(u->tsched_watermark, &u->sink->sample_spec);
>>
>> @@ -677,6 +693,11 @@ static int mmap_write(struct userdata *u, pa_usec_t *sleep_usec, pa_bool_t polle
>>               *sleep_usec -= process_usec;
>>           else
>>               *sleep_usec = 0;
>> +        if (underrun_sleep > 0) {
>> +            underrun_sleep += process_usec; /* Add some margin. I haven't figured out why we need this :-( */
>
> I think the margin should be a separate variable/constant. I don't think
> there's any connection between process_usec and the margin that we need
> here.

There is also the rewind_safeguard stuff. I haven't figured if or when 
that parameter should be involved...

>
>> +            *sleep_usec = PA_MIN(*sleep_usec, underrun_sleep);
>> +        }
>> +
>>       } else
>>           *sleep_usec = 0;
>>
>> @@ -1099,6 +1120,8 @@ static int unsuspend(struct userdata *u) {
>>       pa_smoother_reset(u->smoother, pa_rtclock_now(), TRUE);
>>       u->smoother_interval = SMOOTHER_MIN_INTERVAL;
>>       u->last_smoother_update = 0;
>> +    u->since_fill = 0;
>> +    u->expected_avail = u->hwbuf_size;
>
> This should probably be done also when the sink is initialized.

Hmm, I thought unsuspend was called during initialization, but maybe it 
isn't. Fixing.

>
>>
>>       u->first = TRUE;
>>       u->since_start = 0;
>> @@ -1663,6 +1686,8 @@ static int process_rewind(struct userdata *u) {
>>               pa_log_info("Tried rewind, but was apparently not possible.");
>>           else {
>>               u->write_count -= rewind_nbytes;
>> +            u->since_fill -= rewind_nbytes;
>> +            u->expected_avail += rewind_nbytes;
>>               pa_log_debug("Rewound %lu bytes.", (unsigned long) rewind_nbytes);
>>               pa_sink_process_rewind(u->sink, rewind_nbytes);
>>
>> diff --git a/src/pulsecore/protocol-native.c b/src/pulsecore/protocol-native.c
>> index a4ee920..0a39f2e 100644
>> --- a/src/pulsecore/protocol-native.c
>> +++ b/src/pulsecore/protocol-native.c
>> @@ -231,6 +231,7 @@ enum {
>>       CONNECTION_MESSAGE_REVOKE
>>   };
>>
>> +static pa_bool_t sink_input_verify_underrun_cb(pa_sink_input *i);
>
> s/pa_bool_t/bool/, s/TRUE/true/ and s/FALSE/false/.

A question; I'm not sure when we decided to ditch pa_bool_t, but anyway. 
It feels a bit weird to mix both in the same file. Wouldn't it make more 
sense to convert an entire file at once, instead of mixing?

>>   static int sink_input_pop_cb(pa_sink_input *i, size_t length, pa_memchunk *chunk);
>>   static void sink_input_kill_cb(pa_sink_input *i);
>>   static void sink_input_suspend_cb(pa_sink_input *i, pa_bool_t suspend);
>> @@ -1171,6 +1172,7 @@ static playback_stream* playback_stream_new(
>>
>>       s->sink_input->parent.process_msg = sink_input_process_msg;
>>       s->sink_input->pop = sink_input_pop_cb;
>> +    s->sink_input->verify_underrun = sink_input_verify_underrun_cb;
>>       s->sink_input->process_rewind = sink_input_process_rewind_cb;
>>       s->sink_input->update_max_rewind = sink_input_update_max_rewind_cb;
>>       s->sink_input->update_max_request = sink_input_update_max_request_cb;
>> @@ -1574,6 +1576,32 @@ static int sink_input_process_msg(pa_msgobject *o, int code, void *userdata, int
>>   }
>>
>>   /* Called from thread context */
>> +static pa_bool_t sink_input_verify_underrun_cb(pa_sink_input *i) {
>> +    playback_stream *s;
>> +
>> +    pa_sink_input_assert_ref(i);
>> +    s = PLAYBACK_STREAM(i->userdata);
>> +    playback_stream_assert_ref(s);
>> +
>> +    if (pa_memblockq_is_readable(s->memblockq))
>> +        return FALSE;
>> +
>> +    pa_log_debug("%s of '%s'", s->drain_request ? "Finished playback" : "Real underrun", pa_strnull(pa_proplist_gets(i->proplist, PA_PROP_MEDIA_NAME)));
>> +
>> +   if (s->drain_request) {
>> +        s->drain_request = FALSE;
>> +        pa_log("IMPLEMENTOR: Sending PLAYBACK_STREAM_MESSAGE_DRAIN_ACK");
>> +        pa_asyncmsgq_post(pa_thread_mq_get()->outq, PA_MSGOBJECT(s), PLAYBACK_STREAM_MESSAGE_DRAIN_ACK, PA_UINT_TO_PTR(s->drain_tag), 0, NULL, NULL);
>> +    } else if (!s->is_underrun)
>> +        pa_asyncmsgq_post(pa_thread_mq_get()->outq, PA_MSGOBJECT(s), PLAYBACK_STREAM_MESSAGE_UNDERFLOW, NULL, pa_memblockq_get_read_index(s->memblockq), NULL, NULL);
>> +
>> +    s->is_underrun = TRUE;
>> +/*    playback_stream_request_bytes(s); */
>> +    return TRUE;
>> +}
>> +
>> +
>> +/* Called from thread context */
>>   static int sink_input_pop_cb(pa_sink_input *i, size_t nbytes, pa_memchunk *chunk) {
>>       playback_stream *s;
>>
>> diff --git a/src/pulsecore/sink-input.c b/src/pulsecore/sink-input.c
>> index 519f47e..7968abd 100644
>> --- a/src/pulsecore/sink-input.c
>> +++ b/src/pulsecore/sink-input.c
>> @@ -826,6 +826,18 @@ pa_usec_t pa_sink_input_get_latency(pa_sink_input *i, pa_usec_t *sink_latency) {
>>       return r[0];
>>   }
>>
>> +/* Returns length of underrun, in sink bytes. Called from thread context */
>> +uint64_t pa_sink_input_get_underrun_for_sink(pa_sink_input *i)
>> +{
>> +    uint64_t u = i->thread_info.underrun_for;
>> +    if (u == (uint64_t) -1)
>> +        return 0;
>> +    if (!i->thread_info.resampler)
>> +        return u;
>> +    return pa_resampler_result(i->thread_info.resampler, u);
>> +}
>> +
>> +
>>   /* Called from thread context */
>>   void pa_sink_input_peek(pa_sink_input *i, size_t slength /* in sink frames */, pa_memchunk *chunk, pa_cvolume *volume) {
>>       pa_bool_t do_volume_adj_here, need_volume_factor_sink;
>> @@ -1020,6 +1032,27 @@ void pa_sink_input_drop(pa_sink_input *i, size_t nbytes /* in sink sample spec *
>>   }
>>
>>   /* Called from thread context */
>> +pa_bool_t pa_sink_input_verify_real_underrun(pa_sink_input *i, size_t nbytes /* in the sink's sample spec */) {
>
> I'd rename nbytes to underrun_for. A comment can be added explaining why
> it's not necessarily equal to i->thread_info.underrun_for.

nbytes is used to specify sink bytes in the surroundings, so I'm 
claiming the consistency argument for this one.

>
>> +    pa_sink_input_assert_ref(i);
>> +    pa_sink_input_assert_io_context(i);
>> +
>> +    if (nbytes < pa_memblockq_get_maxrewind(i->thread_info.render_memblockq))
>> +        return FALSE;
>> +    if (pa_memblockq_is_readable(i->thread_info.render_memblockq))
>> +        return FALSE;
>> +
>> +    if (!i->verify_underrun)
>> +        return FALSE;
>> +    if (i->verify_underrun(i)) {
>> +        /* All valid data has been played back, so we can empty this queue. */
>> +        pa_memblockq_silence(i->thread_info.render_memblockq);
>
> Why is pa_memblockq_silence() needed?

It is a precondition for pa_sink_input_safe_to_remove. I'm not sure it's 
needed now, as we already have sent the drain ack at this point. See it 
as an optimisation.

>> +        return TRUE;
>> +    }
>> +    return FALSE;
>> +}
>> +
>> +
>> +/* Called from thread context */
>>   void pa_sink_input_process_rewind(pa_sink_input *i, size_t nbytes /* in sink sample spec */) {
>>       size_t lbq;
>>       pa_bool_t called = FALSE;
>> diff --git a/src/pulsecore/sink-input.h b/src/pulsecore/sink-input.h
>> index 8bd5912..097a9fb 100644
>> --- a/src/pulsecore/sink-input.h
>> +++ b/src/pulsecore/sink-input.h
>> @@ -135,6 +135,9 @@ struct pa_sink_input {
>>        * the full block. */
>>       int (*pop) (pa_sink_input *i, size_t request_nbytes, pa_memchunk *chunk); /* may NOT be NULL */
>>
>> +    /* TODO: Write something here */
>> +    pa_bool_t (*verify_underrun) (pa_sink_input *i);
>> +
>>       /* Rewind the queue by the specified number of bytes. Called just
>>        * before peek() if it is called at all. Only called if the sink
>>        * input driver ever plans to call
>> @@ -407,6 +410,9 @@ int pa_sink_input_process_msg(pa_msgobject *o, int code, void *userdata, int64_t
>>   pa_usec_t pa_sink_input_set_requested_latency_within_thread(pa_sink_input *i, pa_usec_t usec);
>>
>>   pa_bool_t pa_sink_input_safe_to_remove(pa_sink_input *i);
>> +uint64_t pa_sink_input_get_underrun_for_sink(pa_sink_input *i);
>> +pa_bool_t pa_sink_input_verify_real_underrun(pa_sink_input *i, size_t nbytes /* in the sink's sample spec */);
>> +
>>
>>   pa_memchunk* pa_sink_input_get_silence(pa_sink_input *i, pa_memchunk *ret);
>>
>> diff --git a/src/pulsecore/sink.c b/src/pulsecore/sink.c
>> index 6ebe956..977b555 100644
>> --- a/src/pulsecore/sink.c
>> +++ b/src/pulsecore/sink.c
>> @@ -915,6 +915,31 @@ void pa_sink_move_all_fail(pa_queue *q) {
>>       pa_queue_free(q, NULL);
>>   }
>>
>> + /* Called from IO thread context */
>> +size_t pa_sink_find_underrun(pa_sink *s, size_t since_fill, size_t left_to_play) {
>> +    pa_sink_input *i;
>> +    void *state = NULL;
>> +    size_t result = 0;
>> +
>> +    pa_sink_assert_ref(s);
>> +    pa_sink_assert_io_context(s);
>> +/*    pa_log("PLAYBACK: %d bytes, left to play %d, total %d", (int) since_fill, (int) left_to_play, (int) (since_fill + left_to_play)); */
>> +    PA_HASHMAP_FOREACH(i, s->thread_info.inputs, state) {
>> +        size_t uf = pa_sink_input_get_underrun_for_sink(i);
>
> As a minor optimization and also for better readability, I think it
> would be better to maintain a separate for_underrun_sink variable in
> i->thread_info instead of having pa_sink_input_get_underrun_for_sink().

If so, I think there should still be some function, perhaps an inline 
function or so, because I don't like that a pa_sink_* function would 
directly peek i's variables.

>> +        if (uf == 0)
>> +            continue;
>> +        uf += since_fill;
>> +        if (pa_sink_input_verify_real_underrun(i, uf))
>
> This notifies an input about a real underrun. I think that's a bit
> surprising side effect for pa_sink_find_underrun(), so I think it would
> be clearer to have a separate function for doing those notifications,
> called before pa_sink_find_underrun().

I did this as a minor optimisation, in order not to traverse the hashmap 
one extra time.

Perhaps pa_sink_process_input_underruns() for a function that does both?

>> +            continue;
>> +        if (uf < left_to_play && uf > result)
>> +            result = uf;
>> +    }
>> +
>> +    if (result > 0)
>> +        pa_log_debug("Found underrun %d bytes ago", (int) result);
>> +    return result;
>> +}
>> +
>>   /* Called from IO thread context */
>>   void pa_sink_process_rewind(pa_sink *s, size_t nbytes) {
>>       pa_sink_input *i;
>> diff --git a/src/pulsecore/sink.h b/src/pulsecore/sink.h
>> index fcda5ef..6510405 100644
>> --- a/src/pulsecore/sink.h
>> +++ b/src/pulsecore/sink.h
>> @@ -492,6 +492,8 @@ void pa_sink_update_volume_and_mute(pa_sink *s);
>>
>>   pa_bool_t pa_sink_volume_change_apply(pa_sink *s, pa_usec_t *usec_to_next);
>>
>> +size_t pa_sink_find_underrun(pa_sink *s, size_t since_fill, size_t left_to_play);
>
> pa_sink_find_next_input_underrun() might be a clearer name?

> I'd also
> prefer the return value to be bytes to play before the underrun hits the
> speakers, instead of returning how long it has been since the
> render_memblockq became empty.

Return (left_to_play - result) instead? Sure, I don't mind either way.

-- 
David Henningsson, Canonical Ltd.
https://launchpad.net/~diwic