[pulseaudio-discuss] [PATCH 03/11] resampler: Change interface, resampler may return the number of leftover frames

Fri Nov 8 12:34:16 PST 2013

Peter Meerwald wrote:

> some resampler implementations (e.g. libsamplerate and ffmpeg) do not consume
> the entire input buffer; the impl_resample() function now has a return value
> returning the number of frames in the input buffer not processed
> these frames must be saved in appropriate buffer and presented together with
> new input data

(yes I know this is too late, as the patch is already pushed, but...)

First, I slightly object to the commit message. It may be
misinterpreted as "they don't even look at the entire input buffer".
While sometimes this is true (e.g. because the output buffer size was
a limiting factor), there is another important reason why some samples
need to be presented again. Namely, this happens if they affect both
the old and the new chunk of output data and the resampler
implementation relies on the user to housekeep the input buffers.

Second, I consider all resamplers that don't do this (i.e. that handle
the necessary buffering internally in an opaque manner) as
fundamentally broken if rewinds are allowed, and would argue for their
refactoring or deletion. And here is why (yes, I am a perfectionist in
the bad sense of this word).

When a rewind is requested on a sink input, currently pulseaudio
completely resets the resampler. So, even if the same data is written
again, the internal state is lost. This results in an audible click.

What's even worse, the exact correspondence between the input and
output sample _count_ is, strictly speaking, lost due to these resets.
Consider (just for illustration) a simple 1.5x downsampler that just
copies input samples to output, dropping every third sample. Its
internal state is just the input sample index modulo 3. By
systematically rewinding it often enough (while still guaranteeing
progress), we can cause it to drop less or more samples than it would
during its normal operation.

Now, what could pulseaudio do in order to avoid this? A dumb answer
is: it could save the internal state of the resampler to its own
buffer and restore it. How often should it do that? A fail-safe answer
is: "for every input sample", but that's expensive. We could to that
for every N samples, and correspondingly round the rewind amount up.
That's definitely possible, and that's the best what pulseaudio can do
if it treats a resampler as some opaque object. Except that the
resamplers don't provide any API for saving and restoring the state,
which is fixable by importing them into pulseaudio tree (as we did
with the ffmpeg resampler) and adding the corresponding functions.

The point is that we can do even better, because the internal state of
a resampler is not an opaque object.

Here is what the resampler state actually contains for real-world resamplers.

FFMpeg uses this:

typedef struct AVResampleContext{
    FELEM *filter_bank;
    int filter_length;
    int ideal_dst_incr;
    int dst_incr;
    int index;
    int frac;
    int src_incr;
    int compensation_distance;
    int phase_shift;
    int phase_mask;
    int linear;
}AVResampleContext;

Only the following fields are written to during the normal operation
(i.e. after initialization): frac, index, dst_incr,
compensation_distance. And compensation_distance is unused (always 0)
in pulseaudio, because av_resample_compensate() is never called. For
the same reason, dst_incr is always equal to in_rate * (1 <<
phase_shift). So, only index and frac remain. Look, no sample buffer!

Now let's consider how these fields are updated. It turns out that
their increments do not depend on the actual samples and thus are, in
theory, computable from the number of processed and/or produced
samples only. So here is one very important result: ffmpeg resampler
is essentially stateless, at the cost of asking for some input samples
twice. Its so called "internal state" can be regenerated from scratch
after any rewind, if one does the math, i.e. there is no need to save
and restore anything. So, ffmpeg resampler is a promising candidate
for clickless rewinds, as it is almost ready. Here is the list of what
remains to be done:

 * A closed-form expression for index and frac just after producing
Nth output sample.
 * A closed-form expression for the index of the first and the last
input sample affecting Nth output sample given the current filter
length.
 * A closed-form expression for the index of the first and the last
output sample affected by Mth input sample.
 * A new uint64_t field in the sink structure that counts the samples
produced by the sink.
 * More accurate calculation of the sink input rewind amount in
pa_sink_input_process_rewind, instead of the inaccurate generic
pa_resampler_request() function that tries to second-guess behind the
resampler implementation without knowing about the filter length and
with disregarding the fact that the "leftover" data wobbles in length
slightly during the normal operation.
 * More accurate conversion back into the sink domain, instead of the
inaccurate pa_resampler_result() function.
 * Regeneration of index and frac from the sample counter instead of
resetting them.

Other resamplers contain essectially the same fields in their state
structrue, plus possibly a buffer that contains either a copy of some
input samples for looking again (not a problem, we can get the same
from a render_memblockq), or a buffer for incompletely-accumulated
output samples (a serious problem).

The list above is a bit oversimplified and assumes that we kill
everything except the ffmpeg resampler, or except any other
essentially-stateless resampler. It may be indeed a good idea to
either extend the ffmpeg resampler for variable rates and kill all
others, or adapt the resampler implementation from the Wine project -
it is written by me, GPLed (but I would be happy to relicense if
needed), stateless in the same sense as ffmpeg's, and supports
variable sample rate.

-- 
Alexander E. Patrakov