[pulseaudio-discuss] Crackling audio with Pulseaudio 4.0 and the simple Pulse API.

Thu Jul 11 08:54:53 PDT 2013

2013/7/11 Tanu Kaskinen <tanu.kaskinen at linux.intel.com>:
> You don't understand me, and I don't understand you... What is the
> result of the boundary being off? Does audio get skipped, duplicated, or
> is there no error at all?

No duration-related error at all, in my understanding. See below for
where it differs from yours.

> I'll try to clarify this (also to myself) with an example. Let's have a
> sink, and a stream with a resampler in between. For simplicity, let's
> assume that the resampler doesn't actually do any resampling, so when
> the sink asks for 10 samples, the resampler reads from the stream 10
> samples.
>
> Let's say that the write index of the sink is N, and the resampler has
> one sample buffered. Due to the buffered sample, the read index of the
> stream is N+1.

Let me rephrase this in order to check that I understand. You are
talking about a resampler that has a 1:1 input:output sample rate
ratio. The resampler needs to look one sample ahead (or behind,
depending on how you look at it) in order to function. A simple
example of such a "resampler" would be something that averages each
incoming sample with the previous one.

You have pushed 11 samples into the resampler. You say that the
resampler has consumed one sample for its internal buffer, consumed 10
more samples "for good output" and produced 10 output samples. And
this is the point I don't quite agree with.

What you describe is one possible behaviour (and I'd say a buggy one,
but it does exist), but we need to consider one more possibility. The
other case is that the resampler produces 11 samples when fed 11
samples. The first sample in my example is the average of zero and the
first input sample, and it's technically wrong to throw it out,
because then prepending an all-zero sequence to the input would
change the zero-extended output. I.e. resamplers should
by default, at the beginning of the stream, treat internal buffers not
as empty, but as full, pre-filled with zeros. See also the note at the
end of this mail.
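
As an illustration of the paragraph above, here is a minimal Python
sketch of the averaging filter with its history pre-filled with zero.
The class name AveragingFilter is made up for this example and has
nothing to do with PulseAudio's actual resampler code.

```python
class AveragingFilter:
    """Toy 1:1 "resampler": each output is the mean of the current
    and the previous input sample."""

    def __init__(self):
        # The history is treated as full of zeros, not as empty.
        self.prev = 0.0

    def process(self, samples):
        out = []
        for x in samples:
            out.append((self.prev + x) / 2)
            self.prev = x
        return out


data = [float(i) for i in range(1, 12)]  # 11 input samples

plain = AveragingFilter().process(data)
# Same input with three zero samples prepended.
padded = AveragingFilter().process([0.0, 0.0, 0.0] + data)
```

Each run yields as many output samples as it was fed, and the padded
run is the plain one delayed by three samples. If the first output
were thrown away instead, that would no longer hold.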

But let's say that, depending on the implementation, the read index
might be either N or N+1.

> Now the sink is rewound by 10 samples. This means that the sink will
> want the next written sample to be from index N-10. The resampler drops
> the buffered sample, and the read index of the stream moves back by 10
> samples to N-9. The sample at N-10 got lost, the user hears audio
> skipping by one sample.

"the read index of the stream moves back by 10 samples to N-9" is of
course wrong, as you point out below.

> The amount of dropped audio in the resampler buffer should have been
> added to the amount by which the stream read index was moved back.

Sure. The confusion actually comes from trying to guess, behind the
resampler's back, which input samples it will want after a rewind.
For non-1:1 resamplers, the amount of buffering varies over time.
E.g. consider a 3:2 downsampler that works by linear interpolation:

Y[0] = X[0]
Y[1] = (X[1] + X[2]) / 2
Y[2] = X[3]
Y[3] = (X[4] + X[5]) / 2
Y[4] = X[6]
Y[5] = (X[7] + X[8]) / 2
and so on

Sometimes, when fed a sample, it can copy the sample to the output,
and sometimes it will average two neighbouring samples. Sure, you can
second-guess this particular resampling pattern, but now consider
another valid case of a linear-interpolation 3:2 downsampler:

Y[0] = (3 * X[0] + X[1]) / 4
Y[1] = (X[1] + 3 * X[2]) / 4
Y[2] = (3 * X[3] + X[4]) / 4
Y[3] = (X[4] + 3 * X[5]) / 4
Y[4] = (3 * X[6] + X[7]) / 4
Y[5] = (X[7] + 3 * X[8]) / 4
and so on

which always has to look ahead. Same maximum "buffer" length, same
resample ratio, different input needs. E.g., in the first case, you
have to know the input up to X[3] to determine Y[2], while in the
second case you need to know one more input sample.
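
The two patterns above can be reproduced with one small Python sketch
in which the only difference is a phase offset. The function names and
the "phase" parameter (the offset of Y[0] from X[0], measured in input
samples) are my own choices for illustration, not anything from a real
resampler.

```python
import math
from fractions import Fraction


def output_sample(x, n, phase=Fraction(0)):
    """Y[n] of a linear-interpolation 3:2 downsampler."""
    pos = phase + Fraction(3, 2) * n  # position of Y[n] on the input axis
    base = math.floor(pos)
    frac = pos - base
    if frac == 0:
        return x[base]  # falls exactly on an input sample
    return (1 - frac) * x[base] + frac * x[base + 1]


def needed_input(n, phase=Fraction(0)):
    """Highest input index that must be known to compute Y[n]."""
    pos = phase + Fraction(3, 2) * n
    base = math.floor(pos)
    return base if pos == base else base + 1


x = [Fraction(v) for v in (5, 1, 4, 2, 8, 7, 3, 9, 6)]

first = [needed_input(n) for n in range(6)]
second = [needed_input(n, Fraction(1, 4)) for n in range(6)]
```

Here first comes out as [0, 2, 3, 5, 6, 8] and second as
[1, 2, 4, 5, 7, 8]: same ratio, same interpolation, different input
needs.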

So let me repeat: don't attempt to guess which input samples the
resampler will need after a sink rewind; you will always be wrong. Let
the resampler implementation decide, i.e. implement a "pull" model
instead of "push" if you allow arbitrary sink-based rewinds that need
a rerun of the resampler. This naturally leads to the need to forward
all rewind requests to the particular implementation. But see below
for an alternative that you have correctly suggested, based on
snapshots.
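
A rough Python sketch of the "pull" idea, using the toy averaging
filter again. Everything here (the PullAverager class, the read_input
callback, the rewind/render method names) is hypothetical and only
meant to show who decides which input indices get read.

```python
class PullAverager:
    """Toy 1:1 filter in a pull model: it fetches input by index."""

    def __init__(self, read_input):
        self.read_input = read_input  # hypothetical callback: index -> sample
        self.out_index = 0            # index of the next output sample

    def rewind(self, nsamples):
        # The caller only says how far the output moved back.
        self.out_index = max(0, self.out_index - nsamples)

    def render(self, nsamples):
        out = []
        for n in range(self.out_index, self.out_index + nsamples):
            # The filter itself knows it needs the previous input sample.
            prev = self.read_input(n - 1) if n > 0 else 0.0
            out.append((prev + self.read_input(n)) / 2)
        self.out_index += nsamples
        return out


stream = [float(i * i) for i in range(20)]
requested = []


def read_input(index):
    requested.append(index)
    return stream[index]


filt = PullAverager(read_input)
first_pass = filt.render(15)

filt.rewind(10)       # the sink was rewound by 10 samples
requested.clear()
second_pass = filt.render(10)
```

After the rewind the filter's first request is for input sample 4,
one earlier than output index 5, and second_pass matches the last ten
samples of first_pass. The caller never had to work out that extra
sample.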

> "The initial phase" means the last output sample relative to the new
> position, right?

Yes. In the above 3:2 examples, it means whether an even or an odd
output sample is produced. And if you look carefully, you will notice
that the two examples above differ only by a shift of 1/4 of an input
sample, so that can also be counted as a phase difference.

> It might be feasible to add the required functionality to the stock
> resamplers. When we discussed the filter rewinding, you mentioned the
> idea of maintaining a history of filter state snapshots. Taking a
> snapshot only requires a function for copying the filter state, and I
> would guess that adding such function to the stock resamplers could very
> well be done.

True. It would also be convenient to store the read index of the
stream and the write index of the sink along with the snapshots. This
way, you just search for the latest snapshot taken before the piece of
the history that you want to amend, and continue from there. This
works for arbitrary buffering requirements of the resampler.
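
The snapshot idea could look roughly like this in Python. It is only
a sketch: SnapshotHistory, take() and restore() are invented names,
the filter state is a plain dict for the toy averaging filter, and a
snapshot is taken before every block of four samples just to keep the
example short.

```python
import bisect
import copy


class SnapshotHistory:
    """(write_index, read_index, state) entries, ordered by write index."""

    def __init__(self):
        self.write_indices = []
        self.entries = []

    def take(self, write_index, read_index, state):
        self.write_indices.append(write_index)
        self.entries.append((write_index, read_index, copy.deepcopy(state)))

    def restore(self, rewind_to):
        """Latest snapshot at or before rewind_to; newer ones are dropped."""
        pos = bisect.bisect_right(self.write_indices, rewind_to) - 1
        if pos < 0:
            raise LookupError("no snapshot old enough")
        del self.write_indices[pos + 1:]
        del self.entries[pos + 1:]
        write_index, read_index, state = self.entries[pos]
        return write_index, read_index, copy.deepcopy(state)


def run_block(state, samples):
    """Toy averaging filter; state["prev"] is its one-sample history."""
    out = []
    for x in samples:
        out.append((state["prev"] + x) / 2)
        state["prev"] = x
    return out


stream = [float(i * i) for i in range(16)]
history = SnapshotHistory()
state = {"prev": 0.0}
rendered = []
for start in range(0, 16, 4):
    history.take(write_index=start, read_index=start, state=state)
    rendered += run_block(state, stream[start:start + 4])

# The sink is rewound to write index 10: resume from the snapshot at 8.
w, r, state = history.restore(10)
redone = run_block(state, stream[r:r + 4])
```

The code doing the rewind never needs to know how much the filter
buffers. It restores the stored indices and state and feeds input
from the stored read index onwards.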

-- 
Alexander E. Patrakov