[pulseaudio-discuss] Resampler quality evaluation results

Tue Sep 2 01:16:40 PDT 2014

On 2014-08-24 20:53, Alexander E. Patrakov wrote:
> I have finished the first stage of my work on resampler quality evaluation.
>
> The scripts are here: https://gitorious.org/psy-eval/psy-eval/
> The results are here: https://imgur.com/a/jtIEj
>
> Note: they are valid only for 44100 -> 48000 Hz resampling. But that's
> the common case.
>
> TL;DR summary: it makes sense to change the default resampler quality
> from the current "speex-float-1" value to "speex-float-3" or even
> "speex-float-5" on capable machines, otherwise the distortion is
> sometimes noticeable. And, speex-float-{3,5} are similar to what
> proprietary OSes offer.

Hi,

Indeed interesting work, but I have a few concerns about that conclusion...

> The work is based on the question: does a human listener notice the
> distortion introduced by a resampler? To answer that, I used a
> psychoacoustical model publicly available at the following URL:
>
> http://www.mp3-tech.org/programmer/docs/6_Heusdens.pdf

(cut)

> Under that definition, the plots that say "Limited bandwidth counts as
> distortion" below them were made. They display audibility of all
> distortions, as defined above, as a function of the input sine wave
> frequency, for a selection of resamplers. The sine wave is assumed to be
> at the full amplitude, which corresponds (as it is a common convention
> in psychoacoustical models) to 92 dB SPL. Note: do not listen at this
> volume. It is harmful. But it is also the worst case for the
> psychoacoustical model.

I'm trying to understand the diagrams here. They are based on a sine
wave being played at 92 dB SPL, which is too loud for the human ear. At
that level, we get distortion of 15 dB (on average) for the trivial
resampler, i.e., the distortion (or S/N) is around -77 dB relative to
the signal. Is this correct?

Now consider this:

1) The theoretical limit for the human ear is 0 dB. In practice, it is 
more around 10 - 20 dB.

2) As you say, 92 dB is too high for normal listening. Say 80 dB
instead, which is still louder than one would typically listen to music
at for longer periods of time.

3) Now add to that the distortion of normal laptop speakers, headphones 
etc. It would be interesting to have that too in the diagram as a reference.

I.e., taking a practical hearing threshold of 15 dB, the usable hearing
range becomes 80 - 15 = 65 dB, while the trivial resampler's distortion
sits at -77 dB, which is below that range.
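
To make the arithmetic above easy to check, here is a rough Python
sketch. It assumes my reading of the plots is right (the "15 dB" is an
absolute level for a 92 dB SPL sine), that the distortion scales
one-to-one with the signal level, and it picks 15 dB as the practical
hearing threshold from the 10 - 20 dB range in point 1. All names are
made up for illustration; this is not part of the psy-eval scripts.

```python
# Back-of-the-envelope dB arithmetic; all values in dB (SPL or relative).

def relative_distortion_db(signal_spl, distortion_spl):
    """Distortion level relative to the signal (negative = below it)."""
    return distortion_spl - signal_spl

# Numbers as read from the plots (assumption: 15 dB is an absolute level).
plot_signal_spl = 92.0
plot_distortion_spl = 15.0
rel_db = relative_distortion_db(plot_signal_spl, plot_distortion_spl)

# A more realistic listening level and a practical hearing threshold.
listening_spl = 80.0
hearing_threshold_spl = 15.0  # assumed midpoint of the 10 - 20 dB range

# Usable range between the hearing threshold and the listening level.
hearing_range_db = listening_spl - hearing_threshold_spl

# Assumption: distortion keeps the same relative level at 80 dB SPL.
distortion_spl_at_listening = listening_spl + rel_db
is_audible = distortion_spl_at_listening > hearing_threshold_spl

print("relative distortion:", rel_db, "dB")
print("hearing range:", hearing_range_db, "dB")
print("distortion level at listening volume:", distortion_spl_at_listening, "dB SPL")
print("above hearing threshold:", is_audible)
```

Under these assumptions the distortion ends up below the threshold, but
the result obviously depends on whether the plots can be read that way.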

So given your diagrams, you could just as well argue that one could 
switch to the trivial resampler, because you can't hear the distortion 
from it anyway. Now I'm not actually saying we should do that, just
saying that maybe we shouldn't jump so quickly to the conclusion that we
need to switch to something with higher quality.

(Btw, maybe a log scale for frequency would have been more fair given 
how we perceive sounds?)
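
As an illustration of what a log scale would change, here is a small
Python sketch. The 20 Hz - 20 kHz range is only the nominal hearing
range, picked as an assumption; the function names are hypothetical and
unrelated to how the plots were actually produced.

```python
import math

def log_spaced_frequencies(f_min, f_max, count):
    """Return `count` frequencies evenly spaced on a logarithmic axis."""
    if count < 2:
        raise ValueError("count must be at least 2")
    lo, hi = math.log10(f_min), math.log10(f_max)
    step = (hi - lo) / (count - 1)
    return [10 ** (lo + i * step) for i in range(count)]

def axis_share(f_low, f_high, f_min, f_max, log_scale):
    """Fraction of the frequency axis taken up by the band f_low..f_high."""
    if log_scale:
        return math.log10(f_high / f_low) / math.log10(f_max / f_min)
    return (f_high - f_low) / (f_max - f_min)

# Assumed plotting range: nominal hearing range, 20 Hz to 20 kHz.
F_MIN, F_MAX = 20.0, 20000.0

# One test frequency per decade.
freqs = log_spaced_frequencies(F_MIN, F_MAX, 4)

# How much of the axis the band below 2 kHz occupies on each scale.
linear_share = axis_share(F_MIN, 2000.0, F_MIN, F_MAX, log_scale=False)
log_share = axis_share(F_MIN, 2000.0, F_MIN, F_MAX, log_scale=True)

print([round(f) for f in freqs])
print(round(linear_share, 3), round(log_share, 3))
```

On a linear axis the band below 2 kHz gets roughly a tenth of the plot
width, while on a log axis it gets about two thirds.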

> Also, audibility of the distortions inherent in a TPDF-dithered 16-bit
> input is shown as "quantization noise" on the same plots. As you see,
> 16-bit input and TPDF dithering do not result in audible distortions.

I also see that speex-float-1 manages to have lower distortion than the
16-bit dithering noise at some frequencies. Is this an error in the
diagram?

> It's quite sad that the current default in PulseAudio was influenced by
> the needs of low-power embedded devices at the measurable expense of the
> sound quality on the typical desktop. Now, with plots, figures and
> knowledge in hand, we can fix it.

Well, I'm not sure the typical desktop is that typical anymore. Laptops
are more common than desktops, and phones are more common than laptops.
The average user might be more concerned about laptop battery life than
about artifact-free resampling, especially if those artifacts cannot be
heard anyway due to low-quality laptop speakers.

So, your conclusion to switch to a higher quality resampler seems to
rest on a few assumptions about the environment: perfect ears,
equipment, space, power supply and so on. The other extreme is a lo-fi
laptop speaker on battery, listened to by an ear with tinnitus, in a
noisy room.

We'll need to end up with a compromise between these two extremes, maybe 
somewhere around our current default of speex-float-1, which nobody or 
very few people have complaints about (and those who have, are those who 
are interested in tuning their system to the highest quality, which 
could include switching our default resampler).

-- 
David Henningsson, Canonical Ltd.
https://launchpad.net/~diwic