[pulseaudio-discuss] Resampler quality evaluation results

Tue Sep 2 03:44:30 PDT 2014

02.09.2014 14:16, David Henningsson wrote:
>
>
> On 2014-08-24 20:53, Alexander E. Patrakov wrote:
>> I have finished the first stage of my work on resampler quality
>> evaluation.
>>
>> The scripts are here: https://gitorious.org/psy-eval/psy-eval/
>> The results are here: https://imgur.com/a/jtIEj
>>
>> Note: they are valid only for 44100 -> 48000 Hz resampling. But that's
>> the common case.
>>
>> TL;DR summary: it makes sense to change the default resampler quality
>> from the current "speex-float-1" value to "speex-float-3" or even
>> "speex-float-5" on capable machines, otherwise the distortion is
>> sometimes noticeable. And, speex-float-{3,5} are similar to what
>> proprietary OSes offer.
>
> Hi,
>
> Indeed interesting work, but I have a few concerns about that conclusion...
>
>> The work is based on the question: does a human listener notice the
>> distortion introduced by a resampler? To answer that, I used a
>> psychoacoustical model publicly available at the following URL:
>>
>> http://www.mp3-tech.org/programmer/docs/6_Heusdens.pdf
>
> (cut)
>
>> Under that definition, the plots that say "Limited bandwidth counts as
>> distortion" below them were made. They display audibility of all
>> distortions, as defined above, as a function of the input sine wave
>> frequency, for a selection of resamplers. The sine wave is assumed to be
>> at the full amplitude, which corresponds (as it is a common convention
>> in psychoacoustical models) to 92 dB SPL. Note: do not listen at this
>> volume. It is harmful. But it is also the worst case for the
>> psychoacoustical model.
>
> I'm trying to understand the diagrams here. It is based on a sine wave
> being played at 92 dB SPL, which is too high for the human ear. At that
> point, we get distortions of 15 dB (on average) for the trivial
> resampler, i e, the distortion or S/N is around -77 dB. Is this correct?

No. You are talking about signal-to-noise ratio (here noise == 
distortion). My point is exactly that it is irrelevant. We should talk 
about noise-to-mask ratio (and that's what's plotted), where "mask" is 
defined both by the absolute threshold of hearing _and_ by the existing 
sounds.
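
To make the difference concrete, here is a minimal Python sketch. The 
92 dB SPL signal level is the one used for the plots; the distortion 
and mask levels are hypothetical numbers picked only for illustration, 
and the function names are mine, not taken from the psy-eval scripts.

```python
# Sketch only: levels are in dB SPL, ratios in dB.
# NOISE_DB and both mask levels below are hypothetical illustration values.

def snr_db(signal_db, noise_db):
    """Signal-to-noise ratio: how far the distortion sits below the signal."""
    return signal_db - noise_db


def nmr_db(noise_db, mask_db):
    """Noise-to-mask ratio: how far the distortion sits above the masked threshold."""
    return noise_db - mask_db


def is_audible(noise_db, mask_db):
    """The distortion is predicted to be audible only when NMR exceeds 0 dB."""
    return nmr_db(noise_db, mask_db) > 0.0


SIGNAL_DB = 92.0   # full-scale sine, as assumed for the plots
NOISE_DB = 50.0    # hypothetical distortion level

# Same signal and same distortion, hence the same SNR, but two
# hypothetical masked thresholds give opposite verdicts.
for mask_db in (32.0, 60.0):
    print(
        f"SNR = {snr_db(SIGNAL_DB, NOISE_DB):.0f} dB, "
        f"NMR = {nmr_db(NOISE_DB, mask_db):+.0f} dB, "
        f"audible = {is_audible(NOISE_DB, mask_db)}"
    )
```

Both cases have the same SNR, yet only the first one is predicted to be 
audible. That is why the SNR alone says nothing about audibility.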

>
> Now consider this:
>
> 1) The theoretical limit for the human ear is 0 dB. In practice, it is
> more around 10 - 20 dB.
>
> 2) As you say, 92 dB is too high for normal listening. Say 80 dB, which
> is still louder than one would typically listen to music for longer
> periods of time.
>
> 3) Now add to that the distortion of normal laptop speakers, headphones
> etc. It would be interesting to have that too in the diagram as a
> reference.
>
> I e, the hearing range becomes 80 - 15 = 65 dB, and the trivial
> resampler's distortion is -77 dB.
>
> So given your diagrams, you could just as well argue that one could
> switch to the trivial resampler, because you can't hear the distortion
> from it anyway. Now I'm not actually saying we should do that, just
> saying that maybe we shouldn't jump so quickly to the conclusion that we
> need to switch to something with higher quality.

The above is based on an incorrect understanding of what's plotted. If 
the plot goes above 0 dB, then the distortion is audible. It is 
important that "audible" here means "audible given the presence of the 
existing sound". So, the trivial resampler introduces distortions that 
are 18 dB higher than the minimum detectable level.

>
> (Btw, maybe a log scale for frequency would have been more fair given
> how we perceive sounds?)
>
>> Also, audibility of the distortions inherent in a TPDF-dithered 16-bit
>> input is shown as "quantization noise" on the same plots. As you see,
>> 16-bit input and TPDF dithering do not result in audible distortions.
>
> I also see that speex-float-1 manages to have lower distortion than the
> 16-bit dithering noise at some frequencies, is this an error in the
> diagram?

You are reading it upside down. The higher the line is, the worse. And 
the only relevant threshold is 0dB, because everything is plotted 
against the threshold of detectability given the existing 
very-high-level signal. So, the thick black plot demonstrates that 
16-bit dithering never introduces audible distortions.
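
As a rough sketch of how to read such a plot: take the worst point of 
each curve and compare it with 0 dB. The curve data below are 
hypothetical placeholders, not values read off the actual plots (only 
the 18 dB peak for the trivial resampler is the figure mentioned above), 
and the function names are made up for this example.

```python
# Sketch only: each curve is a list of (frequency_hz, nmr_db) points.
# All data points below are hypothetical placeholders.

def worst_case(curve):
    """Return the (frequency_hz, nmr_db) point with the highest NMR."""
    return max(curve, key=lambda point: point[1])


def verdict(curve):
    """A curve is 'audible' if it rises above 0 dB anywhere, else 'inaudible'."""
    _, peak_nmr_db = worst_case(curve)
    return "audible" if peak_nmr_db > 0.0 else "inaudible"


curves = {
    # Peaks at +18 dB, so the distortion is predicted to be audible.
    "trivial": [(1000, 12.0), (5000, 18.0), (15000, 9.0)],
    # Stays below 0 dB everywhere, so it is never predicted to be audible.
    "quantization noise": [(1000, -20.0), (5000, -12.0), (15000, -25.0)],
}

for name, curve in curves.items():
    freq_hz, peak = worst_case(curve)
    print(f"{name}: peak NMR {peak:+.0f} dB at {freq_hz} Hz -> {verdict(curve)}")
```

How far below 0 dB a curve sits does not matter for the verdict; only 
whether it crosses 0 dB does.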

>
>> It's quite sad that the current default in PulseAudio was influenced by
>> the needs of low-power embedded devices at the measurable expense of the
>> sound quality on the typical desktop. Now, with plots, figures and
>> knowledge in hand, we can fix it.
>
> Well, I'm not sure the typical desktop is that typical anymore. Laptops
> are more common than desktops, and phones are more common than laptops.
> The average user might be more concerned about laptop battery life than
> about having resampling without artifacts, if those artifacts cannot be
> heard anyway due to low quality laptop speakers.
>
> So; your conclusion to switch to a higher quality resampler seems to
> have a few assumptions about the environment in terms of perfect ears,
> equipment, space, power supply and so on. The other extreme is a low-fi
> laptop speaker on battery, listened to by an ear with tinnitus, in a
> noisy room.
>
> We'll need to end up with a compromise between these two extremes, maybe
> somewhere around our current default of speex-float-1, which nobody or
> very few people have complaints about (and those who have, are those who
> are interested in tuning their system to the highest quality, which
> could include switching our default resampler).

Then this should be documented, because both Windows (when measured 
properly) and Mac OS X (which is very popular on Apple laptops and 
definitely cares about battery life), with the default settings, under 
the second definition of distortions, never produce audible distortions 
even on artificial test cases. speex-float-3 does, but I hope (but don't 
promise) to vindicate it by testing the model against real music and not 
just test tones.

-- 
Alexander E. Patrakov