[pulseaudio-discuss] Resampler quality evaluation: now with room noise!

Thu Sep 25 12:54:06 PDT 2014

[tl;dr: this is still a synthetic test that, unlike the previous one, 
you have to perform yourself if you want to see any results]

Previously, I have posted some quality-evaluation results for resamplers 
that can be used by PulseAudio, and compared them to the resamplers used 
in Windows 8.1 and Mac OS X:

http://lists.freedesktop.org/archives/pulseaudio-discuss/2014-August/021362.html

The conclusion of that work was that we need to use speex-float-5 to 
match the metric of "never introducing audible distortions" (that other 
operating systems meet by default) when resampling from 44.1 to 48 kHz. 
However, David Henningsson argued that this "never" included a lot of 
unrealistic worst-case conditions, i.e. that the quality achieved in 
proprietary OSes is actually overkill. The worst-case conditions include:

1. Unbearably loud (92 dB SPL) sound from speakers or headphones. People 
don't listen at such levels. At lower levels, the distortions also have 
lower sound pressure, and may become unnoticeable.
2. Absolutely quiet room (except for this sound and resampler 
distortions). In a noisy room, noise can mask ("outvoice") the distortion.
3. Perfect speakers or headphones that don't distort sounds at all by 
themselves. Maybe headphone distortions can mask resampler distortions?
4. Sine wave (and not music or speech) as a test sound to be distorted 
by a resampler. Maybe other frequency components can mask resampler 
distortions?

This email deals with the first two objections. I plan to take (4) into 
account later (and won't consider the result worth any salt until I do 
that), but I can't take (3) and (4) into account simultaneously due to 
the lack of required theoretical knowledge.

Taking only (3) into account is meaningless, because a trivial solution 
exists. Namely, if a resampler's distortion of a pure tone with a
frequency above 10 kHz is audible on ideal speakers (even in a noisy
room), then it is also audible in the same room on arbitrary crappy 
speakers that reproduce the _distortion_ with the correct amplitude and 
don't amplify the signal too much. Indeed, crappy speakers (unlike 
resamplers), when fed with a sine wave, only produce harmonics as 
distortions. In the case of a sine wave with frequency greater than 10 
kHz, all such harmonic distortions are ultrasound, which is inaudible 
and cannot mask resampler-introduced distortions.
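The arithmetic behind this argument: a nonlinearity driven by a pure tone
of frequency f only adds components at integer multiples k*f (k >= 2).
Assuming an upper hearing limit of 20 kHz (a conventional round figure,
not one measured here), every such harmonic of a tone above 10 kHz lands
above that limit. A minimal check in Python 3 (the helper name is
hypothetical):

```python
# Upper limit of human hearing: a conventional round figure (assumption).
HEARING_LIMIT_HZ = 20000.0


def audible_harmonics(f0_hz, max_order=10, limit_hz=HEARING_LIMIT_HZ):
    """Return the harmonics k*f0 (2 <= k <= max_order) below limit_hz.

    These are the only distortion products that a nonlinear speaker adds
    to a pure tone of frequency f0_hz.
    """
    return [k * f0_hz for k in range(2, max_order + 1) if k * f0_hz < limit_hz]


# A 6 kHz tone still has an audible 2nd and 3rd harmonic...
print(audible_harmonics(6000.0))   # [12000.0, 18000.0]
# ...but every harmonic of a tone above 10 kHz is ultrasonic.
print(audible_harmonics(10500.0))  # []
```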

As no two rooms have the same noise [see
http://stevetarzia.com/localization.php], and because people don't agree 
on the proper sound pressure level from the playback equipment, the 
proposal is for you, the reader, to get the results for your listening 
equipment and your room, using my scripts.

Spoiler: if you listen to music at a volume such that full scale
corresponds to 60 dB SPL, and your room noise is 35 dBA, you may find 
that speex-float-0 is adequate. On my Sony VAIO Z23A4R laptop and its 
built-in speakers, in my room, speex-float-1 does produce audible 
distortions on full-scale high-frequency sine waves (where only the 
distortion is audible, and not the original signal), and I have verified 
it with a direct test.

git clone git://gitorious.org/psy-eval/psy-eval.git

You will also need python2.7, numpy, scipy and matplotlib.

You also need, as a 16-bit uncompressed wav file, a recording of your
room noise with a high-quality condenser microphone and sound card with 
known sensitivity (so that you can get the sound pressure in physical 
units from the samples). Alternatively, you can use an uncalibrated 
recording paired with a noise meter reading on its "A" setting (so that 
the result is in dBA).

High quality is needed so that the scripts see the actual room noise and 
not microphone/soundcard self-noise, especially at high frequencies. If 
you don't have that, please use the room noise recording provided by 
David Henningsson (see the end of this email).

So, here is a procedure to determine mathematically whether a resampler 
produces distortions audible in your room on sine wave signals at your
typical listening volume. I realize that the steps starting from 2 can
be short-circuited by playing back test.wav (with both the default and
alternate sample rates set to a non-matching value in /etc/pulse/daemon.conf)
and listening for additional weak tones of obviously-wrong frequency. 
Please treat that shortcut as model validation. We still need a model so 
that we can judge new resamplers for you without ever needing your ears 
or playback equipment again.
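For that shortcut, the relevant part of /etc/pulse/daemon.conf could look
like the fragment below, assuming a 44.1 kHz test.wav and 48 kHz as the
non-matching rate. The resample method shown is only an example value;
substitute the one under test and restart PulseAudio after editing.

```
; Resampler under test (example value, replace with the method you evaluate)
resample-method = speex-float-1
; Both rates deliberately differ from the 44100 Hz rate of test.wav,
; so that its playback is always resampled
default-sample-rate = 48000
alternate-sample-rate = 48000
```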

1. Generate a linear-frequency-sweep signal:

[for the 44.1 -> 48 kHz case, you can find pre-generated resampled files 
via the link at the end of this email and skip directly to step 3 or even 5]

./wavegen.py --rate 44100 --length 1048576 --amplitude 0.9 --format s16 \
    --padding 131072 test.wav

--rate: the sample rate you want to resample from

--length: the length of the useful portion of the file, in samples. Half
the square of the FFT size (i.e. 524288 for an FFT size of 1024) is the
bare minimum, and even that may produce unreliable results, especially
when downsampling. The analysis script autodetects the rate at which the
frequency changes, so the end result should be the same if you produce a
longer file.

--amplitude: the amplitude of the wave, with 1.0 being the full scale. 
Keep it slightly lower, as some resamplers overamplify certain 
frequencies a little. The analysis script autodetects the amplitude, so the
result should be the same.

--format: s16 or float. As the quantization noise is inaudible at 16 
bits, this doesn't really matter.

--padding: adds some silence before and after the useful portion of the 
wav file. The analysis script automatically cuts it out, provided that 
there are no clicks before the leading silence in the recording.

test.wav: the script will save the signal there.
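To make the options above concrete, here is a rough sketch (Python 3,
standard library only) of the kind of signal step 1 produces: a linear
sweep from 0 Hz up to the Nyquist frequency, surrounded by silence. It is
not the actual wavegen.py code; the function names, the mono output and
the exact sweep endpoints are assumptions.

```python
import math
import struct
import wave


def sweep_samples(length, amplitude):
    """16-bit samples of a linear sweep from 0 Hz up to the Nyquist frequency.

    The instantaneous frequency is n / (2 * length) cycles per sample, so the
    phase (its running integral) is pi * n**2 / (2 * length) radians.
    """
    return [int(round(amplitude * 32767 *
                      math.sin(math.pi * n * n / (2.0 * length))))
            for n in range(length)]


def write_linear_sweep(path, rate=44100, length=1048576, amplitude=0.9,
                       padding=131072):
    """Write a mono 16-bit WAV file: silence, the sweep, silence."""
    samples = [0] * padding + sweep_samples(length, amplitude) + [0] * padding
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

Calling write_linear_sweep("test.wav") mirrors the parameters of the
command in step 1. Integrating the frequency into a phase (rather than
computing sin(2*pi*f(t)*t)) is what keeps the instantaneous frequency
exactly linear.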

2. Resample the test signal.

A slow but easy way to do this involves a null sink and its monitor source.

First, set the needed resample method in /etc/pulse/daemon.conf and 
restart PulseAudio.
Then, load the null sink with the rate you want to resample to, play the 
test signal through it and record the result using its monitor.

pacmd load-module module-null-sink rate=48000
parec -d null.monitor --fix-rate --rate=48000 --file-format=wav resampled.wav &
paplay -d null test.wav ; killall parec
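Before the analysis, it may be worth checking that parec really wrote a
file at the target rate. A small sketch using Python's standard wave
module, assuming the recording is plain 16-bit PCM (parec's default
sample format); the helper name is hypothetical:

```python
import wave


def describe_wav(path):
    """Return (rate, channels, bits per sample, duration in seconds)."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        return (rate, w.getnchannels(), 8 * w.getsampwidth(),
                w.getnframes() / float(rate))
```

For example, describe_wav("resampled.wav")[0] should be 48000 after the
commands above.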

3. Get a recording of your room noise, as a 16-bit uncompressed wav 
file. 5-10 seconds are enough. Stereo recordings are OK; in that case
the script will only use the left channel. The analysis script is smart
enough to ignore short bursts of unwanted sound (e.g. clock ticks).

As already said, you will need either the dB SPL number corresponding to 
the full scale of the recording, or a dBA reading of the noise meter. If 
you don't have a noise meter, assume 35 dBA.
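Ignoring short bursts such as clock ticks is a matter of robust
statistics. As an illustration only (this is not how noise.py is
implemented, and the 1024-sample block size is an arbitrary assumption),
one can split the left channel into short blocks, compute the mean power
of each, and take the median rather than the mean: a few loud blocks
barely move the median. The result is unweighted, so it is not a dBA
figure.

```python
import math
import statistics
import struct
import wave


def left_channel(path):
    """Read a 16-bit PCM WAV file; return its first (left) channel in [-1, 1)."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected a 16-bit PCM file")
        channels = w.getnchannels()
        raw = w.readframes(w.getnframes())
    data = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return [s / 32768.0 for s in data[::channels]]  # frames are interleaved


def robust_level_db(samples, block=1024):
    """Median of per-block mean power, in dB relative to a full-scale sine."""
    powers = [sum(x * x for x in samples[i:i + block]) / block
              for i in range(0, len(samples) - block + 1, block)]
    # A full-scale sine (peak 1.0) has a mean power of 0.5.
    return 10.0 * math.log10(statistics.median(powers) / 0.5)
```

With a calibrated recording, adding the --noise-full-scale value to
robust_level_db(left_channel("noise.wav")) gives a rough unweighted
dB SPL estimate.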

4. Get a measurement of sound pressure level corresponding to the full 
scale at your preferred volume. There are two ways to do this: with a 
hardware noise meter or with a calibrated microphone. In both cases, you 
will need a 1 kHz test file.

Here is how to make a 1 kHz test file:

./wavegen.py --rate 44100 --length 1000000 --amplitude 1.0 \
    --constant-freq 1000 1000Hz.wav

Play this file back, and either take the noise meter dBA reading (which 
at 1000 Hz is the same as dB SPL), or record the sound using a 
microphone and sound card with a known sensitivity. In the second case, 
make sure that the sine wave occupies at least 90% of the recording 
duration (i.e. that there is not too much leading or trailing silence), 
and use the software noise meter:

./noise.py --noise-full-scale 84 --sine recorded-1000Hz.wav

where --noise-full-scale is the dB SPL value corresponding to the 
full-scale recorded signal. You can get it if you know the sensitivity 
of your microphone and the sound card.

--sine turns off the median-vs-mean adjustment logic that is invalid for 
stationary pure tones.

Note: noise.py intentionally does not implement the standard peak-decay 
function, because that would interfere with ignoring the clock. So the 
results are valid for stationary noise or stationary signal only.
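The arithmetic behind --noise-full-scale, sketched under stated
assumptions: the microphone sensitivity is given in mV/Pa, the sound
card's full scale is given as the RMS input voltage of a full-scale sine,
any preamp gain goes into a hypothetical gain_db parameter, and 20 uPa is
the standard dB SPL reference. The second helper converts the peak
amplitude of a recorded 1 kHz tone (1.0 = full scale) into dB SPL, which
at 1 kHz equals dBA. The numbers in the example calls are illustrative.

```python
import math

P_REF = 20e-6  # reference sound pressure for dB SPL, in pascals


def full_scale_db_spl(sensitivity_mv_per_pa, full_scale_vrms, gain_db=0.0):
    """dB SPL of a sound that produces a full-scale sine in the recording."""
    volts_per_pa = sensitivity_mv_per_pa / 1000.0 * 10 ** (gain_db / 20.0)
    pressure_rms = full_scale_vrms / volts_per_pa  # pascals at full scale
    return 20.0 * math.log10(pressure_rms / P_REF)


def tone_db_spl(peak_amplitude, noise_full_scale_db):
    """SPL of a recorded sine whose peak is given relative to full scale."""
    return noise_full_scale_db + 20.0 * math.log10(peak_amplitude)


print(round(full_scale_db_spl(50.0, 1.0, gain_db=30.0), 6))  # 90.0
print(tone_db_spl(0.5, 84.0))  # about 77.98: half of full scale is -6.02 dB
```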

5. Make some plots. Here is how:

./resampler_plots.py --rate-from 44100 --skip 32768 --save newplot \
    --fftsize 1024 --noise-file noise.wav --noise-full-scale 84 resampled.wav

--rate-from: the sample rate of the original file (test.wav)

--skip: skip this many samples from the beginning. This is needed with 
some versions of PulseAudio because they add a click at the beginning of 
a recording.

--save: says how you want to name the plots. In the example, you'll get 
newplot_*.png for various values of "*".

--fftsize: the FFT size. Meaningful values are between 1024 and 8192,
inclusive. Bigger FFTs need longer test files; the required length grows
quadratically with the FFT size.

--noise-file: a file with the recording of your room noise. If you have 
an absolutely quiet room, don't specify this parameter.

--noise-full-scale: if you recorded room noise with a calibrated 
microphone and sound card, then you know the dB SPL value corresponding 
to a full-scale sine wave. Put it here.

--noise-dba: if you have a noise meter instead, put its reading (with 
the "A" setting) here. If you have neither a calibrated microphone nor a
noise meter, put 35 here.
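One plausible reading of where the quadratic dependency (and the N^2/2
minimum from step 1) comes from, assuming the test signal sweeps linearly
from 0 Hz to the Nyquist frequency over `length` samples: the frequency
rises by rate/(2*length) Hz per sample, so during one FFT window of N
samples it moves N*rate/(2*length) Hz, while one FFT bin is rate/N Hz
wide. Keeping the drift within one bin per window requires
length >= N^2/2. The helper names are hypothetical.

```python
def bins_swept_per_window(fftsize, length):
    """FFT bins crossed by a 0..Nyquist linear sweep during one window.

    The sample rate cancels out: the drift is fftsize * rate / (2 * length)
    Hz per window, and one bin is rate / fftsize Hz wide.
    """
    return fftsize * fftsize / (2.0 * length)


def min_length(fftsize):
    """Shortest sweep that crosses at most one bin per FFT window."""
    return fftsize * fftsize // 2


print(min_length(1024))                               # 524288
print(bins_swept_per_window(1024, 1048576))           # 0.5
print(bins_swept_per_window(1024, min_length(1024)))  # 1.0
```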

This will produce some plots. On all plots, dB means relative to the 
"standard" full scale used in the earlier versions of psy-eval, i.e. 0 
dB on any plot means 92 dB SPL.
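Since 0 dB on the plots is pinned to 92 dB SPL, converting between plot
values and physical levels is a fixed offset. For instance, a level of
60 dB SPL (the listening volume from the spoiler above) sits at
60 - 92 = -32 dB on the plots. A trivial sketch with hypothetical helper
names:

```python
PLOT_REFERENCE_DB_SPL = 92.0  # what 0 dB means on every psy-eval plot


def plot_db_to_spl(plot_db):
    """Convert a value read off a plot into dB SPL."""
    return plot_db + PLOT_REFERENCE_DB_SPL


def spl_to_plot_db(db_spl):
    """Convert a physical level in dB SPL into the plots' dB scale."""
    return db_spl - PLOT_REFERENCE_DB_SPL


print(spl_to_plot_db(60.0))   # -32.0
print(plot_db_to_spl(-80.0))  # 12.0
```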

newplot_response.png: a spectrogram showing the response of the 
resampler to sine waves of the full amplitude. On the X axis, there will 
be the input frequency. The amplitude of each output frequency component 
is then described by the color at the height corresponding to the output 
frequency. Ideally, there should be only one frequency, equal to that of 
the input (see the bright diagonal line), but actually there are 
distortions.

newplot_envelope.png: shows the amplitude of the output signal vs.
frequency when the input is a full-scale sine wave of that frequency.

newplot_response_plus_noise.png: a spectrogram showing the response of 
the resampler to sine waves at the target listening volume, plus room 
noise. Handy to visualize what's hidden by noise and what isn't.

newplot_distortion.png, newplot_distortion_eq.png: the same spectrogram 
as newplot_response.png, with some areas blacked out, so that only 
distortions remain. Without the _eq, attenuating the main tone counts as 
a distortion. With _eq, attenuating the main tone does not count as a 
distortion.

newplot_audibility.png, newplot_audibility_eq.png: these plots show 
whether a human can detect the resampler distortion in the presence of 
the main signal of the specified frequency (X axis) at the preferred 
volume and the room noise. If the result is higher than 0 dB, a human 
will notice the distortion given the chance to compare the 
(possibly-equalized) correct and the distorted sounds. If it is lower, 
then the distortion is not noticeable. In both cases, the absolute value 
plotted tells how much the distortion needs to be changed in order to 
become just-noticeable.
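Reading the audibility plots thus comes down to the sign of each value,
and a margin of m dB means the distortion amplitude would have to change
by a factor of 10^(m/20) to become just-noticeable. The sketch below
assumes you have somehow extracted the curve as (input frequency in Hz,
audibility in dB) pairs; that data format and the helper name are
hypothetical, since the script itself writes PNG images.

```python
def summarize_audibility(curve):
    """curve: iterable of (frequency_hz, audibility_db) pairs.

    Returns (frequencies where the distortion is audible, worst margin in
    dB, amplitude factor by which the worst distortion must shrink to become
    just-noticeable). A factor below 1 means there is headroom to spare.
    """
    points = list(curve)
    audible = [f for f, db in points if db > 0.0]
    worst = max(db for _, db in points)
    return audible, worst, 10 ** (worst / 20.0)


print(summarize_audibility([(1000, -30.0), (15000, 6.0), (18000, 20.0)]))
```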

For those who just want to see the plots for speex at various listening 
volumes but don't want to waste time with the null sink, here is an 
archive with the results of 44.1 -> 48 kHz resampling and a noise file 
(recorded by David Henningsson) that can be scaled to an arbitrary dBA 
reading:

https://yadi.sk/d/RzV7JGAxbfUve (a zip archive with flac files and a README)

Note: the provided resampling results are usable with FFT sizes from 
1024 up to 4096. The provided noise file was recorded on equipment with 
known sensitivity, the full scale is known to be 84 dB SPL, and the 
noise level is thus 35 dBA. To pretend that you have more or less than 
35 dBA of noise in your room, use the --noise-dba option.
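Conceptually, pretending to have a different noise level amounts to
scaling the recorded noise by a constant gain; because A-weighting is a
linear filter, the dBA reading shifts by the same number of dB. Assuming
such a simple linear rescaling (how resampler_plots.py applies
--noise-dba internally is not described here), the gain from the 35 dBA
of the provided file to a target reading is:

```python
RECORDED_NOISE_DBA = 35.0  # level of the provided noise file


def noise_gain(target_dba, recorded_dba=RECORDED_NOISE_DBA):
    """Return (gain in dB, linear amplitude factor) to reach target_dba."""
    gain_db = target_dba - recorded_dba
    return gain_db, 10 ** (gain_db / 20.0)


print(noise_gain(45.0))  # gain of +10 dB, amplitude factor of about 3.16
```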

-- 
Alexander E. Patrakov