[pulseaudio-discuss] Resampler quality evaluation: now with room noise!
Alexander E. Patrakov
patrakov at gmail.com
Thu Sep 25 12:54:06 PDT 2014
[tl;dr: this is still a synthetic test that, unlike the previous one,
you have to perform yourself if you want to see any results]
Previously, I posted some quality-evaluation results for resamplers
that can be used by PulseAudio, and compared them to the resamplers used
in Windows 8.1 and Mac OS X:
http://lists.freedesktop.org/archives/pulseaudio-discuss/2014-August/021362.html
The conclusion of that work was that we need to use speex-float-5 to
match the metric of "never introducing audible distortions" (that other
operating systems meet by default) when resampling from 44.1 to 48 kHz.
However, David Henningsson argued that this "never" included a lot of
unrealistic worst-case conditions, i.e. that the quality achieved in
proprietary OSes is actually overkill. The worst-case conditions include:
1. Unbearably loud (92 dB SPL) sound from speakers or headphones. People
don't listen at such levels. At lower levels, the distortions also have
lower sound pressure, and may become unnoticeable.
2. Absolutely quiet room (except for this sound and resampler
distortions). In a noisy room, noise can mask ("outvoice") the distortion.
3. Perfect speakers or headphones that don't distort sounds at all by
themselves. Maybe headphone distortions can mask resampler distortions?
4. Sine wave (and not music or speech) as a test sound to be distorted
by a resampler. Maybe other frequency components can mask resampler
distortions?
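The arithmetic behind objection 1 can be sketched as follows. The helper name and the -80 dB figure are illustrative assumptions, not measurements from this thread. The point is only that a distortion component sitting a fixed number of dB below the signal keeps that relative level, so its absolute sound pressure level drops together with the playback volume.

```python
def distortion_spl(full_scale_spl, distortion_rel_db):
    """Absolute level (dB SPL) of a distortion component.

    full_scale_spl: dB SPL produced by a full-scale sine at the chosen volume.
    distortion_rel_db: distortion level relative to that tone (negative dB).
    """
    return full_scale_spl + distortion_rel_db

# Hypothetical resampler whose worst spurious tone is 80 dB below full scale.
for volume in (92, 60):
    print(volume, "dB SPL full scale ->",
          distortion_spl(volume, -80), "dB SPL distortion")
```

Whether the resulting level is audible still depends on its frequency and on the room noise, which is what the rest of this email is about.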
This email deals with the first two objections. I plan to take (4) into
account later (and won't consider the result worth its salt until I do
that), but I can't take (3) and (4) into account simultaneously due to
the lack of required theoretical knowledge.
Taking only (3) into account is meaningless, because a trivial solution
exists. Namely, if a resampler's distortion of a pure tone with the
frequency above 10 kHz is audible on ideal speakers (even in a noisy
room), then it is also audible in the same room on arbitrary crappy
speakers that reproduce the _distortion_ with the correct amplitude and
don't amplify the signal too much. Indeed, crappy speakers (unlike
resamplers), when fed with a sine wave, only produce harmonics as
distortions. In the case of a sine wave with frequency greater than 10
kHz, all such harmonic distortions are ultrasound, which is inaudible
and cannot mask resampler-introduced distortions.
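The harmonic argument above can be checked with a few lines of Python. This is only a sketch: the 20 kHz audibility limit is the conventional textbook figure (an assumption here), and the function names are made up for illustration.

```python
AUDIBLE_LIMIT_HZ = 20000.0  # conventional upper limit of hearing (assumption)

def harmonics(fundamental_hz, count=5):
    """Frequencies of the 2nd..(count+1)th harmonics of a pure tone."""
    return [n * fundamental_hz for n in range(2, count + 2)]

def audible_harmonics(fundamental_hz, count=5):
    """Harmonics that fall below the assumed audibility limit."""
    return [f for f in harmonics(fundamental_hz, count)
            if f < AUDIBLE_LIMIT_HZ]

print(audible_harmonics(3000.0))   # a 3 kHz tone has audible harmonics
print(audible_harmonics(10500.0))  # above 10 kHz, none remain
```

Since even the second harmonic of a tone above 10 kHz lies above 20 kHz, speaker-generated harmonics of such a tone have nothing audible to contribute.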
As there are no two rooms with the same noise [see
http://stevetarzia.com/localization.php], and because people don't agree
on the proper sound pressure level from the playback equipment, the
proposal is for you, the reader, to get the results for your listening
equipment and your room, using my scripts.
Spoiler: if you listen to music at such volume that full scale
corresponds to 60 dB SPL, and your room noise is 35 dBA, you may find
that speex-float-0 is adequate. On my Sony VAIO Z23A4R laptop and its
built-in speakers, in my room, speex-float-1 does produce audible
distortions on full-scale high-frequency sine waves (where only the
distortion is audible, and not the original signal), and I have verified
it with a direct test.
git clone git://gitorious.org/psy-eval/psy-eval.git
You will also need python2.7, numpy, scipy and matplotlib.
Also you need, as a 16-bit uncompressed wav file, a recording of your
room noise with a high-quality condenser microphone and sound card with
known sensitivity (so that you can get the sound pressure in physical
units from the samples). Alternatively, you can use an uncalibrated
recording paired with a noise meter reading on its "A" setting (so that
the result is in dBA).
High quality is needed so that the scripts see the actual room noise and
not microphone/soundcard self-noise, especially at high frequencies. If
you don't have that, please use the room noise recording provided by
David Henningsson (see the end of this email).
So, here is a procedure to determine mathematically whether a resampler
produces distortions audible in your room on sine wave signals of your
typical listening volume. I realize that the steps starting from 2 can
be short-circuited by playing back test.wav (with both the default and
alternate rate set to a non-matching value in /etc/pulse/daemon.conf)
and listening for additional weak tones of obviously-wrong frequency.
Please treat that shortcut as model validation. We still need a model so
that we can judge new resamplers for you without ever needing your ears
or playback equipment again.
1. Generate a linear-frequency-sweep signal:
[for the 44.1 -> 48 kHz case, you can find pre-generated resampled files
via the link at the end of this email and skip directly to step 3 or even 5]
./wavegen.py --rate 44100 --length 1048576 --amplitude 0.9 --format s16 \
    --padding 131072 test.wav
--rate: the sample rate you want to resample from
--length: the length of the useful portion of the file, in samples.
Half of the FFT size squared (i.e. 524288 for an FFT size of 1024) is
the bare minimum, and even that may produce unreliable results,
especially when downsampling. The other script autodetects the rate at
which the frequency changes, so the end result should be the same if you
produce a longer file.
--amplitude: the amplitude of the wave, with 1.0 being the full scale.
Keep it slightly lower, as some resamplers overamplify certain
frequencies a little. The other script autodetects the amplitude, so the
result should be the same.
--format: s16 or float. As the quantization noise is inaudible at 16
bits, this doesn't really matter.
--padding: adds some silence before and after the useful portion of the
wav file. The analysis script automatically cuts it out, provided that
there are no clicks before the leading silence in the recording.
test.wav: the script will save the signal there.
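For readers who want to see what such a test signal looks like in code, here is a minimal standard-library sketch of a padded linear sweep. It is not the code of wavegen.py. In particular, the sweep range (0 Hz to half the sample rate) and both function names are assumptions made for illustration.

```python
import math
import struct
import wave

def linear_sweep(rate, length, amplitude=0.9, padding=0):
    """Linear sine sweep from 0 Hz to rate/2 (assumed range), padded with silence."""
    f_end = rate / 2.0
    # Phase of a linear sweep: pi * f_end * t^2 / T, with t = n/rate, T = length/rate.
    body = [amplitude * math.sin(math.pi * f_end * n * n / (rate * length))
            for n in range(length)]
    silence = [0.0] * padding
    return silence + body + silence

def write_s16_mono(path, samples, rate):
    """Save floats in [-1, 1] as a 16-bit mono WAV file (no clipping is applied)."""
    frames = b"".join(struct.pack("<h", int(round(s * 32767))) for s in samples)
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(frames)

# Small illustrative sizes; the real command above uses a much longer file.
sweep = linear_sweep(rate=44100, length=4096, padding=512)
print(len(sweep))
```

Calling write_s16_mono("sweep_sketch.wav", sweep, 44100) would save the result. For actual measurements, use wavegen.py, since the analysis script expects its output.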
2. Resample the test signal.
A slow but easy way to do this involves a null sink and its monitor source.
First, set the needed resample method in /etc/pulse/daemon.conf and
restart PulseAudio.
Then, load the null sink with the rate you want to resample to, play the
test signal through it and record the result using its monitor.
pacmd load-module module-null-sink rate=48000
parec -d null.monitor --fix-rate --rate=48000 --file-format=wav \
    resampled.wav & paplay -d null test.wav ; killall parec
3. Get a recording of your room noise, as a 16-bit uncompressed wav
file. 5-10 seconds are enough. Stereo recordings are OK; in this case
the script will only use the left channel. The analysis script is smart
enough to ignore short bursts of unwanted sound (e.g. clock ticks).
As already said, you will need either the dB SPL number corresponding to
the full scale of the recording, or a dBA reading of the noise meter. If
you don't have a noise meter, assume 35 dBA.
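One plausible way to ignore short bursts is to estimate the noise power from the median over time frames instead of the mean, and then correct for the difference between the two. The sketch below illustrates that idea only. It is an assumption about the general technique, not the code of noise.py, and every name in it is hypothetical. It relies on the fact that the power in one DFT bin of stationary Gaussian noise is exponentially distributed, so its median equals its mean times ln 2.

```python
import math
import random
import statistics

def bin_powers(samples, frame_len=64, k=8):
    """Power of DFT bin k for each consecutive frame (one bin, plain DFT)."""
    cos_t = [math.cos(2 * math.pi * k * n / frame_len) for n in range(frame_len)]
    sin_t = [math.sin(2 * math.pi * k * n / frame_len) for n in range(frame_len)]
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        re = sum(x * c for x, c in zip(frame, cos_t))
        im = sum(x * s for x, s in zip(frame, sin_t))
        powers.append(re * re + im * im)
    return powers

def robust_mean_power(powers):
    """Mean power estimated from the median, assuming exponential statistics."""
    return statistics.median(powers) / math.log(2)

rng = random.Random(0)
frame_len, n_frames = 64, 2000
noise = [rng.gauss(0.0, 1.0) for _ in range(frame_len * n_frames)]
# Add a loud synthetic "clock tick" to 2% of the frames.
for f in range(0, n_frames, 50):
    for n in range(frame_len):
        noise[f * frame_len + n] += 50.0 * rng.gauss(0.0, 1.0)

powers = bin_powers(noise, frame_len)
plain_db = 10 * math.log10(statistics.mean(powers))
robust_db = 10 * math.log10(robust_mean_power(powers))
true_db = 10 * math.log10(frame_len)  # expected bin power of unit-variance noise
print(round(plain_db - true_db, 1), round(robust_db - true_db, 1))
```

The plain mean is dominated by the ticks, while the median-based estimate stays close to the level of the underlying noise. For a stationary pure tone the per-frame power is constant, so the median already equals the mean and a ln 2 correction of this kind would overestimate the level by about 1.6 dB.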
4. Get a measurement of sound pressure level corresponding to the full
scale at your preferred volume. There are two ways to do this: with a
hardware noise meter or with a calibrated microphone. In both cases, you
will need a 1 kHz test file.
Here is how to make a 1 kHz test file:
./wavegen.py --rate 44100 --length 1000000 --amplitude 1.0 \
    --constant-freq 1000 1000Hz.wav
Play this file back, and either take the noise meter dBA reading (which
at 1000 Hz is the same as dB SPL), or record the sound using a
microphone and sound card with a known sensitivity. In the second case,
make sure that the sine wave occupies at least 90% of the recording
duration (i.e. that there is not too much leading or trailing silence),
and use the software noise meter:
./noise.py --noise-full-scale 84 --sine recorded-1000Hz.wav
where --noise-full-scale is the dB SPL value corresponding to the
full-scale recorded signal. You can get it if you know the sensitivity
of your microphone and the sound card.
--sine turns off the median-vs-mean adjustment logic that is invalid for
stationary pure tones.
Note: noise.py intentionally does not implement the standard peak-decay
function, because that would interfere with ignoring the clock. So the
results are valid for stationary noise or stationary signal only.
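Two facts used in this step can be illustrated with a short sketch: A-weighting is 0 dB at 1 kHz (which is why dBA and dB SPL coincide there), and the level of a recorded sine follows from its RMS value once the full-scale dB SPL is known. The A-weighting formula is the standard analog curve from IEC 61672. The function names are hypothetical, and this is not the code of noise.py.

```python
import math

def a_weighting_db(freq_hz):
    """A-weighting gain in dB (standard analog curve, about 0 dB at 1 kHz)."""
    f2 = freq_hz * freq_hz
    num = (12194.0 ** 2) * f2 * f2
    den = ((f2 + 20.6 ** 2)
           * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
           * (f2 + 12194.0 ** 2))
    return 20.0 * math.log10(num / den) + 2.0

def sine_level_db_spl(samples, full_scale_db_spl):
    """Level of a recorded stationary sine, given the dB SPL of a full-scale sine."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # A full-scale sine has RMS 1/sqrt(2); multiply by sqrt(2) to reference it.
    return full_scale_db_spl + 20.0 * math.log10(rms * math.sqrt(2.0))

# A 1 kHz sine recorded at half of full scale, with full scale = 84 dB SPL.
rate = 44100
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]
print(round(a_weighting_db(1000.0), 2), round(sine_level_db_spl(tone, 84.0), 2))
```

Half of full scale is about 6 dB down, so with the example value of 84 dB SPL this recording corresponds to roughly 78 dB SPL.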
5. Make some plots. Here is how:
./resampler_plots.py --rate-from 44100 --skip 32768 --save newplot \
    --fftsize 1024 --noise-file noise.wav --noise-full-scale 84 resampled.wav
--rate-from: the sample rate of the original file (test.wav)
--skip: skip this many samples from the beginning. This is needed with
some versions of PulseAudio because they add a click at the beginning of
a recording.
--save: says how you want to name the plots. In the example, you'll get
newplot_*.png for various values of "*".
--fftsize: the FFT size. Meaningful values are between 1024 and 8192,
inclusive. Big FFTs need longer test files; the dependency is quadratic.
--noise-file: a file with the recording of your room noise. If you have
an absolutely quiet room, don't specify this parameter.
--noise-full-scale: if you recorded room noise with a calibrated
microphone and sound card, then you know the dB SPL value corresponding
to a full-scale sine wave. Put it here.
--noise-dba: if you have a noise meter instead, put its reading (with
the "A" setting) here. If you have neither a calibrated microphone nor a
noise meter, put 35 here.
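The quadratic dependency between the FFT size and the test file length follows from the rule given in step 1 (half of the FFT size squared as the bare minimum). A small helper, with a made-up name, makes the numbers concrete:

```python
def min_sweep_length(fft_size):
    """Bare-minimum useful sweep length in samples: half of the FFT size squared."""
    return fft_size * fft_size // 2

for fft_size in (1024, 2048, 4096, 8192):
    print(fft_size, min_sweep_length(fft_size))
```

Doubling the FFT size quadruples the minimum length, and since this is only the bare minimum, a longer file is the safer choice.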
This will produce some plots. On all plots, dB means relative to the
"standard" full scale used in the earlier versions of psy-eval, i.e. 0
dB on any plot means 92 dB SPL.
newplot_response.png: a spectrogram showing the response of the
resampler to sine waves of the full amplitude. On the X axis, there will
be the input frequency. The amplitude of each output frequency component
is then described by the color at the height corresponding to the output
frequency. Ideally, there should be only one frequency, equal to that of
the input (see the bright diagonal line), but actually there are
distortions.
newplot_envelope.png: shows the amplitude of output signal vs frequency
if the input signal contains only this frequency at the full scale.
newplot_response_plus_noise.png: a spectrogram showing the response of
the resampler to sine waves at the target listening volume, plus room
noise. Handy to visualize what's hidden by noise and what isn't.
newplot_distortion.png, newplot_distortion_eq.png: the same spectrogram
as newplot_response.png, with some areas blacked out, so that only
distortions remain. Without the _eq, attenuating the main tone counts as
a distortion. With _eq, attenuating the main tone does not count as a
distortion.
newplot_audibility.png, newplot_audibility_eq.png: these plots show
whether a human can detect the resampler distortion in the presence of
the main signal of the specified frequency (X axis) at the preferred
volume and the room noise. If the result is higher than 0 dB, a human
will notice the distortion given the chance to compare the
(possibly-equalized) correct and the distorted sounds. If it is lower,
then the distortion is not noticeable. In both cases, the absolute value
plotted tells by how many dB the distortion would have to change in order
to become just noticeable.
For those who just want to see the plots for speex at various listening
volumes but don't want to waste time with the null sink, here is an
archive with the results of 44.1 -> 48 kHz resampling and a noise file
(recorded by David Henningsson) that can be scaled to an arbitrary dBA
reading:
https://yadi.sk/d/RzV7JGAxbfUve (a zip archive with flac files and a README)
Note: the provided resampling results are usable with FFT sizes from
1024 up to 4096. The provided noise file was recorded on equipment with
known sensitivity: the full scale is known to be 84 dB SPL, and the
noise level is thus 35 dBA. To pretend that you have more or less than
35 dBA of noise in your room, use the --noise-dba option.
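Scaling a noise recording to a different dBA reading amounts to multiplying all samples by a constant gain. The sketch below shows that arithmetic under the assumption that the noise spectrum keeps its shape and only its overall level changes. The function names are hypothetical, and this is not necessarily how resampler_plots.py implements --noise-dba.

```python
def noise_gain(recorded_dba, target_dba):
    """Linear amplitude factor moving a noise recording between two dBA levels."""
    return 10.0 ** ((target_dba - recorded_dba) / 20.0)

def scale_noise(samples, recorded_dba, target_dba):
    """Return the samples scaled so that their level matches target_dba."""
    gain = noise_gain(recorded_dba, target_dba)
    return [gain * s for s in samples]

# Pretend that the 35 dBA recording was made in a 55 dBA room.
print(noise_gain(35.0, 55.0))
```

A 20 dB difference corresponds to a factor of 10 in amplitude, and a 6 dB difference to roughly a factor of 2.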
--
Alexander E. Patrakov