[pulseaudio-discuss] Resampler quality evaluation: now on music files

Sat Oct 4 22:48:20 PDT 2014

[tl;dr: speex-float-1 is adequate for 44100 -> 48000 Hz resampling, 
so is ffmpeg, but speex-float-0 isn't]

I have previously posted some quality-evaluation results for resamplers 
that can be used by PulseAudio:

http://lists.freedesktop.org/archives/pulseaudio-discuss/2014-August/021362.html

http://lists.freedesktop.org/archives/pulseaudio-discuss/2014-September/021811.html

The main objections were:

1. Unbearably loud (92 dB SPL) sound from speakers or headphones. People 
don't listen at such levels. At lower levels, the distortions also have 
lower sound pressure, and may become unnoticeable.

2. Absolutely quiet room (except for this sound and resampler 
distortions). In a noisy room, noise can mask ("outvoice") the distortion.

3. Perfect speakers or headphones that don't distort sounds at all by 
themselves. Maybe headphone distortions can mask resampler distortions?

4. Sine wave (and not music or speech) as a test sound to be distorted 
by a resampler. Maybe other frequency components can mask resampler 
distortions?
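Objection (1) comes down to simple arithmetic. If a full-scale sine plays back at some calibrated level (92 dB SPL by default below), then every component of the output, distortions included, sits at its level relative to full scale plus that calibration value. Turning the volume down by 20 dB therefore lowers the distortions by 20 dB as well. The sketch below is only an illustration of that mapping; the function names are hypothetical and not taken from the script.

```python
import numpy as np


def dbfs_to_spl(level_dbfs, full_scale_spl=92.0):
    """Convert a level relative to a full-scale sine (dBFS) into dB SPL,
    assuming a full-scale sine is reproduced at full_scale_spl dB SPL."""
    return level_dbfs + full_scale_spl


def rms_dbfs(x):
    """RMS level of a float signal in [-1, 1], relative to the RMS of a
    full-scale sine (1/sqrt(2)), so a full-scale sine reads 0 dBFS."""
    rms = np.sqrt(np.mean(np.square(x)))
    return 20.0 * np.log10(rms * np.sqrt(2.0))
```

For example, a distortion component at -80 dBFS lands at 12 dB SPL with a 92 dB SPL full-scale calibration, but at -8 dB SPL (below anything audible) with 72.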

As it turns out, (4) is a very valid point, the most valid of all 
four. In fact, in the vast majority of music files, the extra components 
of the signal are strong enough to mask the distortions of speex-float-1 
even without taking the other points into account. Still, I have a script 
that takes (1), (2) and (4) into account, and you can run it on your own 
music files. As I already explained in the previous email, there is no 
plan to account for (3).

git clone git://gitorious.org/psy-eval/psy-eval.git

You will need python2.7, numpy, scipy, matplotlib, and also ffmpeg (or 
possibly libav).

You also need a wav file with the resampler response, and, optionally, a 
recording of room noise, also as a 16-bit uncompressed wav file. See 
http://lists.freedesktop.org/archives/pulseaudio-discuss/2014-September/021811.html 
for how to obtain these wav files, or use pre-generated ones:

https://yadi.sk/d/RzV7JGAxbfUve (the same archive as used in the 
previous email)
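The resampler response is the output of the resampler under test when fed a linear frequency sweep at the source rate. The previous email describes the actual procedure; as a rough sketch only, such a sweep could be generated with scipy. The duration, amplitude and output filename here are assumptions, not the values used for the files in the archive.

```python
import numpy as np
from scipy.signal import chirp

RATE_FROM = 44100   # source rate, as for the files in the archive
DURATION_S = 10.0   # assumption: sweep length is not specified in this post
AMPLITUDE = 0.5     # assumption: some headroom below full scale


def make_linear_sweep(rate=RATE_FROM, duration=DURATION_S, amplitude=AMPLITUDE):
    """Return a 16-bit linear sine sweep from 0 Hz up to the Nyquist frequency."""
    t = np.arange(int(rate * duration)) / float(rate)
    # method='linear' makes the instantaneous frequency grow linearly with time
    sweep = chirp(t, f0=0.0, t1=duration, f1=rate / 2.0, method='linear')
    return np.round(amplitude * sweep * 32767).astype(np.int16)
```

It could be saved with `scipy.io.wavfile.write('sweep_44100.wav', 44100, make_linear_sweep())` (hypothetical filename) and then passed through the resampler being evaluated; the resampled output is what --resampler-response expects.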

So, the new script is ./music_distortions.py, and it takes the 
following arguments:

--resampler-response: the wav file with the resampler response to a linear 
frequency sweep. You can use "speex-float-1.wav".

--rate-from: the sample rate that the sine sweep was resampled from. For 
files in my archive, that's 44100.

--skip: if the resampler response contains junk in the beginning, use 
this to skip a specified number of samples.

--fftsize: the FFT size, at the target sample rate. Useful values are 
1024 - 8192.

--noise-file: wav file with room noise. Optional.

--noise-full-scale: if you recorded room noise with a calibrated 
microphone and sound card, then you know the dB SPL value corresponding 
to a full-scale sine wave. Put it here. The default is 92, but you need 
84 in order to use the noise file from the archive.

--noise-dba: if you have a noise meter instead, put its reading (with 
the "A" setting) here. If you have neither a calibrated microphone nor a 
noise meter, but want to use your own noise file, put 35 here.

--signal-full-scale: If you know the sound pressure level corresponding 
to the full-scale sine wave at your soundcard output, put it here, in 
dB. The default is 92.

--use-eq: Use this switch to ignore the fact that the resampler attenuates 
high frequencies (with the implication that a human can notice this 
distortion if he/she knows that those frequencies should be there).

--save: if you want to save the plot, put a prefix of its name here. 
_audibility_vs_time.png will be appended. If you don't specify this, the 
plot will be shown instead.

--report-only: don't plot anything, just report the average distortion, 
the maximum distortion, and where it happens.
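For reference, the options above map naturally onto an argparse declaration. This is a hypothetical reconstruction from the descriptions in this email, not the actual code of music_distortions.py; in particular, the defaults for --skip and --fftsize are not stated here and are assumptions.

```python
import argparse


def build_parser():
    # Hypothetical mirror of the documented command line.
    p = argparse.ArgumentParser(
        description='Estimate audibility of resampler distortions on music')
    p.add_argument('--resampler-response', required=True,
                   help='wav file with the resampler response to a linear sweep')
    p.add_argument('--rate-from', type=int, required=True,
                   help='sample rate the sweep was resampled from')
    p.add_argument('--skip', type=int, default=0,  # assumption: default 0
                   help='number of junk samples to skip')
    p.add_argument('--fftsize', type=int, default=1024,  # assumption
                   help='FFT size at the target sample rate')
    p.add_argument('--noise-file', help='optional wav file with room noise')
    p.add_argument('--noise-full-scale', type=float, default=92.0,
                   help='dB SPL of a full-scale sine in the noise recording')
    p.add_argument('--noise-dba', type=float,
                   help='A-weighted noise meter reading')
    p.add_argument('--signal-full-scale', type=float, default=92.0,
                   help='dB SPL of a full-scale sine at the soundcard output')
    p.add_argument('--use-eq', action='store_true',
                   help='ignore high-frequency attenuation by the resampler')
    p.add_argument('--save', metavar='PREFIX', help='save the plot instead of showing it')
    p.add_argument('--report-only', action='store_true',
                   help='print the summary only, no plot')
    p.add_argument('music_file')
    return p


def plot_filename(prefix):
    """Name of the saved plot for a given --save prefix."""
    return prefix + '_audibility_vs_time.png'
```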

Finally, specify the music file name. That file should be in any format 
supported by ffmpeg, and should have the same sample rate as --rate-from 
says. Only the front left channel will be taken into account.
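Decoding "any format supported by ffmpeg" and keeping only the front left channel can be done by piping raw PCM out of ffmpeg. A minimal sketch, assuming an ffmpeg binary on PATH; it uses the pan filter to keep input channel 0 (front left for the usual layouts) and deliberately does not resample, so checking that the file really is at --rate-from is left to the caller. libav's tools may need different options.

```python
import subprocess

import numpy as np


def pcm_s16le_to_float(raw):
    """Convert raw little-endian 16-bit PCM bytes to floats in [-1, 1)."""
    return np.frombuffer(raw, dtype='<i2').astype(np.float64) / 32768.0


def decode_front_left(path):
    """Decode `path` with ffmpeg and return only its first (front left) channel."""
    cmd = ['ffmpeg', '-v', 'error', '-i', path,
           '-af', 'pan=mono|c0=c0',            # keep input channel 0 only
           '-f', 's16le', '-acodec', 'pcm_s16le',
           '-']                                  # write raw PCM to stdout
    return pcm_s16le_to_float(subprocess.check_output(cmd))
```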

E.g.:

./music_distortions.py --signal-full-scale 72 --fftsize 1024 \
    --resampler-response speex-float-0.wav --rate-from 44100 --skip 65536 \
    --save Prelude Prelude.wav

produces (together with some warnings):

"Prelude.wav", average distortion = -8.8 dB, maximum = -2.2 dB, at 4:33

and the attached plot. If the curve is below 0 dB, an average human 
cannot notice the distortions. If it is above, then the distortion can 
be noticed, provided that the subject knows how the file should sound 
with the ideal resampler.
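The summary line and the 0 dB rule can be expressed compactly. The sketch below assumes the script produces a per-frame audibility curve in dB relative to the masking threshold, with frames spaced hop_seconds apart; averaging directly in dB is an assumption, as the email does not say how the "average" figure is computed.

```python
import numpy as np


def summarize(name, audibility_db, hop_seconds):
    """Format a one-line report from a per-frame audibility curve (dB)."""
    curve = np.asarray(audibility_db, dtype=float)
    worst = int(np.argmax(curve))          # frame with the most audible distortion
    t = int(worst * hop_seconds)           # its position, in whole seconds
    return '"%s", average distortion = %.1f dB, maximum = %.1f dB, at %d:%02d' % (
        name, curve.mean(), curve[worst], t // 60, t % 60)


def audible(audibility_db):
    """True if any frame rises above 0 dB, i.e. above the estimated threshold."""
    return bool(np.max(audibility_db) > 0.0)
```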

I do have some music files where the script, at its default settings, 
finds speex-float-1 only marginally adequate (i.e. the maximum audibility 
of distortions is close to 0 dB), or even inadequate with a non-default 
FFT size (2048 or 4096) [*]. In all such (rare) cases, --signal-full-scale 
72 removes the complaint. Probably that's because the complaint is 
really about some nearly-ultrasonic frequency component: at the default 
level, its rejection by the resampler counts as an audible distortion, 
while at the reduced volume the component sinks below the absolute 
threshold of hearing anyway.
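To see why a 20 dB volume reduction can make a near-ultrasonic complaint disappear, consider the absolute threshold of hearing, which rises steeply above roughly 12 kHz. The sketch uses Terhardt's well-known approximation of that threshold; whether music_distortions.py uses this particular curve is not stated here, so treat it as a swapped-in model. The full-scale calibration is the same assumption as for --signal-full-scale.

```python
import numpy as np


def threshold_in_quiet_db_spl(freq_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL).
    Only meaningful for roughly 20 Hz to 20 kHz (it diverges at 0 Hz)."""
    f = np.asarray(freq_hz, dtype=float) / 1000.0   # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def is_above_threshold(level_dbfs, freq_hz, full_scale_spl=92.0):
    """Whether a component at level_dbfs exceeds the threshold in quiet,
    given the dB SPL of a full-scale sine."""
    return level_dbfs + full_scale_spl > threshold_in_quiet_db_spl(freq_hz)
```

With this model the threshold at 15 kHz is about 51 dB SPL, so a component at -30 dBFS is audible with a 92 dB SPL calibration (62 dB SPL) but not with 72 (42 dB SPL).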

For those who want to test, here are the affected New Age albums:

Ryan Farish - Everlasting
Australis - The Gates of Reality
Daveed - Songs From Beyond

Interestingly, the "average" figure is worse on speech material (such as 
foreign language courses) than on music.

[*] The FFT size dependency is, strictly speaking, a bug. This is 
probably related to the use of a narrow (low-noise) window without 
sufficient overlap, so the bad fragment just slips through the gap 
between the two neighboring positions of the 1024-sample window. Still, 
the average figure is stable when changing the FFT size.
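The gap effect described in the footnote can be quantified by summing the squared analysis window over all frame positions: wherever that sum dips toward zero, a short event is barely seen by any frame. The sketch below uses a Hann window because its coverage has a closed form; the window actually used by the script is not named here, and narrower low-noise windows (e.g. Blackman-Harris) need even more overlap to avoid such dips.

```python
import numpy as np
from scipy.signal import get_window


def min_window_coverage(window, fftsize, hop, frames=8):
    """Smallest sum of squared windows over the steady-state region of a
    frame grid with the given hop; near 0 means events can slip through."""
    w = get_window(window, fftsize)
    length = fftsize * frames
    cover = np.zeros(length + fftsize)
    for start in range(0, length, hop):
        cover[start:start + fftsize] += w ** 2
    # skip the ramp-up at the start and the tail past the last full overlap
    return cover[fftsize:length].min()
```

With no overlap (hop equal to the FFT size) the coverage drops to zero at every frame boundary; 50% overlap raises the minimum to 0.5, and 75% overlap makes it constant at 1.5.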

P.S. Tomorrow I have a flight to France (due to XDC 2014), so I won't be 
able to answer your questions quickly.

-- 
Alexander E. Patrakov
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Prelude_audibility_vs_time.png
Type: image/png
Size: 51902 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/pulseaudio-discuss/attachments/20141005/5fb37801/attachment-0001.png>