[pulseaudio-discuss] Resampler quality evaluation results

Laurențiu Nicola lnicola at dend.ro
Tue Sep 16 04:36:24 PDT 2014


Thanks a lot. For the record, it seems that:

    1. Resampling from closer rates yields less distortion than from
    rates that are far apart.
    2. Upsampling distorts more than downsampling.
    3. speex-float-3 gives audible distortion even for close rates.

I suppose these results were to be expected, but it's still nice to
have confirmation.

Now if only I could check the performance of speex-fixed vs. speex-float
on my platform... :)

Laurentiu Nicola

On Sun, Sep 14, 2014, at 16:34, Alexander E. Patrakov wrote:
> 07.09.2014 17:04, Laurențiu Nicola wrote:
> > Great, thanks!
> >
> > On Sun, Sep 7, 2014, at 14:02, Alexander E. Patrakov wrote:
> >> 07.09.2014 16:58, Laurențiu Nicola wrote:
> >>> I have a question related to your tests. In my application, I need
> >>> resampling between close rates (let's say from 44200 to 44100). Do you
> >>> feel that the results would basically be the same in this kind of
> >>> situation?
> >>>
> >>> Thanks,
> >>> Laurentiu Nicola
> >>
> >> I have not tested this. I will write up instructions next week, so that
> >> you can test any situation you want yourself.
> 
> Here are the instructions. Sorry for the delay.
> 
> 1. Choose a sample rate that you will be resampling from. In your case, 
> this is 44200 Hz. Generate a wav file with this sample rate, containing 
> a linear sweep:
> 
> ./wavegen.py --rate 44200 --length $(( 1024 * 1024 )) --amplitude 0.99 \
> --format s16 --padding 65536 44200.wav
> 
> The sweep should be long enough that, for every frequency bin of the FFT 
> used in the later steps, the file contains at least one piece (ideally 
> several) that is long enough for the FFT and contains only that 
> frequency. The amplitude should be 0.99 to avoid accidental clipping by 
> the resampler. The padding is unfortunately needed because the wav file 
> recorded from the resampler in the next step contains unwanted clicks, 
> for a reason that is still unknown.
> 
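As a rough illustration of what step 1 produces, here is a minimal Python
sketch of a linear sweep generator. It is not the wavegen.py script itself:
the function names, the 0 Hz to rate/2 sweep range, and the placement of the
padding as silence at both ends are assumptions made for this example.

```python
import math
import struct
import wave


def linear_sweep(rate, length, amplitude=0.99, padding=0):
    """Return 16-bit samples: silence, a linear sweep, silence.

    Assumption: the sweep runs from 0 Hz to rate/2 and the padding is
    plain silence on both sides. The real wavegen.py may differ.
    """
    f_end = rate / 2.0
    peak = amplitude * 32767
    sweep = []
    for n in range(length):
        # Instantaneous frequency is f_end * n / length, so the phase is
        # its integral over time: 2*pi * f_end * n^2 / (2 * length * rate).
        phase = 2.0 * math.pi * f_end * n * n / (2.0 * length * rate)
        sweep.append(int(round(peak * math.sin(phase))))
    silence = [0] * padding
    return silence + sweep + silence


def write_wav(path, rate, samples):
    """Write mono signed 16-bit little-endian samples to a wav file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
```

Calling write_wav("44200.wav", 44200, linear_sweep(44200, 1024 * 1024,
0.99, 65536)) should give a file of the same general shape as the one
described above, although pure Python is slow for this many samples.
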
> 2. Resample the file. An easy but slow way is to use a null sink running 
> at the rate you want to resample to. Here are the commands:
> 
> pacmd load-module module-null-sink rate=44100
> 
> parec -d null.monitor --fix-rate --file-format=wav 44200_to_44100.wav & 
> paplay -d null 44200.wav ; killall parec
> 
> Hopefully these commands are obvious.
> 
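The same resampling step could also be driven from Python, which avoids
"killall parec" stopping unrelated recordings. This is only a sketch under
assumptions: the helper names are made up for the example, the commands and
flags are the ones quoted above, and the sink name "null" is taken from
those commands.

```python
import subprocess


def resample_commands(rate_to, src_wav, dst_wav, sink="null"):
    """Build the three command lines from step 2 (hypothetical helper)."""
    load = ["pacmd", "load-module", "module-null-sink", "rate=%d" % rate_to]
    record = ["parec", "-d", sink + ".monitor", "--fix-rate",
              "--file-format=wav", dst_wav]
    play = ["paplay", "-d", sink, src_wav]
    return load, record, play


def resample_via_null_sink(rate_to, src_wav, dst_wav):
    """Play src_wav into a null sink and record its monitor to dst_wav."""
    load, record, play = resample_commands(rate_to, src_wav, dst_wav)
    subprocess.check_call(load)
    recorder = subprocess.Popen(record)
    try:
        # paplay returns once the whole file has been played.
        subprocess.check_call(play)
    finally:
        # Stop only the recorder started here, not every parec process.
        recorder.terminate()
        recorder.wait()
```

For the case discussed here the call would be
resample_via_null_sink(44100, "44200.wav", "44200_to_44100.wav").
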
> 3. Analyze the result.
> 
> ./resampler_plots.py --rate-from 44200 --skip 32768 --save plop \
> --fftsize 1024 44200_to_44100.wav
> 
> There may be warnings about dropouts. If they are near the end, that's 
> OK. There will also be warnings about division by zero; these are caused 
> by the masked-out frequency components and can be ignored.
> 
> The meaning of parameters: rate-from is the sample rate of the original 
> wav file that contained the linear sweep, skip means "skip this number 
> of samples from the beginning" (because there is a click). After 
> skipping, the analysis process skips further through the silent portion 
> of the resampled file and automatically adapts to any unknown slope of 
> frequency change.
> 
> The "fftsize" parameter, well, sets the FFT size. Useful values start 
> from 1024. Below that, the resolution in the low-frequency part of the 
> spectrum is not sufficient to determine audibility reliably, because the 
> absolute threshold of hearing changes significantly within one frequency 
> bin. The FFT size is specified in terms of frequency bins. The 
> required number of input samples for each signal piece is twice that, 
> i.e. 2048 in our case.
> 
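To make the relation between "fftsize", piece length and frequency
resolution concrete, here is a small arithmetic sketch. The function names
are invented for the example, and it assumes the bins evenly cover the range
from 0 Hz to half the sample rate of the analysed file.

```python
def samples_per_piece(fftsize):
    # fftsize counts frequency bins; each analysed piece needs twice as
    # many input samples.
    return 2 * fftsize


def bin_width_hz(rate, fftsize):
    # Assumption: fftsize bins evenly cover 0 Hz .. rate / 2.
    return (rate / 2.0) / fftsize


print(samples_per_piece(1024))              # 2048
print(round(bin_width_hz(44100, 1024), 2))  # 21.53
```

So at 44100 Hz with --fftsize 1024, each bin is about 21.5 Hz wide, and
halving the FFT size would double that width.
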
> The "save" parameter sets the base name for all output files. So, you'll get:
> 
> plop_envelope.png: shows the amplitude of output signal vs frequency if 
> the input signal contains only this frequency at the full scale.
> 
> plop_response.png: a spectrogram. To read it, select an input frequency. 
> Then cut a column out of this spectrogram at that position on the X axis. The 
> amplitude of each output frequency component is then described by the 
> color of the column at the height corresponding to the output frequency. 
> E.g., it can be seen that, when given a 5 kHz input signal, the 
> src-sinc-fastest resampler also produces some very weak unwanted output 
> at 18 kHz.
> 
> plop_distortion_eq.png: the same, but with the line representing the 
> wanted same-frequency output suppressed. I.e. only distortions, with the 
> assumption that wanted-signal attenuation does not count as a distortion.
> 
> plop_distortion.png: the same, but with the same line replaced by the 
> difference of wanted vs actual same-frequency output. I.e. only 
> distortions, and the attenuation of the signal now counts as a
> distortion.
> 
> plop_audibility_eq.png: audibility of distortions (i.e. by how much the 
> distortion would have to be reduced before it becomes inaudible) given an 
> input signal containing only this frequency at the maximum amplitude, if 
> attenuation of high signal frequencies does not count as a distortion.
> 
> plop_audibility.png: the same, but now such attenuation counts as a 
> distortion.
> 
> The results are valid only in an absolutely quiet room, i.e. they 
> represent the worst case.
> 
> One of the plots from src-sinc-fastest is attached for you to compare.
> 
> -- 
> Alexander E. Patrakov
> Email had 1 attachment:
> + plop_audibility_eq.png
>   65k (image/png)

