[pulseaudio-discuss] GSoC 2014 call for ideas

Alexander E. Patrakov patrakov at gmail.com
Fri Feb 7 03:44:15 PST 2014


2014-02-07 10:17 GMT+06:00 Arun Raghavan <arun at accosted.net>:
> Hello,
> This year's call for projects participation is out, and I'd like to
> gauge interest in participation. I'm happy to run org admin duty
> again, and if you've got ideas for a project and/or would like to
> mentor a student, please drop your name on the wiki:
>
> http://www.freedesktop.org/wiki/Software/PulseAudio/Software/PulseAudio/GSoC2014/
>
> We should decide one way or the other by mid next week so that we can
> get our org application in well in time if we're doing this.

Hello.

The following is mostly a copy-paste from the ideas that I have
already sent to the list or privately to people, plus a direct
translation of some features provided by hardware. The text, of
course, needs to be improved before the final submission to Google. I
would be happy to review any related code.

1. Tool for objective, automated, noninteractive evaluation of the
perceived resampler quality.

Problem statement: in commit 92bb9fb8b5aeebb87c4df7416e75db1782e2dd3a,
the default resampler quality was changed without any objective
argument about the impact on perceived sound quality. And there is no
tool for making such objective arguments, although there is enough
science to create one. It should be created.

The task, as I see it, is to:
a) implement a well-respected published psychoacoustical model, or
take an existing one;
b) quantify distortions (noise from rounding errors on intermediate
results, unwanted aliased frequency content, attenuated high
frequencies) introduced by the existing resamplers - i.e. write a
program that, given a sound file and the target sample rate, produces
the dB level of the distortion introduced by a given resampler in each
time interval at each frequency bin; bonus points for doing the same
for Windows and Mac OS X built-in resamplers (definitely doable by
capturing their impulse response through KVM; I did this for Windows
before writing the Wine resampler, but did not link it to any
psychoacoustical model);
c) given a variety of real-world sound material (music of different
genres, soundtracks, talks) and a psychoacoustical model, calculate
the dB level of distortion that can be introduced in each time
interval in each frequency bin without the average human noticing
this;
d) compare the results from (b) and (c), make one of the conclusions:
"overkill", "just right", "introduces noticeable distortion in this
frequency band, here is the problematic sample".

<off-topic>I am quite surprised that there was no "audiophile"
discussion on the list or elsewhere, especially since the old default
filter length closely matched what Windows XP does by default (I can
state that as an author of the resampler used in Wine). But I can't
make any statements about whether the new default is good enough
without the mentioned tool.</off-topic>

Contacts: Alexander E. Patrakov

Necessary background: digital sound processing, access to scientific
papers on the topic, Python with numpy and scipy, or any other
mathematical toolbox. If I were to do this, numpy/scipy would be my
toolbox of choice.
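
As a rough illustration of the kind of measurement meant in step (b),
here is a minimal numpy/scipy sketch. It only evaluates scipy's
resample_poly with a single test tone and a crude "everything outside
the tone bin is distortion" metric; the real tool would run
PulseAudio's own resamplers over real material and feed the residual
into the psychoacoustical model:

import numpy as np
from scipy.signal import resample_poly

SRC, DST, FREQ = 48000, 44100, 10000      # sample rates and test tone, Hz

t = np.arange(2 * SRC) / SRC              # 2 seconds of a pure tone
x = np.sin(2 * np.pi * FREQ * t)
y = resample_poly(x, 147, 160)            # 48000 -> 44100

seg = y[4410:4410 + 66150]                # skip edge transients; the length
                                          # is an exact number of tone
                                          # periods, so the tone occupies a
                                          # single FFT bin
spec = np.abs(np.fft.rfft(seg)) ** 2
tone_bin = FREQ * len(seg) // DST         # = 15000
signal = spec[tone_bin]
distortion = spec.sum() - signal
print("distortion is %.1f dB below the signal"
      % (10 * np.log10(signal / distortion)))

A proper tool would sweep frequencies, use real-world material, split
the residual into time/frequency bins, and compare against the masking
thresholds from step (c).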

2. Rewind-friendly resampler.

Problem statement: As of now, when rewinding a sink input, PulseAudio
resets the resampler. This is wrong and leads to audible clicks, but
it is a necessary evil because none of the resampler libraries used
by PulseAudio has a rewind-compatible API (i.e. the existing APIs do
not let the caller say "forget the last 1000 input samples, and tell
me how many output samples should be forgotten because of that"). A
new resampler has to be written, or an existing one improved, to such
a degree that calling pa_stream_write() with the last two parameters
other than 0,0 and overwriting the previously written samples with
themselves does not introduce clicks. Likewise, if a sink processes a
rewind for internal reasons, there should be no clicks.
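
To make the missing API concrete, here is a toy Python sketch (a
deliberately trivial integer-ratio resampler, nothing like the real
resampler libraries) of the kind of call that is needed; the hard part
of the project is making this book-keeping, together with the filter
state, work for a real fractional resampler:

class ToyRewindableResampler:
    """Trivial integer-ratio upsampler (sample repetition); the point is
    only the book-keeping that a rewind-friendly API needs."""

    def __init__(self, ratio):
        self.ratio = ratio        # output frames per input frame
        self.in_frames = 0        # total input consumed
        self.out_frames = 0       # total output produced

    def process(self, frames):
        out = [s for s in frames for _ in range(self.ratio)]
        self.in_frames += len(frames)
        self.out_frames += len(out)
        return out

    def rewind(self, n_input):
        """Forget the last n_input frames; return how many output frames
        the caller must drop so re-writing the same input is click-free."""
        n_input = min(n_input, self.in_frames)
        n_output = n_input * self.ratio
        self.in_frames -= n_input
        self.out_frames -= n_output
        return n_output

r = ToyRewindableResampler(ratio=2)
r.process([1, 2, 3, 4])
print(r.rewind(2))                # -> 4 output frames to forget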

Contacts: Alexander E. Patrakov

Necessary background: digital sound processing, C

Note: a similar problem exists with the virtual sink modules
module-equalizer-sink and module-virtual-surround-sink. However, the
next two proposals make a would-be "fix virtual sinks" proposal moot,
because after them only essentially-realtime effects and
module-ladspa-sink remain.

3. Equalizer in pavucontrol (very questionable, see below)

Problem statement: As of now, the only graphical frontend to
module-equalizer-sink is qpaeq (PyQt4-based). A GTK-based frontend
should be written and included in pavucontrol.

Contacts: Colin Guthrie?

Necessary background: C, GTK+, D-Bus

And here is why I think this is questionable. First, look at the
module-equalizer-sink code. The impression is that it has been
accepted without any review. It just pretends to "work". E.g. a buffer
is allocated with fftwf_malloc() and freed with free() instead of
fftw_free(). The code is also wrong from the DSP viewpoint - e.g. it
does nothing to ensure that the impulse response is shorter than the
FFT size minus the window size, thus failing time invariance. If the
sink is used at a sampling rate of 16000 Hz or less, there is a buffer
overflow due to an inconsistent choice of the FFT length and the
window size. The algorithmic latency is fixed at 15999 samples, which
is way too much. The module does not use any of the benefits (e.g. the
chance to handle rewinds properly) of being a native PulseAudio module
rather than a LADSPA plugin. Veromix (an advanced mixer application
for PulseAudio) already uses module-ladspa-sink instead of this, maybe
due to the unified D-Bus API provided by module-ladspa-sink that
allows Veromix to control other LADSPA plugins as well. If I were you,
I would delete the module right now instead of proposing this GSoC
project. But then, "implement an equalizer in pavucontrol, using
module-ladspa-sink as a backend" would be valid.
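
To make the time-invariance point concrete: with FFT-based block
filtering, a block of B samples convolved with an impulse response of
length L needs an FFT of at least B + L - 1 points, otherwise the
circular convolution wraps around and the result depends on the block
boundaries. A minimal numpy demonstration (the sizes are made up, not
the ones used by module-equalizer-sink):

import numpy as np

fft_size, block = 1024, 512
rng = np.random.default_rng(0)
x = rng.standard_normal(block)

def one_block(x, h, n):
    # filter one block by multiplying n-point spectra (circular convolution)
    return np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)

for fir_len in (fft_size - block + 1, fft_size):   # fits / too long
    h = rng.standard_normal(fir_len)
    linear = np.convolve(x, h)                     # what the output should be
    circular = one_block(x, h, fft_size)
    err = np.max(np.abs(circular - linear[:fft_size]))
    print("FIR length %4d: max error %.1e" % (fir_len, err))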

4. Channel remixer improvements. (needs splitting, see even more ideas
in a big comment in resampler.c)

Problem statement: currently, PulseAudio has a remixer in its core
that only produces instantaneous linear combinations of the input
channels, and also module-virtual-surround-sink, which, given a WAV
file with head-related impulse responses, downmixes 5.1 to stereo
while preserving spatial information. These two remixers interact
badly with each other and with profile switches (see below), and this
looks ugly to fix in the virtual-sink model. The goal is to introduce
advanced upmixing and downmixing techniques into the PulseAudio core.

Bad interaction: suppose that one plays a 4.0 track through
module-virtual-surround-sink. Module-virtual-surround-sink is a sink,
so PulseAudio applies its usual core remixing to all of its input
streams. Thus, module-virtual-surround-sink sees not the original 4.0
content that needs to be downmixed, but fake 5.1 content corrupted by
the synthesized fake center and LFE channels. PA_RESAMPLER_NO_REMIX
would help here, but introduces another problem with normalization.
Again, there is no way to distinguish this from a 5.1 stream with a
silent center channel. Ideally, for safety, the overall filter gain in
the HRIR-aware downmixer should be such that there is no clipping even
if all input channels are active - and that gain is different for the
5.1 and 4.0 cases, simply because of the different number of channels.
The core remixer erases this information.
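
To illustrate the normalization issue, here is a minimal numpy sketch
of how a safe gain could be chosen from a downmix matrix (the
coefficients below are made up; for an HRIR-based downmixer one would
use the sum of absolute values of each impulse response instead of a
single coefficient):

import numpy as np

def safe_gain(downmix):
    """Scale factor so that full-scale input on all channels cannot clip.
    Rows are output channels, columns are input channels."""
    worst = np.max(np.sum(np.abs(downmix), axis=1))
    return 1.0 / worst if worst > 1.0 else 1.0

# made-up instantaneous coefficients: FL FR C RL RR LFE -> L R
mix_51_to_20 = np.array([[1.0, 0.0, 0.7, 0.5, 0.0, 0.5],
                         [0.0, 1.0, 0.7, 0.0, 0.5, 0.5]])
# the same idea for a 4.0 source: FL FR RL RR -> L R
mix_40_to_20 = np.array([[1.0, 0.0, 0.5, 0.0],
                         [0.0, 1.0, 0.0, 0.5]])

print(safe_gain(mix_51_to_20))   # lower gain: more channels can add up
print(safe_gain(mix_40_to_20))   # higher gain is safe for 4.0

If the core first upmixes 4.0 to 5.1, the downmixer can only apply the
more conservative 5.1 gain, which is exactly the information loss
described above.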

Now consider that this listener unplugs the headphones. The sound
should go to his 5.1 audio system, but instead it continues to play on
this downmixer sink and gets further upmixed by the core. That's
clearly wrong.

The same conclusion about profile interaction can be reached by
considering module-equalizer-sink. It does not switch its number of
channels even if its master sink does that. As a result, 2.0 -> 5.1 ->
2.0 yo-yo is entirely possible (and of course unwanted).

Writing a fancy upmixer based on reverse-engineered Dolby Pro Logic or
on scientific papers is also within the scope of this project.

Current status: I have a rewritten (and rewind-friendly!) virtual sink
module sitting on my laptop that applies arbitrary IIR filters. I will
send it after cleaning up the scripts that generate the filter
coefficients. It is already good enough to provide LFE channel
extraction, to replace the virtual surround sink, and even to provide
a virtual surround effect on my laptop speakers, but of course not
good enough to solve the profile-related problem.

Contact: Alexander E. Patrakov

Necessary background: digital sound processing, C

Possible split:
 * integrate multichannel-to-binaural HRIR-based downmixer into core
(possibly after I publish the IIR sink) (and maybe allow its use even
for stereo streams, to narrow them down if the user wants it)
 * integrate binaural-to-stereo remixer into core (when I publish the
IIR sink, or based on the published ambiophonics research)
 * integrate LFE extraction into core (when I publish the IIR sink, or
independently)
 * write and integrate a fancy stereo-to-5.1 upmixer based on published research
 * integrate heuristics to apply and unapply the above effects appropriately

5. Per-channel delay (probably too simple)

Problem statement: some high-end audio receivers (e.g. Onkyo TX-NR626)
have an option to introduce a separately configurable delay in each
channel. This is needed, e.g., if, due to room geometry constraints,
the speakers are not equidistant from the listener. This happens,
e.g., with the front-center channel if one places all three front
speakers near the wall - in this case, the front-center signal needs
to be slightly delayed with respect to front-left and front-right in
order to arrive at the listener at precisely the same moment. It would
be nice to emulate this feature in PulseAudio for the benefit of users
with cheap 5.1 analog speakers, and to provide a GUI for it.
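
For illustration, the required delay is simply the path-length
difference divided by the speed of sound. A minimal numpy sketch (the
0.3 m figure is a made-up example):

import numpy as np

RATE = 48000
SPEED_OF_SOUND = 343.0            # m/s at room temperature

def delay_samples(extra_distance_m):
    """Samples by which the closer speaker must be delayed so that its
    sound arrives together with the sound from the farther speakers."""
    return int(round(extra_distance_m / SPEED_OF_SOUND * RATE))

def apply_delay(channel, n):
    """Delay one channel by n samples, padding with silence at the start."""
    return np.concatenate([np.zeros(n), channel])[:len(channel)]

n = delay_samples(0.3)            # center speaker 0.3 m closer: ~42 samples
center = np.random.randn(RATE)
center_delayed = apply_delay(center, n)
print(n)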

Contact: Alexander E. Patrakov (?)

Necessary background: C, GTK+.

6. Digital Room Correction for PulseAudio

Problem statement: some high-end audio receivers (e.g. Onkyo TX-NR626)
do not even have a graphical equalizer! Instead, they come with a
calibrated microphone and a digital room correction feature in the
firmware. They play a known test sound through each speaker, record
what the microphone hears, and thus learn about the room acoustics.
Then they apply this knowledge to equalize the played-back sound. This
feature should be available for users of analog speakers, too, via
PulseAudio.

In fact, a free implementation of Digital Room Correction already
exists: http://drc-fir.sourceforge.net/ . One just needs to write a
FIR convolution engine for PulseAudio and a GUI for calibration, and
also to think about how to work around the fact that a calibrated
microphone is not always available - luckily, there are some readily
available "calibrated" sound sources, like popping bubble wrap.

Contact: Alexander E. Patrakov

Necessary background: C, digital sound processing, a calibrated microphone.

7. Intra-application sound mixing (needs discussion, may be a social
problem after all)

Some time ago, I added a documentation patch (with some improvements
from Tanu) about known misuse of the PulseAudio API. As a part of that
patch, I made a far-fetched but IMHO true statement that sometimes it
is the responsibility of the application itself to mix its own streams
(as is done in Wine) or to attenuate samples. However, I am afraid
that this will be perceived as documentation of a PulseAudio bug
(inability to mix individual application streams without polluting the
mixer GUIs with extra sliders) that just shifts the responsibility and
extra work to individual developers. Also, this documentation is not
read by developers who use PulseAudio not directly but via wrappers
like GStreamer and Qt, so a source of "application bugs" still exists.

To be fair, in GStreamer the problem looks solved: "audiomixer"
performs synchronous in-application mixing - just what is needed. But
not everyone uses or wants to use GStreamer. So I think that there is
some room for improvement in PulseAudio itself.

Problem statement: add API functions to PulseAudio that would allow an
application to request that its streams be mixed together without
showing a separate volume slider for each of them in pavucontrol and
similar PulseAudio mixer applications.

Contact: ?

Required background: C

8. LV2 sink (maybe too simple)

Problem statement: LV2 is the successor to LADSPA. Plugin authors are
moving to the new API, but PulseAudio does not have any way to load
these plugins and use them for sound processing. A new virtual sink
needs to be written, as well as a GUI (possibly integrated into
pavucontrol).

Contact: Alexander E. Patrakov

Required background: C, GTK+

9. Dynamic range compression (maybe already solved)

Problem statement: some consumer electronics (e.g. the Onkyo TX-NR626
receiver) have a mode in which they reduce the dynamic range of the
incoming signal. This is supposed to be used when listening to
classical music at night, so that neighbours don't wake up and the
quietest passages are still audible. Make this feature available to
users of cheap analog speakers, via PulseAudio. Write a GUI for
configuring it.
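
For illustration, here is a minimal numpy sketch of a feed-forward
compressor (envelope follower plus static gain curve); it is nowhere
near production quality and the parameters are arbitrary:

import numpy as np

RATE = 48000

def compress(x, threshold_db=-30.0, ratio=4.0, attack_ms=5.0,
             release_ms=200.0):
    """Reduce the level of everything above threshold_db by the ratio."""
    att = np.exp(-1.0 / (attack_ms * 1e-3 * RATE))
    rel = np.exp(-1.0 / (release_ms * 1e-3 * RATE))
    env = 0.0
    out = np.empty_like(x)
    for i, s in enumerate(x):
        level = abs(s)
        coeff = att if level > env else rel
        env = coeff * env + (1.0 - coeff) * level          # envelope follower
        env_db = 20.0 * np.log10(max(env, 1e-10))
        over = max(env_db - threshold_db, 0.0)
        gain_db = -over * (1.0 - 1.0 / ratio)              # static gain curve
        out[i] = s * 10.0 ** (gain_db / 20.0)
    return out

# a quiet passage followed by a loud one; the loud part gets attenuated
t = np.arange(RATE) / RATE
x = np.concatenate([0.01 * np.sin(2 * np.pi * 440 * t),
                    0.8 * np.sin(2 * np.pi * 440 * t)])
y = compress(x)
print(np.max(np.abs(y[:RATE])), np.max(np.abs(y[RATE:])))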

This may already be solved by the vlevel LADSPA plugin (I have not
tried it), but it needs GUI integration and heuristics to apply the
effect only to high-latency music streams from players. And possibly a
port is needed for rewind compatibility, but I am not sure whether
that is possible at all.

Contact: ?

Required background: C, GTK+, digital signal processing (?)

10. GUI for module-combine-sink and module-remap-sink

Problem statement: the functionality to duplicate sound to several
cards or to split one sound card into several virtual cards is
currently available only via the configuration file or via pacmd. A
GUI way to do the same tasks is needed.

Contact: ?

Required background: GTK+

P.S. With my current job, I don't have enough time to be a good mentor
or even a good contributor. But I am open to job offers that would
either allow me to work on PulseAudio from my home in Russia
(preferred) or require relocation to the UK or Ireland.

-- 
Alexander E. Patrakov

