[gst-devel] JACK and GStreamer, from the horse's mouth

Lennart Poettering mztfg at 0pointer.de
Wed Nov 29 03:13:30 CET 2006


On Tue, 28.11.06 18:09, Paul Davis (paul at linuxaudiosystems.com) wrote:

> > Just my 2¢ on this discussion, as the maintainer of PulseAudio:
> 
> Thanks for joining the conversation. Sorry to hear you can't be at
> DAM3.

BTW, do you know if OSDL provides travel/accommodation sponsorship for
any of their meetings, for all those poor students who cannot afford
the trip on their own? I'd certainly be willing to attend...

> > Please remember that not all machines running Linux and GStreamer have
> > a FPU.
> 
> This is a red herring. There is nothing in anything I've been arguing
> for that requires floating point. I happen to like floating point
> format, but what is much more important is that you don't get an
> API/pipeline design in which format negotiation is possible anywhere and
> everywhere. 

What else would you suggest? 32-bit integer? That might be controversial.

One of the major design features of PulseAudio is to never copy audio
data around in memory unnecessarily. For example, when a simple
networked recording application is the client, PulseAudio can call
write() on the network socket to the client directly on the mmap()'ed
DMA buffer of the sound card. This lowers CPU load, reduces memory
consumption and, as a result, also lowers the minimum achievable
latency.
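
To make the idea concrete, here is a rough sketch (not PulseAudio's
actual code) of what the capture side can look like with ALSA's mmap
access mode, assuming an interleaved stream that is already configured
and a connected socket; error handling and buffer management are left
out:

    /* Rough sketch only: forward captured audio straight from the
     * card's mmap'ed buffer to a network socket, without copying it
     * into an intermediate userspace buffer first. */
    #include <alsa/asoundlib.h>
    #include <stdint.h>
    #include <unistd.h>

    void forward_capture(snd_pcm_t *pcm, int sock, size_t frame_size) {
        const snd_pcm_channel_area_t *areas;
        snd_pcm_uframes_t offset, frames;

        for (;;) {
            snd_pcm_wait(pcm, -1);              /* sleep until the card has data */
            frames = snd_pcm_avail_update(pcm);
            if (snd_pcm_mmap_begin(pcm, &areas, &offset, &frames) < 0)
                break;

            /* write() directly out of the DMA buffer -- no memcpy() */
            const uint8_t *p = (const uint8_t *) areas[0].addr
                + areas[0].first / 8
                + offset * frame_size;
            write(sock, p, frames * frame_size);

            snd_pcm_mmap_commit(pcm, offset, frames);
        }
    }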

A sample rate/type conversion can be seen as a CPU-intensive copy
operation. Hence, if you take away the ability of client apps to
choose the sample type/rate, this nice design is broken and at least
one copy operation is always executed.

I agree that it is appealing to standardize on a single sample format
for an audio API. On the other hand, I think one would lose too much
by doing so.

> >  And even if they have one, it might be quite a bit slower than
> > integer processing in the CPU. Just think of the Nokia 770. Requiring
> > conversion from and to FP for every sample played is not an option on
> > these machines. The same is true for sample rate handling. Since
> > sample rate conversions are quite computation intensive this might
> > hurt even more. 
> 
> SRC is tricky, because there are many ways to do it and with many
> different tradeoffs. quality versus CPU cycles consumed etc. i don't
> know of any good approaches to resolving this, which is a small part of
> the reason why JACK requires all clients to run at the same SR.

libsamplerate implements a few different resampling
algorithms. However, the only ones worth taking seriously are the
sinc-based ones.
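
Just to give an idea of what using one of the sinc converters looks
like, here is a small hypothetical helper (mono, 48 kHz to 44.1 kHz)
built on libsamplerate's src_simple(); it is not code from any of the
projects discussed here:

    /* Hypothetical helper: resample a mono float buffer from 48 kHz
     * to 44.1 kHz with one of libsamplerate's sinc converters. The
     * caller owns the returned buffer. */
    #include <samplerate.h>
    #include <stdlib.h>

    float *resample_48k_to_44k1(const float *in, long n_in, long *n_out) {
        double ratio = 44100.0 / 48000.0;
        long max_out = (long) (n_in * ratio) + 16;
        float *out = malloc(max_out * sizeof(float));

        SRC_DATA d = {
            .data_in       = (float *) in,
            .data_out      = out,
            .input_frames  = n_in,
            .output_frames = max_out,
            .src_ratio     = ratio,
        };

        /* one of the sinc converters -- quality/CPU middle ground */
        if (src_simple(&d, SRC_SINC_MEDIUM_QUALITY, 1) != 0) {
            free(out);
            return NULL;
        }

        *n_out = d.output_frames_gen;
        return out;
    }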

> > Media files come in different formats, with different
> > sample rates. At some place a conversion has to happen. Pushing that
> > conversion job unconditionally into the client seems to be a little
> > bit too simple in my eyes. 
> 
> a good media file API (e.g. libsndfile) makes the conversion happen
> transparently. a good streaming architecture prevents it from happening
> more than once (or twice, depending on the signal flow). for video, this
> is harder because there does not appear to be a single data format that
> suits all needs. and to be honest, people with interest in spectral
> domain processing feel somewhat the same about audio and the dominance
> of time-domain data. my point remains: pushing this "out of the server"
> doesn't mean pushing it into the client's own code, but yes, something
> in client land (such as libsndfile) has to handle it.

libsndfile doesn't do sample rate conversions, only sample type
conversions.
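
For example, with libsndfile the sample type conversion is completely
transparent, while the sample rate is merely reported back to the
caller. A minimal (hypothetical) loader could look like this:

    /* Hypothetical loader: whatever sample type the file uses,
     * sf_readf_float() hands us floats -- but the sample rate is not
     * converted, only reported via info.samplerate. */
    #include <sndfile.h>
    #include <stdlib.h>

    float *load_as_float(const char *path, sf_count_t *frames,
                         int *rate, int *channels) {
        SF_INFO info = { 0 };
        SNDFILE *f = sf_open(path, SFM_READ, &info);
        if (!f)
            return NULL;

        float *buf = malloc(info.frames * info.channels * sizeof(float));
        *frames   = sf_readf_float(f, buf, info.frames); /* type conversion here */
        *rate     = info.samplerate;                     /* no rate conversion  */
        *channels = info.channels;

        sf_close(f);
        return buf;
    }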

For the sake of performance I still believe that sample type/rate
conversions should happen only if really necessary, which effectively
means as late in the playback pipeline as possible.

> > Video4Linux is an open/ioctl/read/write/close-API for video, btw.
> 
> with a very, very specialized niche use. hardly an advertisement for the
> power of "everything is a file" model when applied to interactions with 
> video hardware. unless the gdk backend for v4l has escaped my
> attention :)

"Specialized niche use"? video4linux is *the* API to access video
cameras and grabbers on Linux machines.

> > Forcing everyone into one sample format has many more drawbacks,
> > btw. Just think of AC3 pass-through/SPDIF. For these outputs movie
> > audio tracks should not be touched when passed through the audio
> > layer. 
> 
> JACK streams AC3 already, the data is not modified but is
> processed. 

Ah, JACK can do that? So JACK is no longer 32-bit float only? Nice to
hear. Where can I read more about this? All I could find is a tool
called ac3jack, which is effectively just a JACK-powered AC3 encoder.

> > Compressing/uncompressing/resampling them would increase
> 
> You're going to jump on me for saying this, but i continue to regard
> lossily compressed audio as a temporary artifact of limited bandwidth
> and to a lesser extent storage. in 5 years, everyone will wonder why we
> ever used mp3 or even ogg for storage, and in 10 years, the same
> question will be asked about network transmission. designing APIs around
> such short term technical details seems like a mistake to me.

I really doubt this. RF bandwidth is bounded by the Shannon
limit. Once that limit is reached, only (audio) compression can
increase the effective throughput any further.

> >  In contrast, on Windows I/O
> > multiplexing is usually done with threads. Hence I would argue that
> > those threaded, pull-based APIs are more a reminiscence of where they
> > came from than a feature of good API design.
> 
> An interesting point, and I don't entirely disagree with it. However, I
> think that whether it is serendipitous that both the modern Windows and
> CoreAudio APIs got it right, the fact is, they got it right.

Dunno about that. The ALSA API is very powerful. While I don't like
it due to its complexity, I still believe it is very good for writing
correct realtime audio applications. And you don't even have to bend
the API for that in any way. So, if you say that the Windows/CoreAudio
APIs got it right, I can only respond: "yeah, and the ALSA API,
too". ;-)

> > Please understand that I don't think that pull-based/threaded APIs are
> > necessarily a bad idea. Quite the contrary: you need to push audio
> > processing into a separate thread, instead of running it in the normal
> > main loop - for latency reasons. However, I would leave the control of
> > this to the programmer instead of the API designer.
> 
> but its *very* hard to get it right. there were dozens of attempts at
> this in linux-land when i started writing audio software in 1998, and i
> think everyone of them was wrong. my first 3 or 4 efforts were wrong.
> even JACK still has a few issues with some corner cases. we don't want
> application programmers to have to think about this kind of thing.

I dunno if it's really that hard. The problem is more that free
software people tend to copy other people's code instead of reading
the manuals and doing it by the book. Unfortunately this has led us
to a point where many programs didn't get it right, because they
copied broken code that integrated badly with other code. But I
wouldn't lose any sleep over that. For most apps it just doesn't
matter, because they are not real-time.

> > Please remember that the traditional Unix
> > open/ioctl/read/write/select/close way to do things is more powerful
> > than the pull-model. Why? Because you can easily implement the
> > pull-model on top of the Unix model, just by running the following
> > trivial function as a thread:
> > 
> > pull_thread() {
> >     for(;;) {
> >         audio_data = call_pull_function();
> >         write(audio_dsp, audio_data);
> >     }
> > }
> 
> that model isn't the same thing as the pull model, or rather, its
> similarity to the pull model depends on the way the device driver on the
> other side of write(2) is implemented. when does the write return? how
> much data can be delivered to write? what if the amount of data is too
> much? not enough? what if the call_pull_function() is too slow, who
> handles that and how?

OSS and ALSA define the semantics of the write() call very clearly. I
don't see much of a problem here. 
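
Just to illustrate those semantics, here is a slightly more concrete
version of the pull_thread() sketch quoted above, written against
blocking OSS: write() returns once the device has accepted the block,
so the soundcard paces how often the callback runs. fill_buffer() is
a stand-in for whatever pull callback the application provides, and
the thread would be started with pthread_create() as in the quoted
sketch.

    /* Illustrative only: the quoted pull_thread() made concrete for
     * blocking OSS. Error handling and format setup are omitted. */
    #include <fcntl.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <unistd.h>

    #define FRAGMENT_BYTES 4096

    /* hypothetical application callback that fills the next PCM block */
    extern void fill_buffer(int16_t *samples, size_t n_bytes);

    void *pull_thread(void *arg) {
        (void) arg;
        int dsp = open("/dev/dsp", O_WRONLY);
        int16_t buf[FRAGMENT_BYTES / sizeof(int16_t)];

        for (;;) {
            fill_buffer(buf, sizeof(buf));  /* "pull" the next block */
            write(dsp, buf, sizeof(buf));   /* blocks until accepted */
        }
        return NULL;
    }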

> my problem with this claim is that we are all sitting around with a
> total damn mess as far as multimedia goes, whereas you don't find
> developers on OS X lamenting the fact that they don't have the perfect
> API. 

I guess we can agree that multimedia on Linux is currently a big
mess. It is my declared goal to fix this with PulseAudio. ;-)

> > OSS is a deadly simple API, that is widely accepted and is the only
> > one that is widely accepted *and* cross-platform. It is here to
> 
> does OSS work on Windows? on OS X? because JACK and PortAudio work on
> both. PortAudio, sadly, seems to have gone into a bit of a dead
> space.

Sure, the cross-platform nature of OSS doesn't extend to Windows and
Mac OS X. But it's still more cross-platform than any other widely
accepted sound API on Linux.

PortAudio has other serious issues, I would argue, besides the fact
that it seems to have died.

> > stay. It's the least common denominator of Unix audio APIs. However,
> > It has a few drawbacks. Firstly it's very difficult to emulate,
> > because it is a kernel API. (Monty from Redhat is now working on FUSD,
> > a way to emulate character devices in userspace, which should finally
> > clean this up) Secondly it doesn't support all those new features pro
> > audio cards support.
> 
> I sincerely hope that Monty plans to get to DAM3. I am not convinced
> that this is the way to finally clean this up.

I don't think he will attend either, but I don't know.

Emulating OSS with FUSD is not intended to be the grand solution for
an everlasting bright future. Rather, it is born out of the
acknowledgement that OSS is not going to get out of the way and that
we need to provide good compatibility with it for many years to come.

BTW, will you attend linux.conf.au or FOMS 2007? I'd really like to
discuss the "grand plan to fix the linux audio jumble" with you
personally.

At GUADEC and the UDS in Mountain View I discussed this with the
Red Hat and Ubuntu guys. The plans of the Ubuntu and Red Hat people
have a lot in common, and the remaining stumbling blocks for adoption
of PulseAudio are being moved out of the way as we speak.

However, on both occasions our discussion suffered from the fact that
none of us had a "pro audio" background.

> > I am sorry, but I don't follow you on this. As I showed above, the
> > pull model can easily be implemented using the traditional unix way to do
> > things. Claiming that the pull model is the "only" model to marry pro
> > and desktop media worlds is simple nonsense.
> 
> My claim is that if the pull model is not in some way the "center of the
> universe", then it will all go wrong. Allowing people to use push, but
> making sure they are clear about what this means, is fine with me. I do
> not believe that you can meet the goals of JACK (sample synchronous, low
> latency execution of multiple applications potentially sharing data and
> device access) with a push model. Some other JACK developers disagree
> with me but nobody has found the time to prove it.

So, if I understood correctly, it's more of a philosophical problem?
You want people to acknowledge that it's generally a better idea to
use a pull model - and to leave the push/unix model for the few cases
where the developer has a good reason and really knows what he is
doing? If that's what you're trying to say, then I guess I can agree.

> >  with the way threading is handled by
> > the pull API. Why? Because you need to pass the audio data between the
> > gst-handled thread and the api-handled thread. That passing requires
> > buffering. Thread-safe buffering adds latency and complexity. (and
> > remember that systems like JACK stress the fact that their buffer is
> > "lock-free", a feature which is not worth much if you need to add more
> > buffering and synchronization because of the pull model)
> 
> i've never seen any evidence of the need to add more as a result of the
> pull model. 

Just think of a VoIP app on top of a pull-model sound server like
JACK. One thread would deal with incoming network packets and
decode/decompress the audio data, which might be quite CPU
intensive. The pull callback would be executed from a second
thread. So, somehow the two threads need to communicate, i.e. the
network thread needs to pass the audio data to the callback
thread. How's that done? With an asynchronous, thread-safe queue. The
callback thread then pulls the data from that queue and pushes it
into another queue - the one that connects to the sound server. So,
effectively we now have two queues which are linked to each other via
the callback thread. Wouldn't it make more sense to have the network
thread push the data directly into the sound server queue, even if
that queue then had to be larger? In summary: the pull model for this
setup requires two queues, the push model only one. Which model is
better in this example? I'd vote for the push model.

(And don't forget that, because some lame dev coded it, the queue
between the two threads is not lock-free! uah! ;-))
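
To make the two-queue situation concrete, here is a sketch of the
extra queue the pull model forces on such a client (hypothetical
code, not JACK's API, overflow handling omitted): the network thread
pushes decoded PCM in, and the callback thread has to pop it out
again and copy it into the buffer the sound server hands it - one
more thread-safe queue than a push-style write() would need.

    /* Hypothetical mutex-protected ring buffer between the network
     * thread and the pull callback thread (and no, it's not
     * lock-free). */
    #include <pthread.h>
    #include <stddef.h>

    #define QSIZE 65536

    static struct {
        float data[QSIZE];
        size_t head, tail;               /* write and read positions */
        pthread_mutex_t lock;
    } q = { .lock = PTHREAD_MUTEX_INITIALIZER };

    /* network thread: after decoding a packet, queue its samples */
    void on_packet_decoded(const float *pcm, size_t n) {
        pthread_mutex_lock(&q.lock);
        for (size_t i = 0; i < n; i++)
            q.data[(q.head + i) % QSIZE] = pcm[i];
        q.head = (q.head + n) % QSIZE;
        pthread_mutex_unlock(&q.lock);
    }

    /* callback thread: the sound server asks for n samples into out */
    void pull_callback(float *out, size_t n) {
        pthread_mutex_lock(&q.lock);
        for (size_t i = 0; i < n; i++) {
            if (q.tail != q.head) {
                out[i] = q.data[q.tail];
                q.tail = (q.tail + 1) % QSIZE;
            } else {
                out[i] = 0.0f;           /* underrun: play silence */
            }
        }
        pthread_mutex_unlock(&q.lock);
    }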

> 
> >  Both are
> > easy to avoid if the API would be unix-style with
> > open/read/write/select/close.
> 
> you're just relying on kernel level synchronization to take care of
> things for you. i don't think thats good enough. plus all the issues
> with the write() based approach that i touched on above.

Kernel-level synchronisation takes place anyway. Since you cannot
avoid it, why not use it for something useful?
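
For instance, the classic Unix approach lets a single poll() loop
multiplex the audio device and the network socket, with the kernel
doing all the wakeups. A deliberately oversimplified sketch, assuming
both fds are already open and ignoring decoding and buffer
management:

    /* Oversimplified: one poll() loop driven entirely by kernel-level
     * synchronisation. The kernel wakes us when the device can take
     * more audio or when packets have arrived. */
    #include <poll.h>
    #include <unistd.h>

    void io_loop(int dsp_fd, int net_fd) {
        struct pollfd fds[2] = {
            { .fd = dsp_fd, .events = POLLOUT },  /* device wants PCM */
            { .fd = net_fd, .events = POLLIN  },  /* packets to read  */
        };
        char packet[1500], audio[4096] = { 0 };   /* silence placeholder */

        for (;;) {
            poll(fds, 2, -1);
            if (fds[1].revents & POLLIN)
                read(net_fd, packet, sizeof(packet)); /* decode elsewhere */
            if (fds[0].revents & POLLOUT)
                write(dsp_fd, audio, sizeof(audio));  /* next PCM block   */
        }
    }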

> > It's not clear to me what the OSDL tries to achieve with the DAM
> > multimedia meetings? Is the plan to define a new abstracted sound API?
> > That would be a very difficult thing - and a very questionable one as
> > well - since there are so many APIs around already. I wonder who's the
> > intended target group for this new API? Pro-Audio people? Game
> > developers?  Networked Audio people? "desktop" application
> > programmers? all of them?
> 
> lennart, i wish i knew :)
> 
> i happen to have some friends in portland, and enough cash to fly out
> there. DAM-1 was almost a complete waste of time. the idea was to get as
> many interested parties together. it didn't really accomplish anything
> as far as media issues. i am not optimistic that this 

Maybe we should try to get all interested people together at a more,
...hmm... popular conference such as linux.conf.au 2007. Or perhaps
someone should organize a separate "grand
clean-up-the-Linux-audio-jumble summit".

(Unfortunately, though, I know that neither Pierre Ossman (the second
in command for PulseAudio) nor Monty will attend LCA 2007. :-()

> But you're right. gstreamer-devel is probably not the right place for us
> to be having this discussion. Problem is ... where?

Dunno. Perhaps on a new cleanup-linux-audio-jumble-discuss ML? ;-)

Lennart

-- 
Lennart Poettering; 0poetter [at] informatik [dot] uni-hamburg [dot] de
ICQ# 11060553; GPG 0x83E23BF0; http://www.stud.uni-hamburg.de/users/lennart/



