[gst-devel] JACK and GStreamer, from the horse's mouth

Paul Davis paul at linuxaudiosystems.com
Wed Nov 29 00:09:22 CET 2006


On Tue, 2006-11-28 at 22:14 +0100, Lennart Poettering wrote:
> On Mon, 27.11.06 09:51, Paul Davis (paul at linuxaudiosystems.com) wrote:
> 
> Hi!
> 
> Just my 2¢ on this discussion, as the maintainer of PulseAudio:

Thanks for joining the conversation. Sorry to hear you can't be at DAM3.

> While a Jack-like API makes sense for applications which Jack was
> designed for (i.e. "pro" audio, audio processing), it definitely does
> not make any sense as a general purpose API for all use cases.

My claim all along has been that the only essential element is to expose
enough of the underlying pull-model nature of audio h/w to ensure that
apps are designed properly around it. This makes sample-synchronous
inter-application audio possible (and relatively easy), among many other
things.
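
For reference, this is roughly all that the pull model amounts to from the
application's side in JACK: register a callback, fill the buffer you are
handed, and keep the rest of the app out of the audio path. (A bare-bones
sketch only; error handling and port connection are omitted, and the client
name is invented.)

    #include <jack/jack.h>
    #include <string.h>
    #include <unistd.h>

    static jack_port_t *out_port;

    /* the pull callback: JACK calls this from its RT thread whenever
       the hardware needs nframes more frames; the app never calls
       write() on a device. */
    static int process (jack_nframes_t nframes, void *arg)
    {
        float *buf = (float *) jack_port_get_buffer (out_port, nframes);
        memset (buf, 0, nframes * sizeof (float)); /* silence, for the sketch */
        return 0;
    }

    int main (void)
    {
        jack_client_t *client = jack_client_open ("sketch", JackNullOption, NULL);

        out_port = jack_port_register (client, "out", JACK_DEFAULT_AUDIO_TYPE,
                                       JackPortIsOutput, 0);
        jack_set_process_callback (client, process, NULL);
        jack_activate (client);

        sleep (30);        /* the "GUI" lives here, outside the audio path */
        jack_client_close (client);
        return 0;
    }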

> Please remember that not all machines running Linux and GStreamer have
> a FPU.

This is a red herring. There is nothing in anything I've been arguing
for that requires floating point. I happen to like floating point
format, but what is much more important is that you don't get an
API/pipeline design in which format negotiation is possible anywhere and
everywhere. 

>  And even if they have one, it might be quite a bit slower than
> integer processing in the CPU. Just think of the Nokia 770. Requiring
> conversion from and to FP for every sample played is not an option on
> these machines. The same is true for sample rate handling. Since
> sample rate conversions are quite computation intensive this might
> hurt even more. 

SRC is tricky, because there are many ways to do it, each with different
tradeoffs: quality versus CPU cycles consumed, and so on. I don't know
of any good approach to resolving this, which is a small part of the
reason why JACK requires all clients to run at the same SR.
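
Just to make the tradeoff concrete, this is what the knob looks like with
libsamplerate: the converter type is exactly a quality-vs-CPU choice, from
SRC_SINC_BEST_QUALITY down to SRC_LINEAR. (A sketch only; the helper
function and its arguments are invented for illustration, and none of this
is part of JACK.)

    #include <samplerate.h>

    /* resample interleaved float audio; "quality" selects the converter,
       i.e. the quality/CPU tradeoff. returns frames generated, or -1. */
    long resample (float *in, long in_frames, float *out, long out_frames,
                   double ratio, int channels, int quality)
    {
        SRC_DATA d = { 0 };

        d.data_in       = in;
        d.input_frames  = in_frames;
        d.data_out      = out;
        d.output_frames = out_frames;
        d.src_ratio     = ratio;

        if (src_simple (&d, quality, channels) != 0)
            return -1;

        return d.output_frames_gen;
    }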

> Media files come in different formats, with different
> sample rates. At some place a conversion has to happen. Pushing that
> conversion job unconditionally into the client seems to be a little
> bit too simple in my eyes. 

A good media file API (e.g. libsndfile) makes the conversion happen
transparently. A good streaming architecture prevents it from happening
more than once (or twice, depending on the signal flow). For video this
is harder, because there does not appear to be a single data format that
suits all needs. And to be honest, people interested in spectral-domain
processing feel somewhat the same way about audio and the dominance of
time-domain data. My point remains: pushing this "out of the server"
doesn't mean pushing it into the client's own code, but yes, something
in client land (such as libsndfile) has to handle it.
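
Concretely, "transparently" means the client just asks for the sample
format it wants, and whatever conversion the on-disk format requires
happens behind its back. A rough sketch (the function name is invented,
error handling is minimal):

    #include <sndfile.h>
    #include <stdlib.h>
    #include <string.h>

    /* read a whole file as interleaved floats, whatever its on-disk
       format (16-bit PCM, 24-bit, ulaw, ...); libsndfile converts. */
    float *load_as_float (const char *path, SF_INFO *info)
    {
        SNDFILE *sf;
        float *buf;

        memset (info, 0, sizeof (*info));           /* required before sf_open */
        sf = sf_open (path, SFM_READ, info);
        if (sf == NULL)
            return NULL;

        buf = malloc (info->frames * info->channels * sizeof (float));
        if (buf != NULL)
            sf_readf_float (sf, buf, info->frames); /* conversion happens here */

        sf_close (sf);
        return buf;
    }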


> The pull model for audio playback creates as many problems as it
> solves. It makes some things easier, but others more
> complicated. Since it requires threading in a way, 

Most people get confused enough already by the fact that their GUI API
has taken control of the program. Telling them that their
single-threaded app is blocked waiting for I/O from audio h/w isn't
going to help. Hence, threads. There would be no technical need for this
if you could substitute much more capable programmers; that's
unrealistic, so threads it is :)

> Video4Linux is an open/ioctl/read/write/close-API for video, btw.

With a very, very specialized niche use. Hardly an advertisement for the
power of the "everything is a file" model when applied to interactions with
video hardware. Unless the gdk backend for v4l has escaped my attention :)

> Forcing everyone into one sample format has many more drawbacks,
> btw. Just think of AC3 pass-through/SPDIF. For these outputs movie
> audio tracks should not be touched when passed through the audio
> layer. 

JACK streams AC3 already: the data is not modified, but it does pass
through the processing graph.

> Compressing/uncompressing/resampling them would increase

You're going to jump on me for saying this, but I continue to regard
lossily compressed audio as a temporary artifact of limited bandwidth
and, to a lesser extent, storage. In 5 years, everyone will wonder why we
ever used mp3 or even ogg for storage, and in 10 years the same
question will be asked about network transmission. Designing APIs around
such short-term technical details seems like a mistake to me.

> It strikes me very gutsy to claim that the Windows API is an
> example of good API design. 

Then I am gutsy. I'm no fan of Windows APIs, but ASIO itself is not a
Microsoft design, and it's notably different from most of the Windows
API. DirectX was originally written by Cakewalk, not Microsoft, and
before MS got their grubby hands all over it, it wasn't bad. But no, I'm
not seriously holding these up as shining examples of API brilliance;
I'm citing them as *massively* used pull/callback-based designs that
have worked exceedingly well for many ISVs of different sizes.

> On Unix programs have traditionally been single-threaded and organised
> around a single poll()/select() event loop.

Not realtime unix programs, and this is the key difference. If you have
two devices feeding a poll/select loop, and one has to be handled with
great speed due to RT deadlines while the other is done lazily (e.g. GUI
stuff), you cannot put them into the same thread. I worked on unix
(Solaris, Ultrix, Dynix) programs in the mid-'80s that were multithreaded
for this very reason. Most people seem to forget that at some level,
audio s/w is all real time; it's just that with enough buffering you can
(mostly) ignore that. If the buffering is a problem because you want low
latency, you can't ignore it, and you need threads.
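
In code, the split is nothing exotic: the time-critical device I/O gets
its own SCHED_FIFO thread, and the GUI/poll loop stays in the
normal-priority main thread. (A sketch only: audio_thread_main() is a
placeholder, SCHED_FIFO needs the right privileges, and the priority
value is arbitrary.)

    #include <pthread.h>
    #include <sched.h>

    static void *audio_thread_main (void *arg)
    {
        /* block on the audio device, meet the RT deadline, repeat */
        return NULL;
    }

    int start_audio_thread (pthread_t *tid)
    {
        pthread_attr_t attr;
        struct sched_param sp;

        sp.sched_priority = 50;                 /* arbitrary RT priority */

        pthread_attr_init (&attr);
        pthread_attr_setschedpolicy (&attr, SCHED_FIFO);
        pthread_attr_setschedparam (&attr, &sp);
        pthread_attr_setinheritsched (&attr, PTHREAD_EXPLICIT_SCHED);

        return pthread_create (tid, &attr, audio_thread_main, NULL);
    }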

>  In contrast on Windows I/O
> multiplexing is usually done with threads. Hence I would argue that
> those threaded, pull-based APIs are more a reminiscence of where they
> came from than a feature of good API design.

An interesting point, and I don't entirely disagree with it. However,
whether or not it is merely serendipitous that both the modern Windows
and CoreAudio APIs got it right, the fact is, they got it right.

> Please understand that I don't think that pull-based/threaded APIs are
> necessarily a bad idea. Quite the contrary: you need to push audio
> processing into a separate thread, instead of running it in the normal
> main loop - for latency reasons. However, I would leave the control of
> this to the programmer instead of the API designer.

But it's *very* hard to get it right. There were dozens of attempts at
this in linux-land when I started writing audio software in 1998, and I
think every one of them was wrong. My first 3 or 4 efforts were wrong.
Even JACK still has a few issues with some corner cases. We don't want
application programmers to have to think about this kind of thing.

> Please remember that the the traditional Unix
> open/ioctl/read/write/select/close way to do things is more powerful
> then the pull-model. Why? Because you can easily implement the
> pull-model on top of the Unix model, just by running the following
> trivial function as a thread:
> 
> pull_thread() {
>     for(;;) {
>         audio_data = call_pull_function();
>         write(audio_dsp, audio_data);
>     }
> }

That model isn't the same thing as the pull model, or rather, its
similarity to the pull model depends on the way the device driver on the
other side of write(2) is implemented. When does the write return? How
much data can be delivered to write? What if the amount of data is too
much? Not enough? What if call_pull_function() is too slow: who handles
that, and how?
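
Just to spell out what that loop glosses over, here is roughly what the
write() side has to deal with even on a plain blocking OSS device. (A
sketch only; the ioctls and device name are standard OSS, but the pull
callback and its signature are invented for the example.)

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/soundcard.h>

    int play_loop (short *(*pull) (int *nframes))
    {
        int fd = open ("/dev/dsp", O_WRONLY);
        int fmt = AFMT_S16_LE, chans = 2, rate = 44100;

        if (fd < 0)
            return -1;

        ioctl (fd, SNDCTL_DSP_SETFMT, &fmt);    /* fragment size? latency? */
        ioctl (fd, SNDCTL_DSP_CHANNELS, &chans);
        ioctl (fd, SNDCTL_DSP_SPEED, &rate);

        for (;;) {
            int nframes;
            short *data = pull (&nframes);      /* how long did this take? */
            char *p = (char *) data;
            ssize_t left = nframes * chans * sizeof (short);

            while (left > 0) {                  /* write(2) may be partial */
                ssize_t n = write (fd, p, left);
                if (n < 0)
                    return -1;                  /* and an underrun shows up... where? */
                p += n;
                left -= n;
            }
        }
    }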

What is wrong with open/read/write/close/ioctl is not its power, but the
style of programming it creates. Rather than making programmers aware
that they are interacting with a soft real-time streaming device, it has
historically encouraged them to treat the audio device just like any
other file. Even very good programmers do this; I won't name names, but
there were major apps that until very recently still did. "Everything is
a file" is a fantastically powerful model, but it breaks down when time
becomes a real component of the data flow, and with audio (and to a
lesser but still important extent, with video), it most definitely is.

> I've been thinking about audio APIs quite a lot in the last
> months. I've seen quite a lot of different APIs, from different sound
> servers, hardware interfaces, operating systems. I didn't find a
> single one that would make all the use cases I had on my list happy -
> and would be easy to use.

My problem with this claim is that we are all sitting around with a
total damn mess as far as multimedia goes, whereas you don't find
developers on OS X lamenting the fact that they don't have the perfect
API.

> OSS is a deadly simple API, that is widely accepted and is the only
> one that is widely accepted *and* cross-platform. It is here to

Does OSS work on Windows? On OS X? Because JACK and PortAudio work on
both. PortAudio, sadly, seems to have gone into a bit of a dead space.

> stay. It's the least common denominator of Unix audio APIs. However,
> It has a few drawbacks. Firstly it's very difficult to emulate,
> because it is a kernel API. (Monty from Redhat is now working on FUSD,
> a way to emulate character devices in userspace, which should finally
> clean this up) Secondly it doesn't support all those new features pro
> audio cards support.

I sincerely hope that Monty plans to get to DAM3. I am not convinced
that this is the way to finally clean this up.

> I am sorry, but I don't follow you on this. As i showed above, the
> pull model can easily be implemented using the traditional unix way to do
> things. Claiming that the pull model is the "only" model to marry pro
> and desktop media worlds is simple nonsense.

My claim is that if the pull model is not in some way the "center of the
universe", then it will all go wrong. Allowing people to use push, but
making sure they are clear about what this means, is fine with me. I do
not believe that you can meet the goals of JACK (sample synchronous, low
latency execution of multiple applications potentially sharing data and
device access) with a push model. Some other JACK developers disagree
with me but nobody has found the time to prove it.

> Instead I would argue that the pull model actually has negative
> effects. It is very difficult to marry a system like Gstreamer (which
> already does its own threading)

When I read about this in early GStreamer, my heart sank.

>  with the way threading is handled by
> the pull API. Why? Because you need to pass the audio data between the
> gst-handled thread and the api-handled thread. That passing requires
> buffering. Thread-safe buffering adds latency and complexity. (and
> remember that systems like JACK stress the fact that their buffer is
> "lock-free", a feature which is not worth much if you need to add more
> buffering and synchronization because of the pull model)

I've never seen any evidence of the need to add more buffering as a
result of the pull model.
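
In practice, marrying a push-style producer (a decoder thread, a
GStreamer element, whatever) to the pull callback takes exactly one
lock-free ringbuffer, e.g. the one JACK ships in <jack/ringbuffer.h>. A
rough sketch, with the client/port setup and ringbuffer creation elided:

    #include <jack/jack.h>
    #include <jack/ringbuffer.h>
    #include <string.h>

    static jack_ringbuffer_t *rb;          /* made once with jack_ringbuffer_create() */
    static jack_port_t       *output_port; /* registered during client setup */

    /* producer side (push): called from the decoder/GUI thread */
    size_t push_audio (const float *data, size_t nframes)
    {
        return jack_ringbuffer_write (rb, (const char *) data,
                                      nframes * sizeof (float));
    }

    /* consumer side (pull): the JACK process callback, RT thread */
    int process (jack_nframes_t nframes, void *arg)
    {
        float *out = (float *) jack_port_get_buffer (output_port, nframes);
        size_t wanted = nframes * sizeof (float);

        if (jack_ringbuffer_read_space (rb) >= wanted)
            jack_ringbuffer_read (rb, (char *) out, wanted);
        else
            memset (out, 0, wanted);       /* underrun: play silence, never block */

        return 0;
    }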

>  Both are
> easy to avoid if the API would be unix-style with
> open/read/write/select/close.

You're just relying on kernel-level synchronization to take care of
things for you. I don't think that's good enough. Plus there are all the
issues with the write()-based approach that I touched on above.

> It's not clear to me what the OSDL tries to achieve with the DAM
> multimedia meetings? Is the plan to define a new abstracted sound API?
> That would be a very difficult thing - and a very questionable one as
> well - since there are so many APIs around already. I wonder who's the
> intended target group for this new API? Pro-Audio people? Game
> developers?  Networked Audio people? "desktop" application
> programmers? all of them?

Lennart, I wish I knew :)

I happen to have some friends in Portland, and enough cash to fly out
there. DAM-1 was almost a complete waste of time. The idea was to get as
many interested parties together as possible, but it didn't really
accomplish anything as far as media issues go. I am not optimistic that
this one will be any different.

> I wonder where the problem with that is in your eyes? Some apps do
> it this way, others do it a different way. So?

Because it becomes very hard to do the right thing when you want apps to
share data or device access and some of them are pull-based and some
push-based.

But you're right. gstreamer-devel is probably not the right place for us
to be having this discussion. Problem is ... where?

--p






