[gst-devel] JACK and GStreamer, from the horse's mouth

Mon Nov 27 15:51:51 CET 2006

On Mon, 2006-11-27 at 00:36 +0100, Benjamin Otte wrote:
> On Sun, 26 Nov 2006, Paul Davis wrote:
> 
> > this means that you never call ioctl, so i guess you never select a data
> > format or data rate. interesting. and i guess you never care about
> > latency. and presumably you certainly don't care whether your app can
> > route data to or from other audio apps, which is a rather Unix-y thing
> > to want. surprising given your appreciation of the open/read/write/close
> > model.
> >
> FWIW, ESD has data format and rate selection in the open() call. OSS might
> need an ioctl or it uses 44100 S16 stereo by default, not sure.

OSS uses 8 bit .au at 8kHz by default. yes, really.

> As a comparison, ALSA requires a set_buffer_size call or it might use a
> 2048-byte buffer and then I end up with stuttering audio.

so what's the right default here? how do you define a default when there
really is no value that works for most people most of the time? and
stuttering audio is actually a responsibility of the kernel, which is
why i've worked hard over the last 8 years to try to get this addressed.
thankfully, and mostly thanks to ingo molnar, that goal is more or less
achieved.

> If I were
> just writing my little app, I'd not even know what that means and would
> have given up by then.

if you were writing your little app using an API *like* JACK (no format
negotiation, no rate negotiation, no API negotiation), you wouldn't have
to know what it meant, and you would already be done :)
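
A rough sketch of what that looks like from the application side, in
Python and against a made-up engine rather than the real JACK library.
The names ToyEngine, set_process_callback and run_cycles are
hypothetical, and the fixed rate and buffer size are assumptions. The
app supplies one function that fills a buffer of floats; it never
selects a format or a rate.

```python
import math


class ToyEngine:
    """Hypothetical stand-in for a callback-driven audio server (not the JACK API)."""

    def __init__(self, sample_rate=48000, buffer_size=64):
        # The engine, not the app, owns these settings.
        self.sample_rate = sample_rate
        self.buffer_size = buffer_size
        self._callback = None

    def set_process_callback(self, callback):
        self._callback = callback

    def run_cycles(self, count):
        """Call the app once per period and collect what it produced."""
        produced = []
        for _ in range(count):
            out = [0.0] * self.buffer_size  # floats, nominally in [-1.0, 1.0]
            self._callback(out, self.sample_rate)
            produced.extend(out)
        return produced


def make_tone(frequency):
    """Return a process callback that writes a sine wave into the buffer."""
    phase = 0.0

    def process(out, sample_rate):
        nonlocal phase
        step = 2.0 * math.pi * frequency / sample_rate
        for i in range(len(out)):
            out[i] = 0.5 * math.sin(phase)
            phase += step

    return process


engine = ToyEngine()
engine.set_process_callback(make_tone(440.0))
samples = engine.run_cycles(4)  # four periods of 64 frames each
```

Everything the app has to know is inside make_tone; the rate arrives as
an argument and the sample format is simply "float".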

> > moreover, these open/read/write/close apps cannot be ported to any API
> > that uses a callback model: i.e. every API under the sun except the
> > linux hacks based on an open/read/write/close model.
> >
> > if open/read/write/close is so wonderful (and i have to admit to being
> > quite a defender of it in the past), why isn't it used for video?
> >
> I don't care a single bit if it's pulling, pushing, sneezing or otherwise
> doing stuff, as long as it's dead simple to use. Let me repeat that: It
> doesn't matter how perfect your application is, unless getting started is
> really simple.

my point is that this claim isn't true for any video or graphics API, at
least no more so than ALSA, and arguably much less so. there is no
open/read/write/close API for interacting with h/w video adapters
because such an API fundamentally doesn't work - the device has various
characteristics that most programmers want abstracted away.
Consequently, people use APIs like Qt, GTK, Cocoa and so forth. When
they want to draw, they use APIs like Cairo or GDI or OpenGL. None of
these APIs are remotely simple to use but this doesn't seem to have
stopped the development of application software that needs to interact
with h/w video adapters.

it appears to me that because sound cards spent a long time stuck in an
amazingly simple world - 1 data format (16 bit interleaved stereo at
44.1 or 48 kHz) and one h/w API (the SB16) - lots of people think
that playing back audio should always be simple. by contrast, the APIs
that make interacting with the video adapter simple have been evolving
over a *long* period of time and are still evolving (e.g. Cairo), and
nobody expects that there is some 3 line method to blit a PNG to a
specific part of the screen. but people continue to expect that even if
the user has 512 channels of ethernet-based I/O as their audio
interface, the same 3 lines that work for some consumer-level onboard
chipset should somehow still make a "system beep".

> >         b) forces all participants to use a single data format which
> > happens to be the format used by 99% of all audio programmers in the
> > world (32 bit floating point). the 1% that don't are generally writing
> > consumer/desktop multimedia on linux. the rest - all plugin developers,
> > all DAW developers and the rest - have realized that the benefits of
> > using 32 bit float outweigh any inconvenience. and by removing format
> > negotiation from graph entry, applications are actually much easier to
> > write.
> >
> Surprisingly, neither my CD player, my MP3 decoder, nor my sound cards
> seem to have had contact with audio programmers. Or maybe you live in a
> different world than I do.

neither your CD players nor your sound cards run general purpose audio
software. Go take a look at Windows, MacOS and OS X, where 90+% of
the world live (for better or for worse), and where almost all the cool
(media) app development has been taking place for the last 10-15 years.
The overwhelming majority of that stuff is done using APIs where
callbacks and a single data format are the norm, not the exception.
DirectX, WDM and ASIO are just a few of the windows acronyms for this. To
be fair, DirectX does include format negotiation in a style somewhat
similar to GStreamer, and while it's appreciated by desktop developers on
windows for the same reason that gstreamer looks cool, the pro audio
developers prefer to avoid it.
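
To make the single-format point concrete: converting to and from float
at the edge of the graph takes only a few lines. A small Python sketch,
assuming signed 16-bit little-endian input and the common convention of
scaling by 32768; the function names are hypothetical.

```python
import struct


def s16le_to_float(data):
    """Decode signed 16-bit little-endian PCM bytes to floats in [-1.0, 1.0)."""
    count = len(data) // 2
    ints = struct.unpack("<%dh" % count, data[:count * 2])
    return [s / 32768.0 for s in ints]


def float_to_s16le(samples):
    """Encode floats back to 16-bit PCM, clipping anything out of range."""
    ints = []
    for x in samples:
        v = int(round(x * 32768.0))
        ints.append(max(-32768, min(32767, v)))
    return struct.pack("<%dh" % len(ints), *ints)


pcm = struct.pack("<4h", 0, 16384, -32768, 32767)
floats = s16le_to_float(pcm)
```

With the conversion done once on the way in and once on the way out,
nothing in between needs to ask what format its neighbour speaks.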

> > so presumably you think that CoreAudio, whose API is twice the size of
> > JACK's and involves massively more complexity, also sucks?
> >
> Considering I wasn't able to find a 10-line code file to output an audio
> stream with CoreAudio in my 5 minutes of googling, I'd say that it does
> indeed suck.
> Btw, for ALSA the first link was this:
> http://equalarea.com/paul/alsa-audio.html#playex
> which, apart from the huge lot of setup commands, is reasonably short to
> get started.
> 
> 
> I'm not trying to tell you that there's any technical reason why JACK is
> bad.
> The only thing that sucks about JACK is that getting started is extremely
> complicated compared to lots of other sound outputs. 

the canonical example app for JACK was, against my wishes, "corrupted"
to include transport control, which is very regrettable. the older
version, and its even simpler twin, jack_monitor, which copies its
inputs to its outputs, is half the size of the ALSA example above. 

> And as long as that
> is the case, no one will take JACK as the first choice for audio output
> unless they write their app for JACK.

i've said over and over that i don't want desktop media developers to
write for JACK. i've been at DAM meetings telling people that gstreamer
and/or pulseaudio are much better ideas. but as i've also said, i want people
to use a callback ("pull") style API because it is the *only* model that
can unify the pro and desktop media worlds. and yes, that involves a
little bit more background - you have to understand what a callback is.
this doesn't seem to have impeded dozens or even hundreds of mostly very
inexperienced OS X freeware developers from cooking up neat little apps,
and i don't see why it should be a burden on linux.

> PS: The pull model used by non-blocking file descriptors is perfectly
> fine. ESD and OSS support that model.

but they do not require it, and because they support
open/read/write/close and because this is perceived to be easier by so many
*nix developers, we end up with two worlds - apps written around
pull-requiring APIs and apps based on a push model.
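
The gap between the two worlds can be sketched with a small adapter
that lets write()-style code feed a callback-driven engine. This is an
illustration only: PushToPullBridge and its methods are hypothetical
names, and a real bridge would need a lock-free ring buffer rather than
a Python deque.

```python
from collections import deque


class PushToPullBridge:
    """Hypothetical adapter between a push-style app and a pull-style engine."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.fifo = deque()
        self.underruns = 0

    def write(self, samples):
        """Push side: accept as many samples as fit, return how many were taken."""
        room = self.capacity - len(self.fifo)
        taken = samples[:room]
        self.fifo.extend(taken)
        return len(taken)

    def process(self, out):
        """Pull side: the engine calls this once per period to fill `out`."""
        for i in range(len(out)):
            if self.fifo:
                out[i] = self.fifo.popleft()
            else:
                out[i] = 0.0          # nothing queued: play silence
                self.underruns += 1   # counted per missing sample


bridge = PushToPullBridge(capacity=8)
accepted = bridge.write([0.1] * 6)   # app pushes 6 samples
period = [0.0] * 4
bridge.process(period)               # engine pulls 4 of them
second = [0.0] * 4
bridge.process(second)               # only 2 left, the rest is silence
```

The extra queue is one place where added latency and underruns can
creep in when a push-style app is bolted onto a pull-style system.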