Multimedia support (Re: X Developers' Summit)

Carsten Haitzler (The Rasterman) raster at
Tue Jul 17 05:41:31 PDT 2007

On Tue, 17 Jul 2007 09:48:53 +0200 Helge Bahmann <hcb at> babbled:

> Am Dienstag, 17. Juli 2007 00:43 schrieb Carsten Haitzler:
> > On Mon, 16 Jul 2007 16:55:46 +0200 Helge Bahmann <hcb at> babbled:
> > > yes, and frankly I fail to see what PulseAudio is doing fundamentally
> > > different from mas when it comes to synchronisation. Not that it is
> > > impossible, but it will get very very messy, and I can show that
> > > integrating it into X to begin with neatly solves this problem.
> >
> > personally - i think going via x proto has merit. you have a point. the
> > only problem now is that it becomes x's job to synchronise audio within
> > itself (and you thought you got rid of the problem!). you basically shifted
> > the problem from the app to x. that's fine - but it's not a silver bullet.
> > you will probably now need to contend with mixing audio in client-space
> Actually not quite, mixing is performed in the server, but under the
> direction of a "mixer client" (somewhat analogous to the "compositing
> manager")
> Basically the idea is as follows... the extension provides X requests to 
> perform operations on samplebuffers within the server, for 
> example "MultiplyAccumulate" which you would probably use for mixing. It is 
> now the responsibility of the "mixer client" app to issue commands to
> properly mix all active streams into the master stream (and it may do
> whatever it chooses to do to perform the mixdown, including
> suppressing/dropping voices, or funny effects like panning audio as the
> window is moved across the screen)
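
for reference, the kind of "MultiplyAccumulate" operation described above could look roughly like this on 16-bit samples. this is only a sketch - the function name, the 8.8 fixed-point gain and the saturation behaviour are my assumptions, not the extension's actual request:

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical helper (not the extension's real request):
 * dst[i] += src[i] * gain, saturated to the 16-bit sample range.
 * gain is 8.8 fixed point, so 256 means unity and 0 mutes the source. */
void multiply_accumulate(int16_t *dst, const int16_t *src, size_t n,
                         int32_t gain)
{
    for (size_t i = 0; i < n; i++) {
        int32_t v = dst[i] + (src[i] * gain) / 256;

        /* clamp instead of wrapping - wrapped samples sound awful */
        if (v > INT16_MAX) v = INT16_MAX;
        if (v < INT16_MIN) v = INT16_MIN;
        dst[i] = (int16_t)v;
    }
}
```

a mixdown is then one such call per active stream into the master buffer, with the gain acting as the per-client volume.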

ok. something here doesn't gel with me. my warning instincts are beeping. we
now have:
1. audio client writes to buffer
2. context switch to server
3. server writes to mixer client
4. context switch
5. mixer client mixes and writes to buffer
6. context switch
7. server writes to audio device

whereas going to /dev/dsp is:
1. audio client writes to device

now without a mixer client:
1. audio client writes to buffer
2. context switch
3. server mixes and writes to audio device

a lot less latency.

the idea of adding timestamps to showing an image or playing a sound is
nice - BUT to me it smells of patching a problem the wrong way. a
problem that is created by the design. while a mixer client is nice in
principle - in reality - how many funky uses do you really expect?

personally if i am writing an x app that displays an image with some sound
synchronised to the display i want to do:

for (;;) {
    int time_to_wait;
    /* calculate time to wait based on last "Frame" display */
    /* wait that long, then show the image and play its audio snippet */
}

and i KNOW it will work because the audio snippet is played along with the
display of the image - no need to change my graphics pipeline at all - just
slot in audio at the right spot.

> Now, in principle the mixer client has to make sure that all mixing commands 
> are executed "in time" by the server which -- unsurprisingly -- turns out to 
> be quite problematic due to latency requirements. This is where the second 
> extension kicks in -- it allows clients to issue X requests (a "sensible"
> subset of all available requests) that are not executed immediately 
> but "deferred" to a specific synchronisation point in time, e.g. 
> synchronized to the audio device (unlike the XSync extension this mechanism 
> does not prevent the client from issuing other requests while the deferred 
> requests are pending). This allows the mixer to schedule enough mixing 
> commands to overcome any client<->server communication latency, does not 
> introduce audio "policy" into the server, but the actual mixdown can still be 
> performed with low latency (in theory bounded only by the capabilities of the 
> hardware and OS scheduling).
> > (for multiple audio streams), mixing buffers (or audio stream priorities to
> > block off audio from other clients), need to handle audio device buffer
> > sizes for sync (and when you add a mixing buffer this really gets a bit
> > nasty when combined with needing to run basically realtime as skips in
> > audio when you run out of enough cpu to do the mixing really sound horrible
> > - dropping frames is almost heavenly bliss in comparison).
> Yes you are right, audio poses a number of hard problems, and the fact that 
> current xorg server is not exactly well-behaved when it comes to 
> (dispatching) latency does not help either :(. Personally, I suspect this is 
> the main reason everyone else is writing separate audio servers, but 
> hopefully I can demonstrate that this is not really necessary.
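
to make the "deferred" request idea quoted above concrete: one way to picture it is a queue of requests sorted by the device time they are due at, which the server drains as the audio clock advances. this is a sketch under my own assumptions - the struct, the function names and the fixed queue size are made up for illustration and are not taken from the extension:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DEFERRED 64

/* hypothetical deferred request: call fn(arg) once the device clock
 * (counted in sample frames) reaches `due` */
typedef struct {
    uint64_t due;
    void (*fn)(void *arg);
    void *arg;
} deferred_req;

typedef struct {
    deferred_req items[MAX_DEFERRED];   /* kept sorted by due time */
    size_t count;
} deferred_queue;

/* example request body: bump an int counter */
void bump_counter(void *arg)
{
    (*(int *)arg)++;
}

/* queue a request; returns 0 on success, -1 if the queue is full.
 * requests with equal due times keep their submission order. */
int deferred_add(deferred_queue *q, uint64_t due,
                 void (*fn)(void *), void *arg)
{
    if (q->count >= MAX_DEFERRED)
        return -1;

    size_t i = q->count;
    while (i > 0 && q->items[i - 1].due > due) {
        q->items[i] = q->items[i - 1];
        i--;
    }
    q->items[i].due = due;
    q->items[i].fn = fn;
    q->items[i].arg = arg;
    q->count++;
    return 0;
}

/* execute everything due at or before `now`; returns how many ran */
size_t deferred_run(deferred_queue *q, uint64_t now)
{
    size_t n = 0;

    while (n < q->count && q->items[n].due <= now) {
        q->items[n].fn(q->items[n].arg);
        n++;
    }
    /* shift the still-pending requests to the front */
    for (size_t i = n; i < q->count; i++)
        q->items[i - n] = q->items[i];
    q->count -= n;
    return n;
}
```

the client can keep submitting other requests while these sit in the queue, which is the difference from blocking on a sync point.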

sure - i think this is a good direction. in the end you can always consider a
separate thread that reads the input buffers from clients and mixes them with
realtime priorities for the thread (yes - i am thinking server-side mixer here.
advantages are x runs as root - thus it can request realtime priorities easily
from the kernel).
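
as a rough sketch of that realtime request: a mixing thread could ask for SCHED_FIFO at the start of its main loop. i am assuming linux here, where sched_setscheduler() with pid 0 affects the calling thread, and the function name request_realtime() is made up. without enough privilege the call simply fails and the thread keeps its normal scheduling:

```c
#include <sched.h>
#include <string.h>

/* hypothetical helper, meant to be called at the top of the mixing thread.
 * asks for the lowest SCHED_FIFO priority.
 * returns 0 on success, -1 on failure (errno is set, e.g. EPERM). */
int request_realtime(void)
{
    struct sched_param sp;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = sched_get_priority_min(SCHED_FIFO);
    if (sp.sched_priority < 0)
        return -1;

    /* pid 0: on linux this applies to the calling thread */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
        return -1;
    return 0;
}
```

a realtime thread that spins can starve everything else on that cpu, so the mixing loop has to block on the audio device or a timer between buffers.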

> > anyway - point is - audio is its own can of worms. moving it to x is good
> > from one point - if it IS a remote display the DISPLAY itself will play the
> > audio - i.e. - the audio is where it should be - with the user. as a user i have
> > no desire for my audio to play out of the server in the room 5 miles away.
> > i want it here at my terminal where i am, and an x-based audio layer would
> > make that simple and easy to achieve. for local clients this is just "yet
> > another route to the audio device" (nb - you will want to be considering
> > shm transports for local clients).
> Shm samplebuffers are on my todo-list, but not implemented yet (and most 
> certainly won't be before the first release). From my point of view they are 
> desirable mainly because they would allow bypassing X dispatching entirely, so 
> latency could in fact be very close to what you can achieve with direct 
> hardware access.

well you still need x proto for a signalling channel - when to switch shm
buffers, which parts of the buffer are to be played, locked, etc. but it does
cut out the copies. though unlike with images, the cost of the copy for audio
is almost nothing compared to the XPutImage() vs. XShmPutImage() difference.
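
to illustrate the bookkeeping such a signalling channel has to carry, here is a minimal ring-buffer control block for a shared sample area. everything in it (the shm_ring struct, the function names, byte counters) is a hypothetical layout for illustration, and it leaves out the atomics or locking that real cross-process sharing would need:

```c
#include <stddef.h>

/* hypothetical control block for a shared sample area */
typedef struct {
    size_t size;   /* capacity of the sample area in bytes */
    size_t head;   /* total bytes written by the client so far */
    size_t tail;   /* total bytes played by the server so far */
} shm_ring;

/* bytes the server may still play */
size_t ring_readable(const shm_ring *r)
{
    return r->head - r->tail;
}

/* bytes the client may still write without overrunning the server */
size_t ring_writable(const shm_ring *r)
{
    return r->size - (r->head - r->tail);
}

/* map a running counter to an offset inside the sample area */
size_t ring_offset(const shm_ring *r, size_t counter)
{
    return counter % r->size;
}

/* client side: claim up to n bytes, returns how many were granted */
size_t ring_write(shm_ring *r, size_t n)
{
    size_t room = ring_writable(r);

    if (n > room)
        n = room;
    r->head += n;
    return n;
}

/* server side: mark up to n bytes as played, returns how many were */
size_t ring_consume(shm_ring *r, size_t n)
{
    size_t avail = ring_readable(r);

    if (n > avail)
        n = avail;
    r->tail += n;
    return n;
}
```

the x proto side would then only need to say "head moved" or "tail moved", while the samples themselves never cross the socket.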

anyway - just wishing you a lot of luck in what you are doing - the principal
idea i think is good - the details i think can be discussed. as i said - i
disagree a bit as i think 99.9% of the use will be either simple mixing
(multiplied by client/channel volume) or blocking out certain client channels
in favor of others (effectively being able to set other channel volumes to 0).
i agree you need an "audio manager" (like a window manager or composite
manager), but i think it needs to take more of a "control" role (channel
sources, destinations, volumes etc.).

maybe we need both? maybe we need an internal mixer and like xcomposite - the
ability TO redirect if we need to do something bizarre/funky?

> Thanks for your comments!
> Best regards
> Helge Bahmann
> -- 
> Mathematicians stand on each other's shoulders while computer scientists
> stand on each other's toes.
> -- Richard Hamming
> _______________________________________________
> xorg mailing list
> xorg at

------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster at
Tokyo, Japan (東京 日本)
