[gst-devel] audio/raw properties

Erik Walthinsen omega at temple-baptist.com
Mon Dec 18 10:19:41 CET 2000

Check out: http://ambisonic.net/mulchaud.html

The above is a link to a proposal from Microsoft [Research, I guess], to
create a better multichannel .wav format.  It might be worth adding
read-write support for this format into the wav code eventually, but the
more pressing point is the way they describe raw audio.  I think it
behooves us to get the properties for audio/raw right from the beginning,
which isn't really what we have now.

1) Sample format:

A distinction is made between sample size (the container each sample is
stored in) and actual sample resolution (the significant bits).  In most
cases this will be 16 bits in 16, but in studio applications this can be
24 bits in 32, or 22 bits in 24.

No mention is made of sign or endianness.  Endianness for non-word sample
sizes is odd to say the least.  The M$ .wav format above obviously uses
little-endian samples, where for a 24-bit sample the first byte holds the
low order and the third byte holds high order bits.  This can't be
universally assumed, since there are lots of people using Macs and other
machines with big-endian chips.  I think we can assume little endian if
not specified, though.
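
To make the byte-order point concrete, here is a minimal sketch of reading
one packed, signed 24-bit sample in either byte order.  The name read_s24
and its little_endian flag are made up for illustration; they are not part
of the wav proposal or of any existing GStreamer API.

```c
#include <stdint.h>

/* Hypothetical helper: read one packed, signed 24-bit sample from p.
 * little_endian != 0: p[0] holds the low-order byte (the .wav layout).
 * little_endian == 0: p[0] holds the high-order byte. */
static int32_t read_s24(const uint8_t *p, int little_endian)
{
    uint32_t raw;

    if (little_endian)
        raw = (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
    else
        raw = (uint32_t)p[2] | ((uint32_t)p[1] << 8) | ((uint32_t)p[0] << 16);

    /* Sign-extend from bit 23: flip the sign bit, then subtract its weight. */
    return (int32_t)(raw ^ 0x800000u) - 0x800000;
}
```

The same three bytes give different values depending on the flag, which is
why the byte order has to be either a property or a documented default.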

2) Speaker position

The wav spec above assumes that the channels correspond to speakers.  
This is the first problem, since multichannel audio doesn't necessarily
have anything to do with speakers.  Two cases come to mind: multitrack
"hard-disk" recording and Ambisonic.  Multitrack recording generally means
dumping a direct feed from a given mic or instrument straight to
disk.  Labelling them is potentially rather useful...

Ambisonics is a technique based on rigorous mathematics, developed in the
early 70's in Britain.  It is completely speaker independent, with the
channels containing something similar to mid-side stereo (mid = left +
right, side = left - right).  First-order Ambisonics uses up to 4 channels,
labelled W, X, Y, and Z.  W is omnidirectional, or mono.  Add Y and you
get mid-side stereo.  Add X and you get front-back 2D surround.  Add Z and
you get 'height'.  You can take any of W, WY, WXY, or WXYZ from a
recording and decode them.  WXYZ needs a cubical arrangement of 8 speaker
channels for optimal playout.
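
As a rough illustration of the mid-side relationship mentioned above, the
sketch below encodes a left/right pair into sum and difference signals and
decodes it back.  The type and function names are hypothetical, and the
gain conventions that real Ambisonic W/Y signals use are deliberately
ignored here.

```c
/* Hypothetical frame types for the illustration. */
typedef struct { float left, right; } StereoFrame;
typedef struct { float mid, side; } MidSideFrame;

/* mid = left + right, side = left - right, as described in the text. */
static MidSideFrame ms_encode(StereoFrame in)
{
    MidSideFrame out = { in.left + in.right, in.left - in.right };
    return out;
}

/* Invert the encoding: left = (mid + side) / 2, right = (mid - side) / 2. */
static StereoFrame ms_decode(MidSideFrame in)
{
    StereoFrame out = { 0.5f * (in.mid + in.side), 0.5f * (in.mid - in.side) };
    return out;
}
```

The point is that neither encoded channel maps to a speaker; a position
label like "front left" has no meaning until after decoding.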

Second-order Ambisonics, or Furse-Malham Higher Order (FMH), adds 5 more
channels, R, S, T, U, and V.  Maximal playout config I've seen so far is a
dodecahedron, or 12 speaker channels.  Again, varying subsets of these
channels can be decoded for the final speaker placement.

And of course, final speaker placement may not be in the normal positions.  
Obviously the cubical arrangement of left/center/right, front/back,
top/bottom can't deal with a dodecahedron.  Actual spherical (preferred)
or cartesian coordinates would be needed to properly represent the
speakers, maybe in addition to some kind of string name.

Anyway, what this means is that the method in the above link is rather
naive, assuming both a fixed set of possible channel placements, and no
squabbles about the order of said channels.  Some alternate means of
describing each channel is needed.

One option is to have a property or set of properties for each channel:

  "channel1_position", GST_PROPS_INT (FRONT_LEFT),
  "channel2_position", GST_PROPS_INT (FRONT_RIGHT),

  "channel1_coord_x", GST_PROPS_FLOAT (-0.4472),
  "channel1_coord_y", GST_PROPS_FLOAT (0.5257),
  "channel1_coord_z", GST_PROPS_FLOAT (-0.7236),

  "channel1_label", GST_PROPS_STRING ("Guitar 1"),

But this might be completely out of the scope of properties.  I dunno.  I
tend to think not, especially given the proposed wav format.  The
potential interactions with the caps and autoplug systems need to be
figured out, though.
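
For the sake of discussion, the per-channel properties above could be
pictured as one record per channel.  This is only a sketch: ChannelDesc,
ChannelPosition and the helper below are hypothetical names, not existing
GStreamer types, and treating the coordinates as a unit direction vector
is my assumption (the example values above happen to have length close
to 1).

```c
#include <stddef.h>

/* Hypothetical position codes; a real list would be much longer. */
typedef enum {
    CHANNEL_POS_NONE = 0,      /* channel is not tied to a speaker */
    CHANNEL_POS_FRONT_LEFT,
    CHANNEL_POS_FRONT_RIGHT
} ChannelPosition;

/* Hypothetical per-channel description: every field is optional. */
typedef struct {
    ChannelPosition position;  /* named position, or CHANNEL_POS_NONE */
    int has_coords;            /* nonzero if x, y, z are meaningful */
    float x, y, z;             /* cartesian direction of the speaker */
    const char *label;         /* free-form name, or NULL */
} ChannelDesc;

/* Return 1 if the channel has coordinates whose squared length is
 * within tol of 1, i.e. they describe a direction on the unit sphere. */
static int channel_coords_on_unit_sphere(const ChannelDesc *c, float tol)
{
    float d;

    if (!c->has_coords)
        return 0;
    d = c->x * c->x + c->y * c->y + c->z * c->z - 1.0f;
    if (d < 0.0f)
        d = -d;
    return d <= tol;
}
```

A multitrack channel would carry only a label, an Ambisonic channel only
a name like "W", and a speaker feed a position and/or coordinates.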

Anyway, stuff to think about.  Don't have to pin things down just yet,
probably can safely wait till after 0.1.0, but we definitely don't want to
get too far without properly defining the necessary standards.

         Erik Walthinsen <omega at cse.ogi.edu> - Staff Programmer @ OGI
        Quasar project - http://www.cse.ogi.edu/DISC/projects/quasar/
   Video4Linux Two drivers and stuff - http://www.cse.ogi.edu/~omega/v4l2/
       /  \             SEUL: Simple End-User Linux - http://www.seul.org/
      |    | M E G A           Helping Linux become THE choice
      _\  /_                          for the home or office user
