[gst-devel] Mixer plugin

Erik Walthinsen omega at temple-baptist.com
Fri Mar 16 20:41:48 CET 2001


On Fri, 16 Mar 2001, Simon Per Soren Kagedal wrote:

> I'm writing a mixer plugin for GStreamer,
Very cool. ;-)

> Basic Idea: A filter that takes m input audio streams (sinks) and
> mixes to n sources, by adding together the data.  For simplicity, I'll
> assume here that all streams are mono audio/raw data of the same
> format.  Then, to mix stereo data, one would have to use channel
> splitters/mergers.  It would be convenient though to let all in/out
> streams have any number of channels, and probably more efficient.
As a first pass, a mono mixer is perfect, since that covers a lot of the
common cases.  I'd write the stereo mixer separately, though as I'll
mention below, the two can be merged pretty easily.

> Problems/Issues:
>
> 1. Setting "Panning" Values
>
> For each sink i=1..m, you should be able to set (and get) how much of
> its signal should be sent to each source j=1..n.
>
> So how do I do this?  gtk_object_set/get is the only interface I know
> of for setting properties of elements in a plugin, and as far as I
> know, there is no way to access "subproperties" except by registering
> one argument type for every g_strdup_printf("mix%d%d",i,j).  This idea
> doesn't appeal to me very much.  I would like both the sinks and the
> sources to be "hotpluggable", and then you would have to add and
> remove lots of argument types when streams come and go.
This is what I was thinking, but I agree that it's not very pretty.

> So..  any other ideas or is this the way I have to go?
One option would be to export a fixed-size 2d array, but any time you
add/remove channels you have to force the application to get the new
array, and always deal with it correctly.  This could be a significant
synchronization headache.

Another option, esp once we move to GObject, would be to export a
GInterface that handles this, which would have a function like:

gst_mixer_set_param (mixer, 3, 2, 0.7);

to set input 3's mix into output 2 to 70%.  The drawback of anything of
this form is that we're back to not having a fully abstracted interface
for all plugins.  To use this interface you have to have a header specific
to it exported and included in your application.  This is probably not
such a bad thing for specific cases.

However, I would then provide both the arg-based interface and the
GstMixer interface, so that simpler applications can just ignore the
custom interface.
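
To make that concrete, the exported header might look something like this
(purely a sketch -- GstMixer doesn't exist yet, and all the names here are
made up):

/* gstmixer.h -- hypothetical interface header, included by both the
 * plugin and any application that wants mixer-specific control */
#ifndef __GST_MIXER_H__
#define __GST_MIXER_H__

#include <gst/gst.h>

typedef struct _GstMixerInterface {
  /* how much of input pad 'in' feeds output pad 'out', 0.0 through 1.0 */
  void   (*set_param) (GstElement *mixer, gint in, gint out, gfloat level);
  gfloat (*get_param) (GstElement *mixer, gint in, gint out);
} GstMixerInterface;

/* thin wrappers that dispatch through the interface vtable */
void   gst_mixer_set_param (GstElement *mixer, gint in, gint out,
                            gfloat level);
gfloat gst_mixer_get_param (GstElement *mixer, gint in, gint out);

#endif /* __GST_MIXER_H__ */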

> One twist is that you might want a volume change to be described as a
> time-stamped, sample-accurate event.
Heh, that's a whole 'nother mess.  I have no background in things like
VST, so I'm not sure how they do it, but my intuition is that it would be
done on a per-element basis.  Events would change named values (back to
that arg system...) and elements would take these events via some
mechanism.

An event would consist of one of two things: a timed set, or an envelope.
A timed set would be registered with an element, and via some internal
mechanism it would apply the set at the appropriate point mid-buffer.
This can be done without element support by preceding the element with
one that understands the media and timestamps, and will slice'n'dice
buffers on the given event boundaries, setting the following element's
parameters between buffers.  A follow-on element might re-assemble them
into the original buffer granularity if necessary.
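
In loop-function terms the slicing element might look roughly like this
(a sketch only; next_event_time(), event_inside(), buffer_split(), and
apply_event() are all hypothetical helpers):

/* cut each buffer at the next pending event's timestamp, push the
 * head, apply the parameter change downstream, continue with the tail */
static void
slicer_loop (GstElement *element)
{
  GstPad *sinkpad = gst_element_get_pad (element, "sink");
  GstPad *srcpad  = gst_element_get_pad (element, "src");

  while (1) {
    GstBuffer *buf = gst_pad_pull (sinkpad);
    guint64 t = next_event_time (element);

    /* slice as long as an event falls inside this buffer */
    while (event_inside (buf, t)) {
      GstBuffer *head = buffer_split (buf, t);  /* buf now starts at t */

      gst_pad_push (srcpad, head);
      apply_event (element, t);  /* gtk_object_set() on the next element */
      t = next_event_time (element);
    }
    gst_pad_push (srcpad, buf);
  }
}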

An envelope-aware element would be handed an envelope, and for each buffer
it gets, it uses a utility function to return an array of values at the
appropriate granularity.  For instance, it gets a buffer of x samples from
time t to time s.  It would call:

vals = gst_envelope_interpolate (env, t, s, x);

This would return the array vals of size x, say gint32's.  As the element
steps through and processes the samples, it would use this LUT to get the
value for each sample.  Of course, this is a non-trivial overhead, and
thus should not be used in realtime operations, but then again realtime
operations will be dealing with smaller buffers (~1ms), and you can't
generally notice 1ms stepping when you only hear the result once (i.e.
live).
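
A linear-interpolation version of that helper might look like this (again
just a sketch -- the GstEnvelope structure is invented for illustration,
and it assumes the t..s range falls within the envelope's control points):

/* control points must be in ascending time order, at least two of them */
typedef struct {
  guint64 *times;   /* control-point timestamps */
  gint32  *values;  /* control-point values */
  gint     points;
} GstEnvelope;

gint32 *
gst_envelope_interpolate (GstEnvelope *env, guint64 t, guint64 s, gint x)
{
  gint32 *vals = g_new (gint32, x);
  gint i, p = 0;

  for (i = 0; i < x; i++) {
    /* timestamp of the i'th sample in the buffer */
    guint64 ts = t + (s - t) * i / x;

    /* advance to the segment containing ts */
    while (p < env->points - 2 && env->times[p + 1] <= ts)
      p++;

    /* linear interpolation between control points p and p+1 */
    vals[i] = env->values[p] +
        (gint64) (env->values[p + 1] - env->values[p]) *
        (gint64) (ts - env->times[p]) /
        (gint64) (env->times[p + 1] - env->times[p]);
  }
  return vals;
}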

> 2. Loop Based
>
> The mixer uses a loopfunc() to do its work.  I could not think of or
> make work any other way.  Correct me if I'm wrong, but it seems to me
> that a loopfunc-based implementation should only be used if necessary?
> For example, I get constant audio glitches if I use several loopbased
> elements in one pipeline, such as:
>
> gstreamer-launch disksrc location=startup.raw ! identity loop_based=t !
> identity loop_based=t ! audiosink
>
> Is this is a bug, is there a workaround, or is this case just stupid?
Technically that's a degenerate case, because there's no decoupling
between the disk and the sound card.  Any time there's any kind of
system-wide load (esp disk activity), the sound card's buffer will run
dry, and you're hosed.  What you really want is:

gstreamer-launch disksrc location=startup.raw ! identity ! identity !
queue ! { audiosink }

That will allow the disk to run well ahead of the sound card, filling up
that queue.  If the system switches out the main thread for any length of
time, the sound card is supplied from the queue.

> (yeah, I know that's an ugly hack for setting the boolean value, but
> it works. :) any chance gst_parse_launch will recognize other types
> than strings soon?)
Yeah, it should be added.  Shouldn't be too hard: you just ask for the
details of the argument, and based on the type it claims to be, you do the
appropriate translation attempt.  gst/gstparse.c is the culprit, if you
want to give it a shot.
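
Something along these lines, perhaps (an untested sketch against the
GTK+ 1.2 arg API, handling only a few types):

#include <stdlib.h>
#include <gtk/gtk.h>

static void
parse_set_arg (GtkObject *object, const gchar *name, const gchar *value)
{
  GtkArgInfo *info;
  gchar *error;

  /* ask the object what type this argument claims to be */
  error = gtk_object_arg_get_info (GTK_OBJECT_TYPE (object), name, &info);
  if (error) {
    g_free (error);
    return;
  }

  switch (info->type) {
    case GTK_TYPE_INT:
      gtk_object_set (object, name, atoi (value), NULL);
      break;
    case GTK_TYPE_BOOL:
      gtk_object_set (object, name,
                      (*value == 't' || *value == 'T' || *value == '1'),
                      NULL);
      break;
    default:  /* strings and anything else pass through untouched */
      gtk_object_set (object, name, value, NULL);
      break;
  }
}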

> So is there any way I can do it without a loop function?  Can I have a
> chain function that only pushes when the data from all the sinks has
> been received?  Or could I define some sort of "pull" function on the
> sinks that would in turn pull its data from the sources?  I tried
> using a get function but that didn't work.  (for this approach, you
> can't have n source streams, but that doesn't really matter that much,
> it could just always be one src with n channels, and then you'd use
> some channel splitter afterwards if necessary.)
The basic problem with a chain function is that it really can't work
effectively in a multi-in scenario.  This is because the order and pattern
in which it receives the _chain() calls (when the upstream peer pushes a
buffer) is not under its control.  If it needs to get one of every input,
even in no particular order, it's at the mercy of the other elements in
the pipeline.  Things get even worse if you have queues preceding some of
the inputs.

A loop function solves this by putting the element in control of this
scheduling.  It simply asks for data on each of its input pads.
Scheduling works out nicely: as long as you don't have any deadlock
potentials (places where a buffer might be pushed without the previous
buffer having been removed from the pen first, solvable with a small
purpose-built queue), everything works very well.
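
For the mixer, that means something like the following (a sketch; the
GstMixer structure, MAX_INPUTS, and mix_buffers() are all made up):

/* the element decides the order inputs are consumed in: exactly one
 * buffer from each sink pad per iteration */
static void
gst_mixer_loop (GstElement *element)
{
  GstMixer *mixer = GST_MIXER (element);

  while (1) {
    GstBuffer *in[MAX_INPUTS];
    gint i;

    for (i = 0; i < mixer->numsinkpads; i++)
      in[i] = gst_pad_pull (mixer->sinkpads[i]);

    gst_pad_push (mixer->srcpad, mix_buffers (mixer, in));
  }
}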

The overhead of using loop functions is not significant, and in fact right
now *all* elements are scheduled as loop functions, even the chain-based
ones: a wrapper repeatedly calls the _chain() function, doing a
gst_pad_pull up front.  This makes scheduling quite a bit simpler for the
moment, and once
the whole scheduling system settles down again, we can do more work at
implementing the various optimizations involved in using groups of chain
functions together with loop-based elements.
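
Conceptually the wrapper boils down to this (illustrative only, not the
actual scheduler code; my_element_chain() stands in for the element's
registered chain function):

static void
chain_wrapper_loop (GstElement *element)
{
  GstPad *sinkpad = gst_element_get_pad (element, "sink");

  while (1) {
    GstBuffer *buf = gst_pad_pull (sinkpad);

    /* the chain function processes the buffer and does its own
       gst_pad_push() calls, zero or more of them */
    my_element_chain (sinkpad, buf);
  }
}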

> If there is anywhere an explanation (except the source code :) ) of
> how exactly the scheduling works, I would be very interested.
You should check out the latest set of slides on the website.  There are a
couple slides with significant notes that should help you figure out how
it works.

> Another question: the manual says "chain based elements receive a
> buffer of data and are supposed to handle the data and perform a
> gst_pad_push".  What does _A_ gst_pad_push mean here?  Not
> necessarily 1, since GstTee pushes to all its src pads.  But can it
> be zero?
Yes, it can be zero.  And if your chain function ends up being called
twice for every gst_pad_push it performs, so be it.

> 3. Channels
>
> For maximum flexibility, I would like to let the user have any number
> of channels on any of the sinks and srcs.  For sinks it is easy, the
> channel number comes as meta info on the stream.  Some oddities
> though..  User wants to set mixing values for a specific sink before
> the network is running, but then the mixer doesn't know how many
> channels there are going to be on that sink, so how to store/verify
> the value...?  Anyway, I don't know how to set these properties, see
> problem 1.  Sorry, I'm mumbling, but what it all boils down to is...
First, the problem of how to deal with mixed input channel counts: I'd
suggest internally treating each channel as its own input, so you have
"sinkNcM" as the name for non-mono inputs.  You may make the simplifying
assumption that all inputs are labeled that way, even mono, so apps don't
have to special-case mono inputs.  You just have lots of "sinkNc0"s.
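
Creating those per-channel pads inside the mixer could look roughly like
this (a sketch; gst_mixer_add_input() is a made-up function):

static void
gst_mixer_add_input (GstElement *element, gint input_num, gint channels)
{
  gint c;

  /* one sink pad per channel of this input: sink0c0, sink0c1, ... */
  for (c = 0; c < channels; c++) {
    gchar *name = g_strdup_printf ("sink%dc%d", input_num, c);

    gst_element_add_pad (element, gst_pad_new (name, GST_PAD_SINK));
    g_free (name);
  }
}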

Next, the number of input channels should be specified in the pad caps.
The problem, as you point out, is that these caps may not be fully
specified up front.  You'll have to do some kind of 'partial play' in
order for these caps to settle down.  This isn't fully understood
right now, but the basic idea is that you'd hit play on just the elements
you need to get the caps settled, then pause them.  We need to think
more about this from all angles, and have a couple of complex usage cases
to think through.  If you can sketch a pipeline where this is a problem
(say an mp3 decoder into the mixer, where it isn't known whether it's mono
or stereo), that could help a lot.

Then again, I can't think of a case offhand where this would be a problem.
The mp3 example shows why:

A disksrc would be attached to a typefind element by autoplug.  This
typefind would be intelligent enough to not only determine that it's an
audio/mp3 file, but find all the basic parameters (layer, bitrate, channel
count, etc.).  When you attach the mp3parse element, it simply converts
from framed=false to framed=true.  mpg123 then sees the mp3-specific
properties and can immediately set the outgoing audio properties.
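
In caps terms, once mpg123 knows the stream parameters it can nail down
its source pad's properties immediately, something like this (assuming
the caps/props API in roughly this form):

GstCaps *caps;

/* rate and channel count are known from the mp3 headers by now */
caps = gst_caps_new (
    "mpg123_src_caps",
    "audio/raw",
    gst_props_new (
        "rate",     GST_PROPS_INT (44100),
        "channels", GST_PROPS_INT (2),
        NULL));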

> 4. Philosophy
>
> I've tried to do the mixer using a "hard things possible but easy
> things easy" approach.  To make easy things easy, it should
> automatically set mixing values (if not explicitly specified) so that
> things like the following work as it ought to:
>
>   stereosrc1 = ...
>   stereosrc2 = ...
>   gst_pad_connect (gst_element_get_pad (stereosrc1, "src"),
> 		   gst_element_request_pad_by_name (mixer, "sink%d"));
>   gst_pad_connect (gst_element_get_pad (stereosrc2, "src"),
> 		   gst_element_request_pad_by_name (mixer, "sink%d"));
>   gst_pad_connect (gst_element_request_pad_by_name (mixer, "src%d"),
> 		   gst_element_get_pad (audiosink, "sink"));
>
> But it does kind of complicate things, and I think the #1 philosophy
> should be "do all things efficiently".  So is it still a good idea to
> design for the hard cases and try to optimize afterwards, or would it be
> better to have like "monomixer" and "stereomixer" etc. in addition to
> a more general mixer?
Actually, there's significant possibility for specialization.  Depending
on the parameters, you can switch between different loop and chain
functions, each specialized for the given set of parameters.  The 'volume'
element in CVS does this to some extent.
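
For example (a sketch; the GstMixer fields, FORMAT_S16, and the
specialized loop functions are all invented):

/* pick a loop function matched to the negotiated parameters, so the
 * inner loop carries no per-sample branching */
static void
gst_mixer_update_loopfunc (GstMixer *mixer)
{
  if (mixer->channels == 1 && mixer->format == FORMAT_S16)
    gst_element_set_loop_function (GST_ELEMENT (mixer),
                                   gst_mixer_loop_mono_s16);
  else if (mixer->channels == 2 && mixer->format == FORMAT_S16)
    gst_element_set_loop_function (GST_ELEMENT (mixer),
                                   gst_mixer_loop_stereo_s16);
  else
    gst_element_set_loop_function (GST_ELEMENT (mixer),
                                   gst_mixer_loop_generic);
}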

      Erik Walthinsen <omega at temple-baptist.com> - System Administrator
        __
       /  \                GStreamer - The only way to stream!
      |    | M E G A        ***** http://gstreamer.net/ *****
      _\  /_