[gst-devel] about base classes

Sat Mar 5 09:18:21 CET 2005

Hi,

Andy requested some thoughts on base classes after a discussion (..) on
IRC. Here's a shot at starting such a discussion on the mailing list (so
everyone, even those not online on IRC, can read it).

I'm going to discuss audio sink base classes here. Similar thoughts may
apply to audio/video source base classes or video sink base classes. I
intend to propose a base class that just works for the most common audio
systems out there, without magic, required code or any buts or yets.

1) functions of an audio sink in 0.9
* clock provider (since we have a count of processed samples).
* clock accepter (we can work with another clock as master).
* synchronization of streams against foreign clocks and non-perfect
streams (cutting off samples from a buffer, dropping buffers, inserting
silence).
* writing actual data to whatever system we're abstracting (OSS, ALSA,
esound, jack, polypaudio, artsd, ...).
* negotiation of a format that the audio output supports.
* opening/closing/setting up the system we're abstracting.
* preroll.

I may be missing stuff, but this should be pretty much what an audiosink
does nowadays. The problem here is that it's too much, and a lot of code
is shared between different implementations (e.g. preroll, a/v sync),
which is why we proposed and agreed on using base classes.

2) Base class scope
Here's where the rambling starts. :). What should an audio base class
do? I've sat down to think about this, seeing what else is out there,
how different systems handle this, and came up with this:

2a - general sink base class
 * preroll
 Why? Because it's the same for video and audio. Wim already
 implemented this in his -threaded branch.

2b - audio sink base class
 * a/v sync, clock providing, clock accepting.
 Why? Because those are simple systems that can easily be shared
 in a single base class without being specific towards one
 implementation. It provides a write virtual function through
 which it writes data to the specific implementation.

2c - audio sink implementation
 * format negotiation, backend setup.
 Because this cannot easily be generalized without generalizing
 too much, or we don't gain anything by generalizing it.
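
To make the 2a/2b/2c split concrete, here is a minimal sketch in plain C. It is not GObject and it is not the code from the patch; every name in it (AudioSink, AudioSinkClass, audio_sink_render, FakeSink) is made up for illustration, and preroll (2a) is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct AudioSink AudioSink;

/* 2b: the hook the audio base class calls. Hypothetical signature. */
typedef struct {
  size_t (*write) (AudioSink *sink, const uint8_t *data, size_t len);
} AudioSinkClass;

/* 2b: state owned by the audio base class. */
struct AudioSink {
  const AudioSinkClass *klass;
  uint64_t bytes_written;       /* basis for the provided clock */
};

/* 2b: base class entry point. Sync handling would sit here, before
 * the data is handed to the implementation. */
size_t
audio_sink_render (AudioSink *sink, const uint8_t *data, size_t len)
{
  assert (sink != NULL && sink->klass != NULL && sink->klass->write != NULL);

  size_t done = sink->klass->write (sink, data, len);
  sink->bytes_written += done;
  return done;
}

/* 2c: a toy implementation. A real one would negotiate a format,
 * open a device and write to it. */
typedef struct {
  AudioSink parent;             /* first member, so the cast below is valid */
  size_t device_bytes;          /* stands in for the backend */
} FakeSink;

size_t
fake_sink_write (AudioSink *sink, const uint8_t *data, size_t len)
{
  FakeSink *fake = (FakeSink *) sink;

  (void) data;                  /* the toy backend only counts bytes */
  fake->device_bytes += len;
  return len;
}

static const AudioSinkClass fake_sink_class = { fake_sink_write };

void
fake_sink_init (FakeSink *fake)
{
  fake->parent.klass = &fake_sink_class;
  fake->parent.bytes_written = 0;
  fake->device_bytes = 0;
}
```

The point of the split is that FakeSink only knows about its backend, while everything that counts or compares time stays in the shared code.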

3) Example base class design
I've written an audio base class example [1] which is actually
format-agnostic. It acts based on timestamps and durations set on
buffers (for rounding, see [2]). It will calculate differences of
current-time and expected-time and insert silence or drop samples (both
through virtual functions, see [3]) and write all resulting data through
a write virtual function. There are default implementations for the first
two virtual functions, which will keep sync, but may give a small hiccup
when sync is lost (see [4]). I consider that acceptable, given that
implementing one virtual function can be one single line of code (again,
see [1], it implements it for osssink).
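
The drop-or-insert decision described above could look roughly like the sketch below. This is an illustration in plain C, not the logic from [1]: the function name, the nanosecond units and especially the tolerance parameter are my assumptions. Here now_ns is the time at which the next written sample would be heard and timestamp_ns is the timestamp on the incoming buffer.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { SYNC_PLAY, SYNC_DROP, SYNC_INSERT_SILENCE } SyncAction;

typedef struct {
  SyncAction action;
  uint64_t amount_ns;           /* how much to drop or insert */
} SyncDecision;

/* Compare where we are with where the buffer says we should be.
 * tolerance_ns is an assumed threshold so tiny differences are ignored. */
SyncDecision
audio_sink_check_sync (uint64_t now_ns, uint64_t timestamp_ns,
    uint64_t duration_ns, uint64_t tolerance_ns)
{
  SyncDecision d = { SYNC_PLAY, 0 };

  assert (duration_ns > 0);     /* empty buffers are not expected here */

  if (timestamp_ns + tolerance_ns < now_ns) {
    /* Buffer is late: cut off the part that should already have
     * played, at most the whole buffer. */
    uint64_t late = now_ns - timestamp_ns;

    d.action = SYNC_DROP;
    d.amount_ns = late < duration_ns ? late : duration_ns;
  } else if (timestamp_ns > now_ns + tolerance_ns) {
    /* Buffer is early: fill the gap with silence first. */
    d.action = SYNC_INSERT_SILENCE;
    d.amount_ns = timestamp_ns - now_ns;
  }
  return d;
}
```

The base class would then pass amount_ns to the cut-off or insert-silence virtual function and send whatever remains through write.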
Since we keep track of durations, we can provide a clock. The fourth
(and last) virtual function that implementations can implement is to get
the 'delay', i.e. the amount of data that was written but not yet
played. The default implementation returns 0. Through this, the provided
clock can always present the exact time.
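
As a rough sketch of how the clock falls out of this: the reported time is the total duration written so far minus the delay. The names are hypothetical, and I express the delay in nanoseconds here for simplicity, whereas a real backend would more likely report it in bytes or samples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct AudioClockSource AudioClockSource;

struct AudioClockSource {
  uint64_t written_ns;          /* sum of durations handed to write */
  uint64_t pending_ns;          /* toy backend state: queued, not yet played */
  /* Optional hook; NULL means "use the default", which is a delay of 0. */
  uint64_t (*get_delay) (AudioClockSource *src);
};

/* Default as described above: pretend everything written has played. */
uint64_t
default_get_delay (AudioClockSource *src)
{
  (void) src;
  return 0;
}

/* What a backend that can query its queue might do. */
uint64_t
toy_get_delay (AudioClockSource *src)
{
  return src->pending_ns;
}

/* Clock time = what we wrote minus what is still waiting to be played. */
uint64_t
audio_clock_get_time (AudioClockSource *src)
{
  uint64_t delay;

  assert (src != NULL);
  delay = src->get_delay != NULL ? src->get_delay (src) : 0;
  if (delay > src->written_ns)  /* never report a negative time */
    delay = src->written_ns;
  return src->written_ns - delay;
}
```

With the default, the clock runs ahead of what is actually audible by however much the backend has queued; implementing the hook removes that offset.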
My patch-provided base class does not derive from basesink yet, because
basesink didn't exist when I wrote the patch, but it's intended to
derive from it (which is why any sort of preroll handling is missing).
I'll do that sometime later.
The result of this base class is that if we are clock provider, we are
always in sync. If we are clock receiver and all virtual functions are
(correctly) implemented, we are also always in sync. If we are clock
receiver and only the write virtual function is implemented, then we may
be out of sync by up to the duration of one buffer. I think this is an acceptable
solution.

4) Implementations
I've implemented osssink ([1]). The hard part (and largest part of the
patch) is to refactor the osselement base class out of osssink while not
losing any functionality (mixer/device handling shared with other oss
elements). The actual implementation is tiny and simple. Esdsink would
take less than an hour to implement using this base class, piece of cake.
Alsasink would need the same refactoring as osssink; I'm willing to do
the work required to get that going (since I myself use ALSA, in the
end). Artsdsink or polypaudiosink would be dead easy, too. The nice
thing is that even though the implementations are dead easy
(essentially, they only need to implement one virtual function, the
'write' one), they will still keep perfect sync, because the base class
does all the hard work there.
I don't know how difficult jacksink would be. It probably wouldn't fit
in this model very well, since jack wants to drive the pipeline rather
than be driven.

That's about all. I hope this explains an example design, and I hope
people have nice ideas on how to improve the above, or comments on
mistakes I've made. Alternative implementations are welcome, too.

Cheers,

Ronald

[1] http://ronald.bitfreak.net/priv/audiosink.patch
[2] Now, this will indeed give rounding errors, but this will only
become noticeable after a few weeks of use under normal conditions (and
would even require several hours (up to 24) of use before it shows in
pro-audio circumstances). That's a design decision; it can be changed if
there's strong opposition. I can explain the math behind this if
anyone cares.
[3] OK, so if we're format-agnostic, then that means we don't know how
to generate silent samples or how to cut off samples. Therefore, there
are virtual functions for both. By default, the cut-off always returns 0
(i.e. "don't cut off anything"), and the insert-silence always returns
NULL (i.e. "no silence"). See [4] for the effects on a/v sync.
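
For illustration, the defaults from [3] next to what a format-aware implementation might provide. This is a sketch in plain C with hypothetical names and signatures; PcmFormat and the pcm_* functions are my assumptions and are not taken from [1]. It assumes signed PCM, where silence is all-zero bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NSEC_PER_SEC UINT64_C(1000000000)

/* Defaults for a format-agnostic base class. */
size_t
default_cut_off (uint64_t drop_ns, size_t buf_len)
{
  (void) drop_ns;
  (void) buf_len;
  return 0;                     /* "don't cut off anything" */
}

uint8_t *
default_insert_silence (uint64_t gap_ns, size_t *len)
{
  assert (len != NULL);
  (void) gap_ns;
  *len = 0;
  return NULL;                  /* "no silence" */
}

/* Hypothetical format description, known only to the implementation. */
typedef struct {
  unsigned rate;                /* frames per second */
  unsigned channels;
  unsigned bytes_per_sample;
} PcmFormat;

/* Whole frames covered by ns, in bytes. The multiplication is fine for
 * gaps of hours, but would overflow for absurdly long ones. */
size_t
pcm_ns_to_bytes (const PcmFormat *fmt, uint64_t ns)
{
  uint64_t frames = ns * fmt->rate / NSEC_PER_SEC;

  return (size_t) (frames * fmt->channels * fmt->bytes_per_sample);
}

/* Number of bytes to cut from the start of a buffer of buf_len bytes. */
size_t
pcm_cut_off (const PcmFormat *fmt, uint64_t drop_ns, size_t buf_len)
{
  size_t bytes = pcm_ns_to_bytes (fmt, drop_ns);

  return bytes < buf_len ? bytes : buf_len;
}

/* Newly allocated silence covering gap_ns; the caller frees it. */
uint8_t *
pcm_insert_silence (const PcmFormat *fmt, uint64_t gap_ns, size_t *len)
{
  assert (len != NULL);
  *len = pcm_ns_to_bytes (fmt, gap_ns);
  return *len != 0 ? calloc (1, *len) : NULL;
}
```

For 44.1 kHz 16-bit stereo, a 10 ms gap comes out as 441 frames, i.e. 1764 bytes of zeros.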
[4] When sync is lost, we will either drop samples or insert silence.
If we drop samples, the default implementation will drop zero samples.
This means that even though the clock advances only half a buffer, we
play the full buffer (to keep sync). If we are the clock provider, this
means that other streams waiting for us will halt for a short while
because the clock-advance takes longer than realtime. I.e., there will
be a small 'halt' in playback. If we are clock receiver, then we'll be
out-of-sync until we drop a full buffer (actually, I may not have
implemented the drop-full-buffer part yet; FIXME). If we insert silence,
the default implementation will do nothing. This means that the clock
advances, but in no-time (since we play nothing), so other streams (if
we're clock provider) may skip a frame here. If we're clock receiver, we
could g_usleep() here, but that's currently unimplemented (i.e.: FIXME).
Obviously, if the virtual functions are implemented, all of this does
not apply and it just works.
-- 
Ronald S. Bultje <rbultje at ronald.bitfreak.net>