[gst-devel] 0.9 proposals

Thu Dec 2 06:41:01 CET 2004

Hi everyone!

As you probably know, I've been dealing with the scheduling problem in
GStreamer for quite some time now. So I guess I have a thing or two
to say about it.

As far as I can tell, Wim is proposing a completely thread-based model
(this is, AFAIK, the DirectMedia solution). Benjamin, on the other hand,
is advocating (and implementing!) a completely threadless,
whole-pipeline-runs-in-a-single-thread solution.

What my experience so far tells me is that an intermediate solution may
be the answer. This solution would use a mixture of threads and
cothreads to achieve an optimum, depending on a number of factors, such
as whether you are on an SMP machine, how efficient the available thread
and cothread implementations are (absolutely and with respect to each
other), which latency requirements you have, and even whether the
particular elements in your pipeline can or cannot avoid blocking.

A first comment: I know that speaking of a mixture of threads and
cothreads makes some people here shiver. This combination used to cause
lots of problems before, at least under Linux. Well, I wrote a Pth
implementation of GStreamer's cothread API, and I'm happily using it
right now with Seamless (which is certainly threaded). The trick is to
use the Native POSIX Thread Library (NPTL) at the same time. The old
LinuxThreads used to get horribly confused by cothread context switches,
whereas NPTL has no problem at all. I haven't committed that code to CVS
because I don't know the Automake spells necessary to introduce an
optional dependency on GNU Pth, but if someone gives me a hand, I'll be
glad to.

And by the way, some (admittedly pretty non-scientific) tests I have
conducted showed me that the fair scheduler, running with Pth
cothreads, works about as fast as opt when running simple "fakesrc !
fakesink" or "filesrc ! filesink" style pipelines. And the fair
scheduler is a new, quite experimental piece of code, with lots of room
for optimization. I know, for example, that a number of context switches
are avoidable, and I plan to introduce the required optimizations soon.

But back to my point: the fair scheduler works more or less like an
operating system scheduler. It creates cothreads for some of the
elements (actually all of them in the current CVS implementation, but
this will change soon), puts all those cothreads in a queue, and then
keeps taking a cothread out of the queue and giving it control until it
switches back.
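
To make the loop concrete, here is a minimal sketch in Python, with
generators standing in for cothreads. It is only an illustration of the
idea described above, not the fair scheduler's actual code; the names
element() and run() are made up for the example.

```python
from collections import deque

def element(name, steps, log):
    """Hypothetical element body: one unit of work, then yield control."""
    for i in range(steps):
        log.append((name, i))
        yield  # switch back to the scheduler

def run(cothreads):
    """Take a cothread off the queue, run it until it yields, requeue it."""
    queue = deque(cothreads)
    while queue:
        current = queue.popleft()
        try:
            next(current)          # give control to the cothread
        except StopIteration:
            continue               # the cothread finished; drop it
        queue.append(current)      # still alive: back of the queue

log = []
run([element("src", 2, log), element("sink", 2, log)])
print(log)
```

With two elements of two steps each, the log shows them strictly
alternating, which is the "fair" part of the behaviour.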

Cothreads that are waiting for something are taken out of the run queue
(i.e. they go to sleep). When the corresponding event happens, they are
put back in the queue (i.e. they are awakened). A typical situation is a
cothread trying to pull from a pad. If there's no buffer in the (single
buffer) "bufpen", it just gets out of the run queue and switches back to
the scheduler, but not before registering itself in the pad as "waiting
reader". As soon as the peer element writes, it awakens the waiting
reader by putting it back into the queue. The normal scheduler loop
takes care of the rest.
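
The sleep/wake protocol can be sketched the same way. The following
Python model is an assumption-laden illustration, not the real code:
generators play the role of cothreads, the Pad and Scheduler classes and
the SLEEP marker are invented for the example, and the "waiting writer"
slot is added by symmetry with the "waiting reader" described above.

```python
from collections import deque

SLEEP = object()  # hypothetical marker: "do not requeue me, I am waiting"

class Pad:
    def __init__(self):
        self.bufpen = None          # holds at most one buffer
        self.waiting_reader = None  # cothread asleep until data arrives
        self.waiting_writer = None  # assumed counterpart for a full bufpen

class Scheduler:
    def __init__(self):
        self.run_queue = deque()
        self.current = None

    def wake(self, cothread):
        self.run_queue.append(cothread)

    def run(self):
        while self.run_queue:
            self.current = self.run_queue.popleft()
            try:
                request = next(self.current)
            except StopIteration:
                continue
            if request is not SLEEP:
                self.run_queue.append(self.current)

def pull(sched, pad):
    while pad.bufpen is None:
        pad.waiting_reader = sched.current  # register before sleeping
        yield SLEEP
    buf, pad.bufpen = pad.bufpen, None
    if pad.waiting_writer is not None:
        sched.wake(pad.waiting_writer)
        pad.waiting_writer = None
    return buf

def push(sched, pad, buf):
    while pad.bufpen is not None:
        pad.waiting_writer = sched.current
        yield SLEEP
    pad.bufpen = buf
    if pad.waiting_reader is not None:
        sched.wake(pad.waiting_reader)      # reader goes back in the queue
        pad.waiting_reader = None

def source(sched, pad, n):
    for i in range(n):
        yield from push(sched, pad, i)

def sink(sched, pad, n, received):
    for _ in range(n):
        received.append((yield from pull(sched, pad)))

sched, pad, received = Scheduler(), Pad(), []
sched.wake(sink(sched, pad, 3, received))   # sink first: it must sleep
sched.wake(source(sched, pad, 3))
sched.run()
print(received)
```

Note that no lock is taken anywhere: only one cothread runs at a time,
so the bufpen and the waiting slots cannot be touched concurrently.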

Now, the nice part of this model is that it is not only relatively
simple, but quite compatible with the standard kernel threads model. I
would say a lot of the scheduler's code would remain almost identical if
we were to change it to a pure threads model. But, most important, by
defining the right abstractions, we would be able to mix and match. Some
benefits I see in this approach:

- Properly implemented cothreads can be a blessing when dealing with
latency. In Linux, for example, a SCHED_FIFO thread running cothreads
has guaranteed switching times, and they can be made pretty short.
Additionally, the GStreamer model makes it possible to implement data
passing between cothreads without requiring locking at all. This alone
may help a lot in terms of performance, because, AFAIK, every time you
lock or unlock a mutex, or wait on, or signal a condition variable, you
perform at least one system call.

- The drawback of cothreads is that blocking operations block the thread
(and all of its cothreads) anyway. However, elements that may block
because of I/O or whatever could be marked with a special flag. A
scheduler may choose to put them in separate threads. Even more, it'd be
possible to have a thread pool to run them on demand. I thought of doing
that for the fair scheduler, but right now I can only use (unreliable)
heuristics to tell which elements may block.

- On SMP systems, you could have as many scheduler threads as desired,
each taking work from the run queue as it becomes available. This
would indeed be easy to add to the current fair scheduler.

- Incorporating an event-based model wouldn't be that hard either. The
current fair scheduler implements clock waits by putting waiting
cothreads in a separate, time-prioritized queue, and pulling them out
when the time has come. We could have queues for different types of
events (e.g. disk I/O events, thread events) and use the main loop or
whatever to pull cothreads out when events happen.
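
As an illustration of the clock-wait queue mentioned in the last point,
here is a rough Python sketch. It uses a heap as the time-prioritized
queue and a simulated clock instead of a real one; ClockScheduler,
ticker() and the convention of yielding an absolute wake-up time are all
assumptions made for the example.

```python
import heapq
from collections import deque
from itertools import count

class ClockScheduler:
    def __init__(self):
        self.now = 0                 # simulated clock, arbitrary units
        self.run_queue = deque()
        self.timed = []              # heap of (wake_time, seq, cothread)
        self._seq = count()          # tie-breaker so cothreads never compare

    def spawn(self, cothread):
        self.run_queue.append(cothread)

    def run(self):
        while self.run_queue or self.timed:
            if not self.run_queue:
                # Nothing runnable: jump the clock to the next deadline.
                self.now = max(self.now, self.timed[0][0])
            # Move every sleeper whose time has come back to the run queue.
            while self.timed and self.timed[0][0] <= self.now:
                _, _, sleeper = heapq.heappop(self.timed)
                self.run_queue.append(sleeper)
            current = self.run_queue.popleft()
            try:
                wake_at = next(current)
            except StopIteration:
                continue
            if wake_at is None:
                self.run_queue.append(current)   # plain yield
            else:
                heapq.heappush(self.timed, (wake_at, next(self._seq), current))

def ticker(sched, name, period, ticks, log):
    for i in range(1, ticks + 1):
        yield i * period             # sleep until this absolute time
        log.append((sched.now, name))

sched, log = ClockScheduler(), []
sched.spawn(ticker(sched, "a", 30, 2, log))
sched.spawn(ticker(sched, "b", 20, 2, log))
sched.run()
print(log)
```

Queues for other event types would presumably follow the same pattern,
with a different condition deciding when a sleeper is moved back.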

Now, to make all of this possible, we'll need to make some improvements
to the current scheduler API. The most important one would be to add a
hook to lock and unlock pads. That is, the core wouldn't implement locks
by directly calling GLib, but would ask the scheduler to lock the pad.
Arbitrary locking policies can be implemented in the scheduler this way.
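
A rough sketch of what such a hook could look like, written in Python
rather than C for brevity. None of these names exist in GStreamer:
lock_pad(), unlock_pad(), core_store_buffer() and both scheduler classes
are hypothetical and only show the shape of the idea.

```python
import threading

class CothreadScheduler:
    """All elements share one thread, so pad locking can be a no-op."""
    def lock_pad(self, pad):
        pass

    def unlock_pad(self, pad):
        pass

class ThreadScheduler:
    """Elements run in real threads, so each pad gets a real mutex."""
    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()  # protects the lock table itself

    def lock_pad(self, pad):
        with self._guard:
            lock = self._locks.setdefault(pad, threading.Lock())
        lock.acquire()

    def unlock_pad(self, pad):
        with self._guard:
            lock = self._locks[pad]
        lock.release()

class Pad:
    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.bufpen = None

def core_store_buffer(pad, buf):
    """Stand-in for core code: it asks the scheduler, never locks itself."""
    pad.scheduler.lock_pad(pad)
    try:
        pad.bufpen = buf
    finally:
        pad.scheduler.unlock_pad(pad)

threaded_pad = Pad(ThreadScheduler())
cothread_pad = Pad(CothreadScheduler())
core_store_buffer(threaded_pad, "x")
core_store_buffer(cothread_pad, "y")
```

The core code is identical in both cases; only the scheduler decides
whether a lock costs anything.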

The other necessary change is to add a way for elements to tell the
scheduler that they may block. If we agree on the details of these two
changes, we could have a completely thread-based scheduler, and a mixed
thread/cothread-based scheduler, real soon.
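
For the second change, a minimal Python sketch of how a scheduler might
use such a flag, under the assumption that it is a simple boolean on the
element. The may_block attribute, the Element class and schedule() are
invented for the example; flagged elements go to a thread pool and the
rest run inline in the scheduler's own thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Element:
    name: str
    may_block: bool = False   # hypothetical flag set by the element

def iterate(element, ran_on):
    """Stand-in for one iteration of the element; records the thread used."""
    ran_on[element.name] = threading.current_thread().name

def schedule(elements, pool):
    ran_on, pending = {}, []
    for element in elements:
        if element.may_block:
            # Possibly blocking: hand it to a pool thread.
            pending.append(pool.submit(iterate, element, ran_on))
        else:
            # Never blocks: run it directly in the scheduler thread.
            iterate(element, ran_on)
    for future in pending:
        future.result()       # wait, and surface any exception
    return ran_on

pipeline = [
    Element("filesrc", may_block=True),
    Element("decoder"),
    Element("filesink", may_block=True),
]
with ThreadPoolExecutor(max_workers=2, thread_name_prefix="blocking") as pool:
    ran_on = schedule(pipeline, pool)
print(ran_on)
```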

Cheers,

M. S.