[gst-devel] 0.9 proposals

Thu Dec 2 06:56:07 CET 2004

Hi all,

Some random comments on the discussion:

On Wed, 2004-12-01 at 19:37, Wim Taymans wrote:
> performance loss is to be expected for reasons such as:
> 
> [snip]
> b) writing wrappers (probably with threads) for the (few? buggy?)
> blocking libs out there. Writing non-blocking means adding stuff to poll
> calls.

Ronald's point is especially important here. Many existing drivers don't
properly support poll calls. That is, in theory a poll-based,
non-blocking mode is possible; in practice, it may not be.
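To make the wrapper idea in b) concrete: when a call blocks and exposes no pollable descriptor, one common workaround is a helper thread that performs the blocking call and then writes a byte into a pipe. The pipe's read end *can* go into the main loop's poll set. The sketch below assumes POSIX threads and pipes. `blocking_read()`, `BlockingWrapper` and `wrapper_start()` are hypothetical names, not GStreamer API. Note that it brings a thread back in, which is the point above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for a driver or library call that blocks and
 * offers no file descriptor that poll() could watch. */
static size_t blocking_read(char *buf, size_t len)
{
    static const char data[] = "frame";
    struct timespec delay = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    nanosleep(&delay, NULL);            /* pretend the device is slow */
    size_t n = sizeof data - 1 < len ? sizeof data - 1 : len;
    memcpy(buf, data, n);
    return n;
}

typedef struct {
    int notify[2];   /* notify[0]: goes into the poll set; notify[1]: helper writes */
    char buf[64];    /* filled in by the helper thread */
    size_t len;
} BlockingWrapper;

/* Runs on the helper thread: do the blocking call, then make the pipe
 * readable so that the main loop's poll() wakes up. */
static void *helper_thread(void *arg)
{
    BlockingWrapper *w = arg;
    w->len = blocking_read(w->buf, sizeof w->buf);
    char byte = 1;
    ssize_t r = write(w->notify[1], &byte, 1);
    assert(r == 1);
    (void)r;                            /* avoid an unused warning with NDEBUG */
    return NULL;
}

/* Starts one wrapped read. Returns the fd to poll, or -1 on error.
 * After poll() reports it readable, join *tid before reading w->buf. */
static int wrapper_start(BlockingWrapper *w, pthread_t *tid)
{
    if (pipe(w->notify) != 0)
        return -1;
    if (pthread_create(tid, NULL, helper_thread, w) != 0) {
        close(w->notify[0]);
        close(w->notify[1]);
        return -1;
    }
    return w->notify[0];
}
```

In a real main loop the returned fd would sit in the same poll set as the GUI's and other elements' descriptors, rather than being polled on its own.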

> [snip]
> d) each small operation goes through the scheduler again, even if it is
> not required. This is needed because you cannot preempt a running method
> but might want to switch tasks at each basic operation.
> e) saving state as opposed to keeping it on the stack might be awkward
> and suboptimal for demuxer elements. 

Not only for demux elements. Most complex elements would be a PITA to
program using this model. I don't doubt Benjamin can make any element
you throw at him work with this model, but I don't think most people
will have the skills or, for that matter, the time and patience
necessary to write elements this way. Besides, the point of having
a framework like GStreamer is to make things easier, not harder. Having
to split elements this way just to make them work seems to
defeat GStreamer's purpose completely.
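To illustrate the difference in e): here is the same trivial demuxer written both ways. The toy format is hypothetical (each packet is one length byte followed by that many payload bytes), and none of the names are GStreamer API. In the loop style, the parsing position lives in local variables. In the chain style, data arrives in arbitrary chunks, so the position has to be saved in a struct between calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Receives each complete packet payload. */
typedef void (*PacketFunc)(const unsigned char *data, size_t len, void *user);
/* Returns the next byte of the stream, or -1 at end of stream. */
typedef int (*PullByteFunc)(void *src);

/* Loop-based style: pull bytes as needed; all state is on the stack. */
static void demux_loop(PullByteFunc pull_byte, void *src,
                       PacketFunc on_packet, void *user)
{
    unsigned char payload[255];
    int c;
    while ((c = pull_byte(src)) >= 0) {
        size_t len = (size_t)c, i;
        for (i = 0; i < len; i++) {
            int b = pull_byte(src);
            if (b < 0)
                return;                 /* truncated packet: drop it */
            payload[i] = (unsigned char)b;
        }
        on_packet(payload, len, user);
    }
}

/* Chain-based style: "where am I in the packet?" must survive between calls. */
typedef struct {
    enum { WANT_LENGTH, WANT_PAYLOAD } state;
    size_t need, have;
    unsigned char payload[255];
} DemuxState;

static void demux_chain(DemuxState *s, const unsigned char *chunk, size_t n,
                        PacketFunc on_packet, void *user)
{
    size_t i = 0;
    while (i < n) {
        if (s->state == WANT_LENGTH) {
            s->need = chunk[i++];
            s->have = 0;
            s->state = WANT_PAYLOAD;
        } else {
            size_t take = s->need - s->have;
            if (take > n - i)
                take = n - i;           /* packet continues in a later chunk */
            memcpy(s->payload + s->have, chunk + i, take);
            s->have += take;
            i += take;
        }
        assert(s->have <= s->need);
        if (s->state == WANT_PAYLOAD && s->have == s->need) {
            on_packet(s->payload, s->need, user);
            s->state = WANT_LENGTH;
        }
    }
}
```

Even for this tiny format, the chain version needs an explicit state enum and a buffer that outlives each call. A real demuxer with headers, seeking and several streams multiplies that bookkeeping.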

> f) lots of interaction with and activity in one central place might turn
> it into a bottleneck. Not sure if that is what you experience. Also
> consider the GUI mainloop that might block the gst mainloop, causing it
> to skip or hang. Due to b) the poll might only be checked after some
> delay, when the mainloop gets control again.

This is right. For interactive applications, you'll need threads anyway.

> [snip]
> d) Data passing does not go through a scheduler, the push/chain,
> pull/get functions can be linked directly, achieving higher performance.
> The only problem is that when linking chain to loop based elements, one
> needs to cross a thread boundary, introducing latencies.

What does "directly" mean in this context? A good GStreamer scheduler
can do the same. This is the sort of optimization I'm planning to
introduce in the fair scheduler. The situations that require crossing a
thread boundary in your approach are exactly the same ones that
require crossing a cothread boundary in the fair scheduler. And there you
may be able to avoid the extra latency, because you control when the
switch happens.
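One way to read "linked directly": when both sides of a link are chain-based, pushing a buffer can be a plain function call into the peer's chain function, with nothing scheduled in between. A scheduler that sees such a link could wire it up the same way. The `Pad`, `Buffer` and `pad_push()` below are simplified, hypothetical types for this sketch, not the real GStreamer structures.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Buffer { int data; } Buffer;   /* hypothetical, not GstBuffer */

typedef struct Pad Pad;
typedef int (*ChainFunc)(Pad *pad, Buffer *buf);

struct Pad {
    ChainFunc chain;   /* set on sink pads of chain-based elements */
    Pad *peer;         /* the pad this one is linked to */
    void *element;     /* owning element's private data */
};

/* Direct data passing: a push is just a call into the peer's chain
 * function, running on the caller's stack. */
static int pad_push(Pad *src, Buffer *buf)
{
    assert(src->peer != NULL && src->peer->chain != NULL);
    return src->peer->chain(src->peer, buf);
}

/* Example chain-based filter: doubles the value and pushes it on. */
typedef struct { Pad sink, src; } DoubleFilter;

static int double_chain(Pad *pad, Buffer *buf)
{
    DoubleFilter *self = pad->element;
    buf->data *= 2;
    return pad_push(&self->src, buf);
}
```

The case this cannot cover is a loop-based peer: its loop function expects to pull data itself rather than be called, so the push has to hand the buffer to another thread or cothread instead.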

> e) Since preemption points are everywhere you don't need to explicitly
> place them, making elements (mainly push/pull loops) clearer, code wise.
> f) Data passing is happening freely without interaction from a central
> entity. Lock contention goes through the kernel.

This is not really accurate. There is a central entity, namely, the
kernel. The kernel offers such a good abstraction that you get the
impression there is no central entity at all, but that shouldn't deceive
you. Every time two threads synchronize, control has to go back to the
kernel, which in turn hands it to the next thread. Your processor
performs those operations whether you see them or not. And having
preemption points everywhere is not necessarily an advantage. It implies
you have to do locking, and locking isn't always cheap.
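To show where that locking comes from: handing buffers between two preemptible threads needs something like the one-slot mailbox below, built on a POSIX mutex and condition variable. Every handoff takes the lock. Whenever one side finds the slot in the wrong state, it sleeps in `pthread_cond_wait()` until the other side wakes it, and that round trip goes through the kernel. The `Mailbox` type and its functions are illustrative names, not part of GStreamer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* One-slot mailbox between an upstream and a downstream thread. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;   /* shared by both directions, hence broadcast */
    bool full;
    int value;
    long waits;            /* number of pthread_cond_wait() calls made */
} Mailbox;

static void mailbox_init(Mailbox *m)
{
    int rc = pthread_mutex_init(&m->lock, NULL);
    rc |= pthread_cond_init(&m->cond, NULL);
    assert(rc == 0);
    (void)rc;
    m->full = false;
    m->value = 0;
    m->waits = 0;
}

static void mailbox_put(Mailbox *m, int v)
{
    pthread_mutex_lock(&m->lock);
    while (m->full) {                  /* downstream hasn't taken the last one */
        m->waits++;
        pthread_cond_wait(&m->cond, &m->lock);
    }
    m->value = v;
    m->full = true;
    pthread_cond_broadcast(&m->cond);  /* wake a waiting consumer */
    pthread_mutex_unlock(&m->lock);
}

static int mailbox_get(Mailbox *m)
{
    pthread_mutex_lock(&m->lock);
    while (!m->full) {                 /* nothing to consume yet */
        m->waits++;
        pthread_cond_wait(&m->cond, &m->lock);
    }
    int v = m->value;
    m->full = false;
    pthread_cond_broadcast(&m->cond);  /* wake a waiting producer */
    pthread_mutex_unlock(&m->lock);
    return v;
}
```

With a single slot, producer and consumer end up waiting on each other often, so the `waits` counter gives a rough idea of how many kernel-mediated sleeps the handoff costs on a given machine.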

The real question is whether (or, rather, in which cases) the kernel
does a better job than a good cothreads implementation (like Pth, for
example). Don't get me wrong, I don't know the answer. But I'd rather
not try to answer it before carefully testing both approaches.
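As a possible starting point for such a test, here is a cothread-style version of a producer/consumer handoff. It uses the `ucontext` functions (`getcontext`, `makecontext`, `swapcontext`), which glibc still provides although POSIX.1-2008 removed them. The program, not the kernel scheduler, decides when each switch happens, and no lock is needed because only one side runs at a time. One caveat: on glibc, `swapcontext()` also saves and restores the signal mask, which costs a system call. A careful comparison against an equivalent kernel-thread version (for example a mutex/condition-variable handoff) should therefore also try a lighter library such as Pth. All names below are illustrative.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ucontext.h>

static ucontext_t main_ctx, producer_ctx;
static char producer_stack[64 * 1024];
static int slot;        /* the "buffer" being handed over */
static int produced;    /* values emitted by the producer so far */
static long switches;   /* context switches requested by this program */

/* Producer cothread: emits 1, 2, 3, ... and yields after each value
 * instead of taking a lock. It never returns. */
static void producer(void)
{
    for (;;) {
        slot = ++produced;
        switches++;
        swapcontext(&producer_ctx, &main_ctx);
    }
}

/* Consumer side: resume the producer n times and sum what it hands over. */
static long run_cothreads(int n)
{
    long sum = 0;
    int rc = getcontext(&producer_ctx);
    assert(rc == 0);
    (void)rc;
    producer_ctx.uc_stack.ss_sp = producer_stack;
    producer_ctx.uc_stack.ss_size = sizeof producer_stack;
    producer_ctx.uc_link = &main_ctx;
    makecontext(&producer_ctx, producer, 0);
    for (int i = 0; i < n; i++) {
        switches++;
        swapcontext(&main_ctx, &producer_ctx);
        sum += slot;
    }
    return sum;
}
```

Timing `run_cothreads()` for a large `n` against the thread-based equivalent on the same machine would give one data point. Real pipelines, with I/O and GUI main loops involved, would still need to be measured separately.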

M. S.