[gst-devel] 0.9 proposals

Wed Dec 1 10:33:03 CET 2004

On Wed, 2004-12-01 at 15:54 +0100, Benjamin Otte wrote:
> On Tue, 30 Nov 2004, Thomas Vander Stichele wrote:
> 

[snip]

> > How do you make up for the performance loss of choosing a non-optimal
> > read/write/select model ?
> >
> Pardon me? I don't see a performance loss and a non-optimal select model.
> In fact the biggest performance loss in my 0.9 code is the usage of
> GMainLoop. I think this might be due to its heavy use of locking, but I
> haven't investigated it further.

From looking at your code, which is certainly interesting, I made the
following observations:

a) it uses a non-threading scheduling model
b) to accomplish a) you need non-blocking elements else one element can
block the pipeline and the app forever.
c) you don't use cothreads or any other user-space threading library
d) scheduling arbitrary elements is done by using continuations: each
element has one or more small entry points that do something, save state,
and produce a result.
e) observation d) requires elements to be written as a state machine.
f) the tasks are run from the mainloop.
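
To make observations d) to f) concrete, here is a minimal sketch of that
style of scheduling. It is not taken from Benjamin's code: the class names,
the step() entry point and run_mainloop() are all made up for illustration.
Each element does one small unit of work per call, keeps its progress in
instance fields, and returns to the loop.

```python
from collections import deque


class CountingSource:
    """Hypothetical source element: emits one buffer per call, never blocks."""

    def __init__(self, limit, outbox):
        self.next_value = 0      # state saved between entry-point calls
        self.limit = limit
        self.outbox = outbox

    def step(self):
        if self.next_value >= self.limit:
            return False         # no work done
        self.outbox.append(self.next_value)
        self.next_value += 1
        return True              # work done, call again later


class Doubler:
    """Hypothetical filter element: handles at most one queued buffer per call."""

    def __init__(self, inbox, outbox):
        self.inbox = inbox
        self.outbox = outbox

    def step(self):
        if not self.inbox:
            return False         # nothing queued, return instead of blocking
        self.outbox.append(self.inbox.popleft() * 2)
        return True


def run_mainloop(elements):
    """Round-robin over the entry points until a full pass makes no progress."""
    progressed = True
    while progressed:
        progressed = False
        for element in elements:
            if element.step():
                progressed = True


link = deque()
results = []
run_mainloop([CountingSource(3, link), Doubler(link, results)])
# results is now [0, 2, 4]
```

Note that every buffer costs one trip through run_mainloop() per element,
and that a step() which blocks would stall every other element.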

Performance loss is to be expected for reasons such as:

a) mostly on SMP, where a non-threaded scheduler cannot spread the work
over more than one CPU.
b) writing wrappers (probably with threads) for the (few? buggy?)
blocking libs out there. Writing non-blocking means adding stuff to poll
calls.
c) see the existing cothread implementations, they can taskswitch in
userspace on selected scheduling points. They are slower though than a
scheduler like opt and even opt has a lot of overhead in bookkeeping.
d) each small operation goes through the scheduler again, even if it is
not required. This is needed because you cannot preempt a running method
but might want to switch tasks at each basic operation.
e) saving state as opposed to keeping it on the stack might be awkward
and suboptimal for demuxer elements. 
f) lots of interaction with and activity in one central place might turn
it into a bottleneck. Not sure if that is what you experience. Also
consider the GUI mainloop that might block the gst mainloop, causing it
to skip or hang. Due to b) the poll might only be checked after some
delay, when the mainloop gets control again.
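
Point e) is easier to see side by side. The sketch below parses a made-up
framing (one length byte followed by that many payload bytes), first as an
explicit state machine that can be fed arbitrary chunks, then as a blocking
loop whose progress lives on the stack. The format and all names are
assumptions for the example, not an existing demuxer.

```python
import io


class StateMachineDemuxer:
    """Non-blocking style: all parse progress is stored in instance fields."""

    def __init__(self):
        self.pending = bytearray()   # bytes received but not yet parsed
        self.need = None             # payload length once the header is seen

    def feed(self, chunk):
        """Accept whatever bytes are available; return the complete packets."""
        self.pending += chunk
        packets = []
        while True:
            if self.need is None:
                if len(self.pending) < 1:
                    break            # header not here yet, return to caller
                self.need = self.pending[0]
                del self.pending[:1]
            if len(self.pending) < self.need:
                break                # payload incomplete, return to caller
            packets.append(bytes(self.pending[:self.need]))
            del self.pending[:self.need]
            self.need = None
        return packets


def blocking_demux(stream):
    """Blocking style: the position in the format is implicit in the code."""
    packets = []
    while True:
        header = stream.read(1)      # would block in a real element
        if not header:
            return packets           # end of stream
        packets.append(stream.read(header[0]))


demuxer = StateMachineDemuxer()
out = []
for chunk in (b"\x03a", b"bc\x02", b"de"):
    out += demuxer.feed(chunk)

same = blocking_demux(io.BytesIO(b"\x03abc\x02de"))
```

Both produce the same packets; the second is shorter only because it is
allowed to block in read().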

> 
> > How do you still achieve low-latency if you have to actively loop to
> > check for data processing ? How do you decide on the latency-vs-
> > performance tradeoff ?
> >
> I'm not sure why you think this is so much better with threads. Threads
> are just an uncontrollable and suboptimal way to have the decision what to
> do next done by someone else and freeing you of the burden to decide for
> yourself by making the kernel decide what to schedule next instead of
> doing it yourself.
> I've always wondered why that would be preferable.

Kernel threads solve the issues mentioned above in the following ways:

a) works on SMP: the parts of a pipeline that can be executed in
parallel, like the decoding of audio and video, are performed in parallel.
b) you can use blocking elements (which are most common) and let the
kernel unblock. Together with point d) this allows for the currently
running element to be preempted to get the data with lower latencies.
c) Kernel threads, preemptable.
d) Data passing does not go through a scheduler, the push/chain,
pull/get functions can be linked directly, achieving higher performance.
The only problem is that when linking chain to loop based elements, one
needs to cross a thread boundary, introducing latencies.
e) Since preemption points are everywhere, you don't need to place them
explicitly, making elements (mainly push/pull loops) clearer code-wise.
f) Data passing is happening freely without interaction from a central
entity. Lock contention goes through the kernel.
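
A rough sketch of points b), d) and f), with invented names throughout:
chain functions are linked by plain calls, each branch runs in its own
thread, and a blocking queue marks the thread boundary. It only shows the
structure. In CPython the interpreter lock keeps CPU-bound Python code from
running in parallel, so this does not demonstrate the SMP gain of point a).

```python
import queue
import threading


def make_chain(transform, downstream):
    """Link a chain function directly to the next one: a call, no scheduler."""
    def chain(buf):
        downstream(transform(buf))
    return chain


def run_branch(inbox, chain):
    """Streaming thread: block on the queue until data or end-of-stream."""
    while True:
        buf = inbox.get()        # blocking; the kernel wakes this thread
        if buf is None:          # None marks end-of-stream in this sketch
            return
        chain(buf)


def run_pipeline(buffers):
    audio_out, video_out = [], []
    audio_q, video_q = queue.Queue(), queue.Queue()   # thread boundaries
    audio = make_chain(lambda b: ("audio", b), audio_out.append)
    video = make_chain(lambda b: ("video", b), video_out.append)
    threads = [
        threading.Thread(target=run_branch, args=(audio_q, audio)),
        threading.Thread(target=run_branch, args=(video_q, video)),
    ]
    for t in threads:
        t.start()
    for buf in buffers:          # stands in for a demuxer feeding both branches
        audio_q.put(buf)
        video_q.put(buf)
    audio_q.put(None)
    video_q.put(None)
    for t in threads:
        t.join()
    return audio_out, video_out


audio_result, video_result = run_pipeline([1, 2, 3])
```

The elements contain no scheduling code at all; the only synchronization
is inside the queues at the thread boundary.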

Performance problems in threaded apps are mostly caused by lock
contention and user-space lock checking, which usually flushes caches.
In the proposal I mentioned one lock for the streaming threads, where
contention happens on state changes and things like application events,
which are out of the ordinary data flow.

But this is turning into a threads-vs-non-threads discussion now.
Preemption and SMP are the main reasons to use threads; there is no
point in writing code to handle preemption when it is already provided
as a service by the kernel, IMO. And if you do allow threads then we are
back at the beginning: how to handle state changes, signals,
negotiations over queues, etc...

> 
[snip]
> > But it should be possible to have *some* measure of
> > stability and workingness throughout the development.
> >
> I think you'll have a hard time having this. To measure stability and
> workingness, you need to do tests. GStreamer's testing framework is as bad
> as during 0.7 and there's still the same crux with getting app developers
> to switch their app and test it on an unstable branch that is quickly
> evolving.
> So I'm not sure which measurements you think of here.

Personally, I'm thinking about extending the module tests more. I'm
quite happy with them and they would be even better if each module could
be tested separately against all the features and requirements of a
design plan. This also requires decoupling the objects a little more.

Wim

> 
> Benjamin
-- 
Wim Taymans <wim at fluendo.com>