[gst-devel] autoplugging and 0.9

Wim Taymans wim at fluendo.com
Tue Aug 24 05:06:13 CEST 2004


On Fri, 2004-08-20 at 22:18, Erik Walthinsen wrote:
> Ramón García wrote:
> > Sorry, this message is difficult to read without
> > context.
> >  
> > Wingo was against automatic placement of threads by
> > the scheduler. I was supporting it.

This discussion is drifting away from the original intent of my
statement. I meant that instead of writing foo ! { queue ! bar } one
would write foo ! queue ! bar, and the scheduler would figure out that
the only way to schedule bar is to decouple its execution from foo.
If you don't use a queue, nothing is inserted automatically.
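
As a concrete illustration, here is a minimal sketch of the
application side (0.8-era C API as I understand it; "foo" and "bar"
are placeholder element names, not real plugins):

  #include <gst/gst.h>

  int
  main (int argc, char *argv[])
  {
    GstElement *pipeline;
    GError *error = NULL;

    gst_init (&argc, &argv);

    /* no { } thread syntax: the application only asks for a queue;
     * under this proposal the scheduler notices the queue and
     * decouples execution on its own */
    pipeline = gst_parse_launch ("foo ! queue ! bar", &error);
    if (pipeline == NULL)
      g_error ("parse error: %s", error->message);

    gst_element_set_state (pipeline, GST_STATE_PLAYING);

    /* in 0.8 the application iterates the pipeline until it is done */
    while (gst_bin_iterate (GST_BIN (pipeline)));

    return 0;
  }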

It does bring up the issue of scheduling an arbitrary pipeline, which,
I believe, needs to be implemented using threads. How and where the
threads are inserted will be explained below. That is to say: some
pipelines are practically impossible to schedule without the use of
threads. If you don't create such pipelines, fine, no threads will be
created implicitly. If you tell the scheduler never to create such
implicit threads, then it won't.

> 
> I suppose I should weigh in on this issue, though I haven't seen the 
> arguments either way.

I'll try to explain them here.

> 
> Requiring threads to be placed explicitly by the application was the 
> VERY FIRST design choice made, long before any code was even conceived, 
> or even ANY other design issues were thought about.  This issue is the 
> primary reason GStreamer ever existed in the first place.  Both the OGI 
> Quasar player (which directly inspired GStreamer by both its features and
> misfeatures, mainly this issue) and DirectShow place every single 
> element in its own thread, and thus put queues between them.

I'm not sure about Quasar, but my understanding of DirectShow is now
much better than it used to be, and I actually like its scheduling. It
does NOT use a thread for each element.

There are only two possible driving forces in a pipeline:

 - _get based pads: pads with a _get function. These pads are typically
exposed by source elements.
 - loop based elements: elements with a _loop function. These are
typically demuxers, muxers, mixers or any other plugin that requires
more control over its sinkpads.

There are NEVER other driving forces in a pipeline. The proposal is to
ONLY allocate a 'thread' to the driving forces in a pipeline. This is
exactly what DirectShow does and what the opt scheduler (albeit in an
overcomplicated way) does. I say 'thread' here, but I would rather use
the more general term 'task' (which can be a kernel thread, a cothread,
a non-preemptible thread a la opt, an idle_loop job, ...).
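
To make the two driving forces concrete, here is a rough sketch in the
0.8-style API (MySrc and MyDemux are made-up element types, the read
helper is hypothetical, error handling omitted):

  /* a _get based pad, registered with
   *   gst_pad_set_get_function (srcpad, my_src_get);
   * the task driving this part of the pipeline pulls buffers out of
   * it */
  static GstData *
  my_src_get (GstPad *pad)
  {
    MySrc *src = MY_SRC (gst_pad_get_parent (pad));

    return GST_DATA (my_src_read (src));   /* hypothetical helper */
  }

  /* a loop based element, registered with
   *   gst_element_set_loop_function (element, my_demux_loop);
   * it decides itself when to pull which sinkpad */
  static void
  my_demux_loop (GstElement *element)
  {
    MyDemux *demux = MY_DEMUX (element);
    GstData *data = gst_pad_pull (demux->sinkpad);

    /* a real demuxer would parse here and pick an output pad */
    gst_pad_push (demux->srcpad, data);
  }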

Erik, this is not the same as allocating a thread for each element.
Consider:

 filesrc -> mpegdemux -> mpeg2dec -> colorspace -> ximagesink

Suppose mpegdemux is loop based; this pipeline then runs in one task,
driven by the demuxer, since no other loop based elements or get based
pads require more threads or queues. Because mpeg2dec, colorspace and
ximagesink are connected through chain based sinkpads, they just push
the data through in the same task as mpegdemux.
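
A chain function, for comparison, is completely passive; a
hypothetical filter like the one below runs entirely inside whatever
task called gst_pad_push() upstream:

  /* registered with
   *   gst_pad_set_chain_function (sinkpad, my_filter_chain);
   * this runs in the caller's task - here, mpegdemux's - so pushing
   * through mpeg2dec, colorspace and ximagesink costs no thread
   * switch */
  static void
  my_filter_chain (GstPad *pad, GstData *data)
  {
    MyFilter *filter = MY_FILTER (gst_pad_get_parent (pad));

    /* transform the buffer (details omitted), then pass it on */
    gst_pad_push (filter->srcpad, data);
  }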

This is also NOT what we did with the cothread based approach of the
basic scheduler, where each element was allocated a cothread.

Now, consider this:

 filesrc -> mpegdemux -> mpeg2dec -> colorspace -> queue -> ximagesink

We added another element to the pipeline (queue), which has a chain
based sinkpad and a get based source pad. This pipeline now has two
entry points: mpegdemux and queue. The idea is to allocate two tasks to
schedule this pipeline, one to drive mpegdemux and another one to
drive the queue.
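
This is also exactly why queue creates an entry point: it has both
halves. A rough sketch (not the real queue code; the enqueue/dequeue
helpers are made up):

  /* sinkpad side: runs in the upstream task (mpegdemux's) */
  static void
  my_queue_chain (GstPad *pad, GstData *data)
  {
    MyQueue *queue = MY_QUEUE (gst_pad_get_parent (pad));

    my_queue_enqueue (queue, data);   /* blocks when the queue is full */
  }

  /* srcpad side: runs in the queue's own task, which drives
   * everything downstream of the queue (here, ximagesink) */
  static GstData *
  my_queue_get (GstPad *pad)
  {
    MyQueue *queue = MY_QUEUE (gst_pad_get_parent (pad));

    return my_queue_dequeue (queue);  /* blocks when it is empty */
  }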

Again, this does not introduce unpredictable latency. After all, you
asked for decoupled processing by adding the queue element.
 
Finally, there is one case where automatic decoupled processing is
required: when a sinkpad without a chain function is connected to a
srcpad without a get function. This can only happen when a sinkpad
driven by a loop based element is not connected to a get based srcpad.

In this case, when the loop based element performs a pull on the
sinkpad, it must wait for the other end of the pipeline to provide a
buffer. For example:

  filesrc -> identity -> mpegdemux -> fakesink

There are two entry points: filesrc (since it is connected to a chain
based pad) and mpegdemux (loop based). A queue of one buffer would be
inserted by the scheduler between identity.src and mpegdemux.sink, and
the two entry points would each be scheduled by their own task.

This can be implemented in different ways:

- cothreads: the pull on mpegdemux triggers a cothread switch to the
peer group to schedule the entry point filesrc.
- task groups: the pull on mpegdemux triggers an iteration of the
filesrc entry point (like opt).
- kernel threads: the pull on mpegdemux blocks, and the kernel can
schedule the peer group to eventually unblock the pad.
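
For the kernel thread variant, the implicit one-buffer queue could be
as simple as a blocking handoff; a minimal sketch in plain GLib (not
actual scheduler code; the lock and cond are assumed to be created
once with g_mutex_new()/g_cond_new()):

  typedef struct {
    GMutex  *lock;
    GCond   *cond;
    GstData *slot;    /* holds at most one buffer */
  } Handoff;

  /* called from the filesrc task when identity pushes */
  static void
  handoff_push (Handoff *h, GstData *data)
  {
    g_mutex_lock (h->lock);
    while (h->slot != NULL)        /* wait until the consumer took it */
      g_cond_wait (h->cond, h->lock);
    h->slot = data;
    g_cond_signal (h->cond);
    g_mutex_unlock (h->lock);
  }

  /* called from the mpegdemux task when it pulls its sinkpad */
  static GstData *
  handoff_pull (Handoff *h)
  {
    GstData *data;

    g_mutex_lock (h->lock);
    while (h->slot == NULL)        /* wait until the producer filled it */
      g_cond_wait (h->cond, h->lock);
    data = h->slot;
    h->slot = NULL;
    g_cond_signal (h->cond);
    g_mutex_unlock (h->lock);
    return data;
  }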

GStreamer currently has schedulers that implement the cothread and the
task group methods. I will list positive and negative points about each
possible approach.

cothreads:
+ very fine-grained control over when control should be given to
another entry point.
+ very fast, low-latency switches.
- finding the right entry point is non-trivial or even impossible; it
breaks for muxers or pad select, for example. Heuristics can be based
on fairness, timestamps, ..., but the fundamental problem is that
scheduling an entry point can block and starve the rest of the
pipeline.
- implementation limits: fixed stack space and stack location for
elements, scary non-portable code to perform the cothread switch, hard
to debug.

task groups:
+ fine-grained control over when control should be given to another
entry point.
+ fast switches.
- finding the right entry point is non-trivial and prone to blocking
or starving a pipeline; the same issues as cothreads.
- since a group iterate function has to run to completion before
control can be handed back to the caller (especially for chain->loop
connections), unlimited queues must be inserted between loop->loop and
chain->loop elements to capture all the buffers generated in one
iteration, which adds considerable latency.
- possibility of stack overflow and infinite recursion when switching
between groups.

kernel threads:
+ finding entry points is trivial.
+ no need to even think about what to schedule when; everything is
scheduled whenever it is possible.
+ select can be made to work; other hard scheduling problems caused by
blocking calls disappear, since the kernel can preempt and schedule
something else at any time.
+ no known implementation limits.
- no fine-grained control over when a thread is scheduled; the kernel
might choose a suboptimal order (I can't think of a concrete use case
yet).
- higher latency when switching between chain->loop and loop->loop
connections: a pull only succeeds when the kernel decides to do a
thread switch.


Now, in an ideal world, cothreads combined with kernel threads would be
the best solution, and this is what the original design intended to
achieve. The explicit use of GstThread on decoupled pipeline boundaries
can be debated, as those threads could just as easily be constructed
automatically.

Cothreads however simply have too many implementation problems.

Task groups simply lack the preemption needed to break out of blocking
paths, and they have very high latency in some situations. Basically,
they suffer from the same problems as cothreads.

Simulated cothreads on top of kernel threads work quite well when
combined with a scheduler like opt or entry to limit the number of
threads and therefore the latency. The problem, however, is that only
one thread can run at a time (as with cothreads), so parts of the
pipeline tend to starve. They suffer from the same problems as
cothreads and task groups do.

Using a kernel thread for each entry point in the pipeline would
result in a slightly different idea of scheduling: providers produce
as much as they can in parallel, and consumers simply consume. The
parallelism here is the new feature.

Just a quick wrap-up of my line of thinking; I'm not sure I gave
enough information, or whether I need to provide use cases that break
with a strictly non-threaded approach.
 

> 
> The fundamental problem with doing this is that there is no control over 
> how consistently any given thread is scheduled by the kernel, and 
> therefore all queues have to be oversized.  Every element added to a 
> pipeline (a DVD pipeline has ~6-8 elements between the disk and the 
> user) adds another essentially uncontrolled queue, which because of 
> various constants in the kernel (100Hz jiffies being a primary factor, 
> the design of the scheduler being another), requires that queues be far 
> more than one 33ms frame-time each.  With many elements, total queue 
> length typically exceeds 1 second.

As stated above, DVD decoding does not require any queues at all,
apart from the obvious ones to decouple audio/video display (and even
those are not required when using the async clock notifications).

> 
> Applications that require low latency and low jitter (any live 
> processing of audio, video, etc.) fundamentally cannot be implemented 
> with a system that has implicit queues.  Strict control over latency is 
> only possible with pipelines engineered from the ground up to use no 
> queues, passing all data immediately from element to element.

I understand the issue, and I'm going to state that, given the right
set of elements, queues/threads are not going to be inserted at all.
Also note that an element can select a loop, a chain or a get function
based on how it connects to its peer element, and that loop based
elements typically require decoupled processing when not connected to
an 'obeying' element. Finally, the implicit queue would only hold one
buffer, just enough to pass data to the other thread after a thread
switch.

> 
> 
> However, the design of GStreamer's scheduler subsystem means that there 
> is nothing keeping someone from creating a scheduler that does in fact 
> implement implicit threads and queues between all elements.  As a matter 
> of fact I think such a scheduler is long overdue, because it can be 
> constructed in such a way as to "guarantee" proper execution of 
> pipelines in ways that the normal scheduler has yet to achieve.  As such 
> it can act as a reference scheduler, where certain performance 
> characteristics like latency are irrelevant, but cannot be the default 
> by any means whatsoever.
> 

Regards,
Wim
