[gst-devel] totem and osssink? (long)

Martin Soto soto at informatik.uni-kl.de
Wed Mar 10 15:15:02 CET 2004


Hello Ronald (and everyone):

On Wed, 2004-03-10 at 00:04, Ronald S. Bultje wrote:
[description of Totem's sync problems deleted]
> * Probably the main issue: I don't know how clocking works. We have
> several time units in osssink, one being the element clock, one being
> the audio clock, one being the oss clock, one being the buffer
> timestamps, one being the elementtime and one being the buffertime (see
> chain()). I have no f***ing clue how those relate to each other or what
> each of those represents, and I cannot fix osssink if nobody tells me. I
> need documentation. Benjamin, please. Explain what you did here, what's
> what. Especially audio clock vs. oss clock vs. element clock and which
> does what. And how - according to you - A/V sync and timing should be
> done, for the element itself, other elements and applications. The code
> does *not* speak for itself.

I've been hacking on this stuff heavily in the last few days, so I
thought I might try to explain what I understand of it at the moment.
I'm now synchronizing a video element for the DXR3 card with an audio
element for SPDIF output based on ALSA and an SB Live! sound card. It
works well and it is robust. I can jump DVD chapters back and forth and
go into the menus as many times as I want without ever losing
synchronization (although there is a problem, see below).

So, here I go. Text inside square brackets corresponds to personal
opinions or things I don't know for sure. The remaining text
corresponds to things I'm pretty sure about (but they may be wrong
anyway). It would be good if at least Benjamin took a look at this
explanation and pointed out any problems. This would help us all get a
clearer view of what's going on with time handling.

Time Values
-----------

All times in GStreamer are represented as integer values in
nanoseconds (1/10^9 seconds). Type GstClockTime, which is consistently
used to store time values, is a 64 bit unsigned integer (guint64). The
maximum time you can express with such a value is almost 585 years,
which should be enough for multimedia purposes. Type
GstClockTimeDiff, which is used to store time differences, is a signed
64 bit integer (gint64). It can go from about -292 years to about 292
years, which also seems sufficient for our purposes.
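
The ranges quoted above can be checked with a few lines of plain C.
This is only a sketch with no GStreamer dependency: the Sketch* names
are made up stand-ins for GstClockTime and GstClockTimeDiff, and a year
is assumed to have 365.25 days.

```c
#include <stdint.h>

/* Illustrative stand-ins for GstClockTime and GstClockTimeDiff. */
typedef uint64_t SketchClockTime;
typedef int64_t SketchClockTimeDiff;

/* One second expressed in nanoseconds. */
#define SKETCH_SECOND ((SketchClockTime) 1000000000)

/* Assumption: a year of 365.25 days. */
#define SKETCH_SECONDS_PER_YEAR (365.25 * 24.0 * 60.0 * 60.0)

/* Largest representable time, in years (roughly 584.5). */
double sketch_max_time_years(void)
{
  return ((double) UINT64_MAX / 1e9) / SKETCH_SECONDS_PER_YEAR;
}

/* Largest representable positive difference, in years (roughly 292.3). */
double sketch_max_diff_years(void)
{
  return ((double) INT64_MAX / 1e9) / SKETCH_SECONDS_PER_YEAR;
}

/* Signed difference a - b; assumes the magnitude fits in 63 bits. */
SketchClockTimeDiff sketch_time_diff(SketchClockTime a, SketchClockTime b)
{
  return (a >= b) ? (SketchClockTimeDiff) (a - b)
                  : -(SketchClockTimeDiff) (b - a);
}
```

The helper sketch_time_diff avoids subtracting two unsigned values
directly when the result is expected to be negative.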

For the examples, I will write times in seconds, which are easier to
read and think of, instead of nanoseconds.


Clocks
------

Clocks are objects of the GstClock class. Their purpose is to provide
a real time reference. The function gst_clock_get_time allows you to
consult the current time of a clock. The only thing you know for sure
about such a value is that it never decreases, that is, if you call
gst_clock_get_time twice, you can expect the second result to be
greater than or equal to the first result. In general, you can expect
a clock to progress in real time (as long as it is active, of course),
but, in practice, that's not always the case, as we will see.

The GStreamer core provides a default clock that is based on the time
services offered by the underlying operating system. Elements can also
provide their own clocks, usually based on some hardware clock, like
the one present in a standard sound card. Since all clock objects are
supposed to reflect real time, it shouldn't be important which clock
you select for a particular application. In practice, however, the
choice of clock may have a notable effect on the behavior of a
pipeline.


Element Clocks
--------------

Elements may provide a clock, and they may require a clock. Whenever a
pipeline is created, the core will automatically select a clock and
distribute it to elements requiring one (if there are any) by invoking
their set_clock function. Usually, if one element in the pipeline
provides a clock, it will be selected. Otherwise, the default clock
will be used.
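
The selection rule just described can be sketched as follows. This is
not the core's actual code: the structures and the function
sketch_distribute_clock are hypothetical, and picking the *first*
provider found is an assumption (the text above only says that a
provided clock is usually preferred over the default one).

```c
#include <stddef.h>

typedef struct {
  const char *name;
} SketchClock;

typedef struct {
  SketchClock *provided;   /* clock this element offers, or NULL */
  int requires_clock;      /* non-zero if the element wants a clock */
  SketchClock *assigned;   /* clock handed over by the pipeline */
} SketchElement;

/* Pick a clock for the pipeline and hand it to every element that
 * requires one. Returns the clock that was selected. */
SketchClock *sketch_distribute_clock(SketchElement *elems, size_t n,
                                     SketchClock *fallback)
{
  SketchClock *chosen = fallback;
  size_t i;

  /* Assumption: the first element providing a clock wins. */
  for (i = 0; i < n; i++) {
    if (elems[i].provided != NULL) {
      chosen = elems[i].provided;
      break;
    }
  }

  /* Every element requiring a clock gets the same one, including the
   * provider itself if it also requires a clock. */
  for (i = 0; i < n; i++) {
    if (elems[i].requires_clock)
      elems[i].assigned = chosen;
  }
  return chosen;
}
```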

Properly programmed elements should be able to use whatever clock they
receive. Even elements providing a clock should not count on being
assigned their own clock. In case they are assigned a different clock,
they should use it (and not their own) for synchronization. [I think
not all sinks respect this rule. They should if we want to achieve real
interoperation.]


Element Time
------------

Clocks provide a real time reference, but this reference doesn't have
any defined base. That means, if you read a clock now and it returns,
for example, the value 250s, that tells you nothing about the actual
current time (i. e. that won't tell you if it is one o'clock or
3:30). If, on the other hand, you read the clock later and it returns
255s, you can tell that 5 real seconds have elapsed since your initial
read (provided the clock wasn't stopped in between). In other words,
clocks are useful to measure time intervals, but they don't help when
you have to do something at a particular, externally defined time (like
at 12 o'clock).

In order to make things a bit easier to program, and since clocks have
arbitrary base times anyway, elements provide a way to change their
particular base time. Function gst_element_set_time is used for this
purpose. So if you say now

  gst_element_set_time (elem, 100 * GST_SECOND);

and 10 seconds later you execute

  time = gst_element_get_time (elem);

the value of time will be 110s (i. e. 110 * GST_SECOND).

This is achieved without touching the clock object assigned to the
element. Elements contain a field, called base_time, that will be
subtracted from the actual clock time in order to calculate the
element time. gst_element_set_time just adjusts base_time properly to
achieve this behavior.
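
The base_time arithmetic can be sketched in plain C like this. It is a
model of the idea only, not the real implementation: SketchClock is a
hypothetical clock that is advanced by hand, and base_time is kept
signed here (an assumption) so that an element time larger than the
current clock time needs no special handling.

```c
#include <stdint.h>

#define SKETCH_SECOND ((uint64_t) 1000000000)

/* Hypothetical clock whose time is advanced by hand. */
typedef struct {
  uint64_t now;
} SketchClock;

typedef struct {
  const SketchClock *clock;  /* shared clock, never modified here */
  int64_t base_time;         /* subtracted from the clock time */
} SketchElement;

/* Element time = clock time - base time. */
uint64_t sketch_element_get_time(const SketchElement *elem)
{
  return (uint64_t) ((int64_t) elem->clock->now - elem->base_time);
}

/* Choose base_time so that the element time reads `time` right now. */
void sketch_element_set_time(SketchElement *elem, uint64_t time)
{
  elem->base_time = (int64_t) elem->clock->now - (int64_t) time;
}
```

With the clock at 250s, setting the element time to 100s stores a base
time of 150s. Ten seconds later the clock reads 260s and the element
time is 260s - 150s = 110s, as in the example above.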


Synchronization
---------------

In order for two or more elements to play synchronized, you need them
to have a consistent notion of time. Not only is it necessary that
their clocks run at the same speed (this is of course achieved by
distributing the same clock object to all of them), but they also need
to have the same base time. As soon as they all have a consistent base
time, all you need to do is tell them to play corresponding material at
the same time.

Discontinuous ("discont") events are used for this purpose. Discont
events contain a time value. The typical handler for such an event (at
least in sink elements) looks like this:

  case GST_EVENT_DISCONTINUOUS:
    {
      GstClockTime time;

      /* Adopt the time carried by the event as the new element time. */
      if (gst_event_discont_get_value (event, GST_FORMAT_TIME, &time)) {
        gst_element_set_time (GST_ELEMENT (sink), time);
      }
      break;
    }

This means, in principle, all you need to do is send a discont event,
in order for your sinks to have a consistent time base.

[As far as I understand it, it is not possible at all for two elements
to synchronize if they don't receive a proper discont event. I think
most source elements don't send a discont at start, and that may be a
cause for programs not working anymore after Benjamin's last changes.]


Timestamps
----------

Timestamps are time values stored in buffers. They are accessible
through the GST_BUFFER_TIMESTAMP macro. The timestamp in a buffer
tells the time at which the material in the buffer should start to
play.  [Is this true? I always use the convention that timestamps are
associated with the start of the buffer, but I haven't seen it written
anywhere.] The length of time the material should play is, on the
other hand, determined by the characteristics of the stream
(like, for example, a PAL video frame should play for 1/25th of a
second). Not all buffers have to contain a timestamp. When there are
no timestamps, the element should keep playing in sequence until a new
timestamp arrives.

The time base for the timestamps is usually arbitrary and determined
by the media stream being played. In order for the sink elements to
know how to properly interpret timestamps in a given media stream,
their base time must be set based on the stream itself. For example,
in order to play a video clip with a duration of 30 seconds, which is
timestamped from 380s to 410s, the source element has to send a
discont event with time 380s before sending the contents of the clip.
That way, both the audio and video sinks will set their element times
to 380s, and will start playing immediately as the first data buffers
arrive.


Playing on Time
---------------

Given how things are set up, it is (at least conceptually) simple for
a sink to keep playing synchronously:

  - When receiving a discont event, the sink should set its element
    time based on it, as shown above.

  - When receiving a data buffer, the sink must consider three cases:

    1. The timestamp is equal to (or at least near enough) the current
       element time. In this case the sink should play the material
       right away.

    2. The timestamp is bigger (later) than the current element
       time. The element should wait until its own element time
       reaches the timestamp, before playing the material. The
       function gst_element_wait is intended for this purpose.

    3. The timestamp is smaller (earlier) than the current element
       time. The material arrived too late. A certain amount of
       material must be skipped (it need not be the whole buffer, or
       it may be more than one buffer).

It is important to emphasize that all sinks respecting these rules will
play synchronously, as long as they are fed proper discont events and
correctly timestamped material.
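
The three cases can be sketched as a small decision function. This is
an illustration only: the enum, the function name and the explicit
`tolerance` parameter (what counts as "near enough") are assumptions,
and a real sink would go on to call its wait or skip logic.

```c
#include <stdint.h>

typedef enum {
  SKETCH_PLAY,  /* case 1: timestamp is near enough, play right away */
  SKETCH_WAIT,  /* case 2: timestamp is in the future, wait for it */
  SKETCH_SKIP   /* case 3: timestamp is in the past, drop material */
} SketchAction;

/* All arguments are in nanoseconds. `tolerance` is the largest
 * distance between timestamp and element time that still counts as
 * being on time. */
SketchAction sketch_classify_buffer(uint64_t timestamp,
                                    uint64_t element_time,
                                    uint64_t tolerance)
{
  if (timestamp > element_time) {
    /* Buffer is early: either close enough, or we have to wait. */
    return (timestamp - element_time <= tolerance) ? SKETCH_PLAY
                                                   : SKETCH_WAIT;
  }
  /* Buffer is on time or late: either close enough, or we skip. */
  return (element_time - timestamp <= tolerance) ? SKETCH_PLAY
                                                 : SKETCH_SKIP;
}
```

For the clip timestamped from 380s, a buffer stamped 380s arriving at
element time 380.005s with a 10ms tolerance is played at once, a buffer
stamped 381s is waited for, and a buffer stamped 379s is skipped.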


Audio Clocks
------------

Some elements can follow the rules above more easily than others. The
first case, for example, can almost never be matched exactly. When a
buffer with a new timestamp arrives, it is almost impossible that it
matches the current time exactly to the nanosecond. So you have to
allow for a small error range. Video sinks, for example, can set such
an error range to a relatively high value. PAL frames play every 40
milliseconds, so allowing for an error of 10 to 15ms works quite
ok. Additionally, a video frame can be skipped or played somewhat
longer without seriously affecting the playback quality.

Sound, on the other hand is much more sensitive. Skips of just a few
milliseconds are immediately perceptible as clicks in the sound. For
that reason, you should avoid waits and skips as much as possible with
sound output elements.

This is, however, difficult when your time reference is different from
that used by the sound card. Sound card clocks tend to be quite
imprecise, and computer clocks (of the kind present in a standard PC
motherboard) aren't especially good either. The result is that even
after only a few minutes (or even a few seconds) of playback, you will
start observing differences between the sound card's clock and the
reference clock, differences that you'll need to correct through
waiting and/or skipping.

The simplest solution to this problem is using the card's clock as
reference clock. The current GStreamer method to select the default
clock usually does exactly that, because audio sinks are normally the
only ones providing clocks. For a typical video playing pipeline, with
an audio and a video sink, the clock provided by the audio sink will
be selected and distributed to the whole pipeline, including the video
sink.

Now, implementing a GstClock based on a sound card output is not that
difficult. The usual approach is to keep a running count of the number
of samples written to the card (you update it every time you write any
data).  If you divide that by the sampling rate, you basically obtain
the playback time since you started writing to the device. Except that
material written to the sound interface doesn't play immediately,
because there's usually a hardware buffer. In order to obtain the
exact playback time, you need to subtract the amount of material
currently waiting in the hardware buffer. This amount can be obtained,
for instance, using the ODELAY ioctl in OSS, or the snd_pcm_delay
function in ALSA.
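
The computation can be sketched as below. The function name and its
parameters are made up for illustration, and everything is counted in
sample frames. snd_pcm_delay reports frames directly, while the OSS
ODELAY value is a byte count, so it would have to be divided by the
frame size first.

```c
#include <stdint.h>

#define SKETCH_SECOND ((uint64_t) 1000000000)

/* Playback time in nanoseconds since writing to the device started.
 *
 * frames_written: running count of frames handed to the device
 * frames_pending: frames still waiting in the hardware buffer
 * rate:           sampling rate in frames per second
 */
uint64_t sketch_audio_clock_time(uint64_t frames_written,
                                 uint64_t frames_pending,
                                 uint32_t rate)
{
  uint64_t played;

  if (rate == 0)
    return 0;

  /* Guard against a delay report larger than what was written. */
  played = (frames_written > frames_pending)
               ? frames_written - frames_pending
               : 0;

  /* Whole seconds and remainder are scaled separately so that the
   * multiplication by SKETCH_SECOND cannot overflow 64 bits. */
  return (played / rate) * SKETCH_SECOND
         + (played % rate) * SKETCH_SECOND / rate;
}
```

For example, at 48000 frames per second with 144000 frames written and
24000 still pending, 120000 frames have actually played, which gives a
clock time of 2.5s.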


Making Sure Discont Events Really Match
---------------------------------------

[Or: How Things May Not Always Work as Expected]

Attentive readers may have observed that there's a problem with the
way discont events work. As stated, all you need for two or more
elements to have the same time base, is to send them discont events
with the same value. However, what an element does when receiving a
discont is set its own element time to the time in the event, i. e.,
the element states that *the current time* corresponds to the time
stored in the event.

If discont events were propagated instantaneously down the pipeline,
or they were at least guaranteed to arrive at all destination elements
at exactly the same time, things would actually work as described. In
practice, however, you cannot guarantee that. Discont events get
trapped in the normal pipeline data flow, which means they get delayed
in queues and processing elements. The result is that they usually
arrive at the various destination elements at slightly different
times. This would imply that the various sink elements would end up
with small differences in their base times, which would result in a
small (but probably very annoying) lack of synchrony.

Our current solution [which is actually a very clever hack from
Benjamin, don't get me wrong here] works in sort of a "snap to grid"
fashion. GstClock objects provide a gst_clock_get_event_time
function. The value of gst_clock_get_event_time is usually identical
to the value of gst_clock_get_time, i. e. it is the current clock
time. However, if you invoke gst_clock_get_event_time twice in a short
interval (how short is determined by the max-diff property in the
clock object, whose default value is 2 seconds) you receive exactly
the same value, namely, the time of the first invocation.

To illustrate, let's say we perform the following invocations:

  /* At clock time 25s: */
  rt1 = gst_clock_get_time (clock);
  et1 = gst_clock_get_event_time (clock);

  /* At clock time 26s: */
  rt2 = gst_clock_get_time (clock);
  et2 = gst_clock_get_event_time (clock);

  /* At clock time 30s: */
  rt3 = gst_clock_get_time (clock);
  et3 = gst_clock_get_event_time (clock);

The final values of the variables would be:

  rt1: 25s
  et1: 25s

  rt2: 26s
  et2: 25s (!!)

  rt3: 30s
  et3: 30s

As seen in the example, the second invocation of
gst_clock_get_event_time "snaps" to the time of the first one. On the
other hand, if you wait long enough (more than 2 seconds by default)
you get the real clock time once again.
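
The snapping behavior can be modelled in a few lines of plain C. This
is a simulation of what is described above, not the actual GstClock
code: SketchClock, its fields and the hand-advanced `now` value are all
assumptions made for the example.

```c
#include <stdint.h>

#define SKETCH_SECOND ((uint64_t) 1000000000)

typedef struct {
  uint64_t now;         /* current clock time, advanced by hand here */
  uint64_t max_diff;    /* snapping window (2 seconds in the text) */
  uint64_t last_event;  /* time returned by the last non-snapped call */
  int has_event;        /* zero until the first call */
} SketchClock;

/* Plain clock time. */
uint64_t sketch_clock_get_time(const SketchClock *clock)
{
  return clock->now;
}

/* Event time: snaps to the previous event time while we are still
 * inside the window, otherwise starts a new window at the current
 * clock time. */
uint64_t sketch_clock_get_event_time(SketchClock *clock)
{
  if (clock->has_event
      && clock->now - clock->last_event <= clock->max_diff)
    return clock->last_event;

  clock->last_event = clock->now;
  clock->has_event = 1;
  return clock->now;
}
```

Note that the window is measured from the first call of a group, so a
long enough run of calls eventually starts a new window.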

How does this help with discont events and synchrony? Actually,
gst_element_set_time uses the value of gst_clock_get_event_time to set
the element's base time. The practical result is that if many elements
sharing a clock call gst_element_set_time inside a short enough time
interval, their base times will be set to exactly the same value. This
means they synchronize perfectly (at least as soon as the roughness
caused by the discontinuity settles down).

This tends to work well, because even if a discont event arrives at
different elements at different times, the difference is usually small
enough for the mechanism described above to be triggered. There is,
however, one case where this doesn't work properly, namely when two
discont events are sent by the source element during a short time
interval. When this happens, the results are unpredictable, since they
depend on the exact order the events have when arriving at the sinks.
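
Both the normal case and the problematic one can be reproduced with a
small simulation. Everything here is a hypothetical model (the Sketch*
types, the hand-advanced clock, signed nanosecond values), meant only
to show the arithmetic, not the real element or clock code.

```c
#include <stdint.h>

#define SKETCH_MSECOND ((int64_t) 1000000)
#define SKETCH_SECOND  (1000 * SKETCH_MSECOND)

typedef struct {
  int64_t now;         /* current clock time, advanced by hand */
  int64_t max_diff;    /* snapping window */
  int64_t last_event;  /* start of the current window */
  int has_event;       /* zero until the first event */
} SketchClock;

typedef struct {
  SketchClock *clock;  /* clock shared by all sinks */
  int64_t base_time;   /* clock time minus element time */
} SketchSink;

/* Snap to the current window, or open a new one. */
static int64_t sketch_clock_get_event_time(SketchClock *clock)
{
  if (clock->has_event
      && clock->now - clock->last_event <= clock->max_diff)
    return clock->last_event;
  clock->last_event = clock->now;
  clock->has_event = 1;
  return clock->now;
}

/* What a sink does on a discont: the base time is derived from the
 * (possibly snapped) event time instead of the plain clock time. */
void sketch_sink_handle_discont(SketchSink *sink, int64_t time)
{
  sink->base_time = sketch_clock_get_event_time(sink->clock) - time;
}

int64_t sketch_sink_get_time(const SketchSink *sink)
{
  return sink->clock->now - sink->base_time;
}
```

If an audio sink handles a discont for 380s at clock time 25.0s and a
video sink handles the same discont at 25.3s, both end up with the same
base time. Now take one possible interleaving of two disconts, for 380s
and then 100s: the audio sink sees them at 25.0s and 26.0s, the video
sink (behind a queue) at 26.5s and 27.5s. The first three calls snap to
25.0s, but the fourth falls outside the 2 second window and opens a new
one at 27.5s, so the two sinks end up 2.5s apart.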

[Unfortunately the situation above is common in interactive
pipelines. I (like many other people, I guess) have the tendency to
move around in films by repeatedly pressing the "chapter back" and
"chapter forward" buttons, until I find the desired scene. As soon as
you do that you end up sending discontinuities down the pipeline in
very short time intervals. Although my player now handles disconts
quite ok, every now and then I end up with very bad (> 2sec) lack of
synchrony while jumping around chapters.

It is very difficult to work around this problem in a satisfactory
way. The only reliable solution I can think of would be identifying
every discont event uniquely (with a serial number, for example), and
having a separate event time in the clock for each discont. Of
course, older event times can be discarded after some time, so you
wouldn't have any issues with memory usage. Xine does something like
this as well.]
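
[The serial number idea could look roughly like the sketch below. It
is purely hypothetical: none of these names exist in GStreamer, the
serial number would have to be carried by the discont event, and the
fixed-size table that overwrites its oldest entry is just one simple
way to discard old event times.

```c
#include <stdint.h>

#define SKETCH_SECOND ((uint64_t) 1000000000)
#define SKETCH_SLOTS 8

typedef struct {
  uint32_t serial;      /* serial number of the discont event */
  uint64_t event_time;  /* clock time when it was first seen */
  int used;             /* zero for an empty slot */
} SketchSlot;

typedef struct {
  uint64_t now;                    /* clock time, advanced by hand */
  SketchSlot slots[SKETCH_SLOTS];  /* most recent discont serials */
  unsigned next;                   /* slot to overwrite next */
} SketchClock;

/* Return the event time for the discont with the given serial. The
 * first element asking for a serial fixes its event time; every later
 * element asking for the same serial gets that same value, no matter
 * how many other disconts were handled in between. */
uint64_t sketch_clock_event_time_for(SketchClock *clock, uint32_t serial)
{
  SketchSlot *slot;
  unsigned i;

  for (i = 0; i < SKETCH_SLOTS; i++) {
    if (clock->slots[i].used && clock->slots[i].serial == serial)
      return clock->slots[i].event_time;
  }

  /* Unknown serial: record it, overwriting the oldest entry. */
  slot = &clock->slots[clock->next];
  clock->next = (clock->next + 1) % SKETCH_SLOTS;
  slot->serial = serial;
  slot->event_time = clock->now;
  slot->used = 1;
  return clock->now;
}
```

With this, two sinks receiving discont number 1 and discont number 2 in
any interleaving would still agree on one event time per discont.]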

There is a second problem, related to material accumulated in hardware
output buffers. This problem doesn't lead to lack of synchrony, but
may cause very rough playback after a discont. I'll explain that in a
later message.

Cheers,

M. S.
-- 
Martin Soto <soto at informatik.uni-kl.de>




