[Cogl] [PATCH 3/3] Add CoglFrameTimings

Robert Bragg robert at sixbynine.org
Thu Jan 24 11:59:10 PST 2013


On Mon, Jan 21, 2013 at 10:06 PM, Owen Taylor <otaylor at redhat.com> wrote:
> On Fri, 2013-01-11 at 16:36 +0000, Robert Bragg wrote:
>> Ok, lets try and pick this up again now that we're all back from holiday...
>
> [...]
>
>> >   If you look at:
>> >   http://owtaylor.files.wordpress.com/2012/11/tweaking-compositor-timing-busy-large.png
>> >
>> >   the arrow from the compositor to the application represents the time
>> >   when the application can meaningfully start the next frame. If the
>> >   application start drawing the next frame before this, then you won't be
>> >   throttled to the compositor's drawing, so you may be drawing multiple
>> >   frames per one compositor frame, you may also be competing with the
>> >   compositor for GPU resources.
>> >
>> >   This arrow is probably the best analog of "swap complete" in the
>> >   composited case - and being notified of this is certainly something that
>> >   a toolkit (like Clutter) written on top of Cogl needs to know about. But
>> >   the time that "presentation" occurs is later - and the compositor needs
>> >   to send the application a separate message (not shown in the diagram)
>> >   when that happens.
>>
>> To me the existing semantics for SwapComplete entail the fact that the
>> buffer has hit the screen and is visible to the user so if we were to
>> keep the "swap complete" nomanclature it seems like it should be for
>> the second arrow.
>
> The terminology might seem best that way, but in terms of expected
> application behavior and hence compatibility, the first arrow is what
> corresponds to the current situation - if you wait for the image to
> actually be on screen before you draw the next frame, you'll likely
> be running at half the frame rate. SwapComplete as hooked up to
> intel_swap_event is about throttling.

Agreed, our use case for the SwapComplete events has been throttling,
though at the Cogl level the intention was to just directly pass on
the events from X, and the semantics of those are simply to relay
when a frame has been completed.

I had a feeling that I'd made Clutter build on top of that so, for
example, it wouldn't throttle itself with the SwapComplete events in
the case where there are currently no events pending (since that
implies there is a back buffer free and Clutter would be free to go
ahead and render another frame), but from a quick check it doesn't
look like Clutter does that.

Since it looks like we're going to deprecate this interface anyway,
it doesn't make much difference which way we manage the
compatibility. Given how Clutter basically just blindly throttles to
the SwapComplete events, it would probably work out slightly better
for Clutter to forward _FRAME_SYNC events as swap buffer callbacks,
though I do think that changes the intended semantics of the Cogl
interface.

My initial concern was that Clutter had a smarter interpretation of
the semantics of the swap buffer callbacks and didn't always wait for
a callback before drawing a new frame, in which case changing the
Cogl semantics would have confused Clutter.

>
>> Something that your diagram doesn't capture and so I wonder if you've
>> considered it is the possibility that the compositor could choose to
>> withhold its end-of-frame notification until after the presentation
>> notification. One reason I think this could be done is that the
>> compositor is throttling specific applications (that don't have focus
>> for example) as a way to ensure the GPU isn't overloaded and to
>> maintain the interactivity of the compositor itself and of the client
>> with focus. This just means that our api/implementation shouldn't be
>> assuming that each frame progresses until the point of presentation
>> which is the end of the line.
>
> This is not allowed in the proposed window manager specification -
>
>  _NET_WM_FRAME_TIMINGS
>
>  This message provides information about the timing of a previous
>  frame;  it is sent subsequent to the _NET_WM_FRAME_DRAWN message for
>  the frame once the window manager has obtained all available timing
>  information.
>
> this doesn't mean that the window manager can't throttle, it just means
> that if it throttles it also throttles the _NET_WM_FRAME_TIMINGS
> message. That's how I was thinking of it for Cogl as well - not a
> message at unthrottled time and message at presentation time, but
> a message at unthrottled time, and a message when Cogl has finished
> gathering timing information for the frame.

Yeah, it's been nagging me that having the more explicit _PRESENTED
event means that a client wanting to wait as long as possible before
collecting stats would need to manually keep track of which per-frame
events are still outstanding.  Although I think it's useful to
distinguish the _SYNC event so that apps are notified ASAP after the
compositor has unthrottled them, I don't see the same being true of
presentation events, and having a more general "_COMPLETED" event
instead could be more convenient.

>
>> >   This idea - that the frame proceeds through several stages before it is
>> >   presented and that there is a "presentation time" - drives several aspects of my
>> >   API design - the idea that there is a separate notification when the
>> >   frame data is complete, and the idea that you can get frame data before
>> >   it is complete.
>>
>> Neil's suggestion of us having one mechanism to handle the
>> notifications of FrameInfo progression sounds like it could be a good
>> way to go here, and for the cogl-1.14 branch the old API could be
>> layered on top of this for compatibility.
>>
>> We can add a cogl_onscreen_add_frame_callback() function which takes a
>> callback like:
>>
>> void (* CoglFrameCallback) (CoglOnscreen *onscreen, CoglFrameEvent
>> event, CoglFrameInfo *info, void *user_data);
>>
>> And define COGL_FRAME_EVENT_SYNC and COGL_FRAME_EVENT_PRESENTED as
>> initial events corresponding to the stages discussed above.
>
> Sounds OK, though we have to be clear if we want "PRESENTED" or
> "COMPLETE" - I think "COMPLETE" is more general for the future - for
> adding new types of frame statistics.

Right, let's go with having the "COMPLETE" event, which should be
more convenient for apps too.

>
>> > * Even though what I need right now for Mutter is reasonably minimal -
>> >   the reason I'm making an attempt to push back and argue for something
>> >   that is close to the GTK+ API is that Clutter will eventually want
>> >   to have the full set of capabilities that GTK+ has, such as running
>> >   under a compositor and accurately reporting latency for Audio/Video
>> >   synchronization.
>> >
>> >   And there's very little difference between Clutter and GTK+ 3.8 in
>> >   being frame-driven - they work the same way - so the same API should
>> >   work for both.
>> >
>> >   I think it's considerably better if we can just export the Cogl
>> >   facilities for frame timing reporting rather than creating a new
>> >   set of API's in Clutter.
>> >
>> > * Presentation times that are uncorrelated with the system time are not
>> >   particularly useful - they perhaps could be used to detect frame
>> >   drops after the fact, but that's the only thing I can think of.
>> >   Presentation times that can be correlated with the system time, on
>> >   the other hand, allow for A/V synchronization among other things.
>>
>> I'm not sure if correlation to you implies a correlated scale and
>> correlated absolute position but I think that a correlated scale is
>> the main requirement to be useful to applications.
>>
>> for animation purposes the absolute system time often doesn't matter,
>> what matters I think is that you have good enough resolution, that you
>> know the timeline units and you know whether it is monotonic or not.
>> animations can usually be progressed relative to a base/start
>> timestamp and so calculations are only relative and it doesn't matter
>> what timeline you use. it's important that the application/toolkit be
>> designed to consistently use the same timeline for driving animations
>> but for clutter for example which progresses its animations in a
>> single step as part of rendering a frame that's quite straightforward
>> to guarantee.
>
> For me, it's definitely essential to have a correlated scale *and* a
> correlated absolute position. My main interest is Audio-Video
> synchronization, and for that, the absolute position is needed.

I can see that you want to correlate an absolute position in this
case, but I think it's helpful to clarify that it's the absolute
position with respect to your media layer's time source. I think this
discussion would be clearer if we refrained from using the term
"system time", with its suggestion that there is some particular
canonical time source. A system can support numerous time sources.

>
>> the main difficulty I see with passing on UST values from opengl as
>> presentation times is that opengl doesn't even guarantee what units
>> the timestamps have (I'm guessing to allow raw rdtsc counters to be
>> used) and I would much rather be able to pass on timestamps with a
>> guaranteed timescale. I'd also like to guarantee that the timestamps
>> are monotonic, which UST values are meant to be except for the fact
>> that until recently drm based drivers reported gettimeofday
>> timestamps.
>
> The fact that the OML spec doesn't even define a scale makes me wonder
> if the authors of the specification expected that application authors
> would use out-of-band knowledge when using the UST...  There are some
> things that you can do without correlated absolute position (like
> measure jitter in latency as a quality measure), but I can't think of
> anything you can do without the scale.

I think the authors had a Windows background. On Windows they have a
QueryPerformanceCounter function, which I think is typically just a
thin wrapper around the rdtsc instruction. They also have a
QueryPerformanceFrequency API for mapping rdtsc values into a known
scale. I would strongly expect the WGL sync_control extension to
report QueryPerformanceCounter values, and the GLX sync_control spec
is documented as being based on the WGL spec.

I suppose technically GLX applications are required to either do a
calibration with a known delay to empirically determine the
frequency, or use out-of-band knowledge of the frequency.

I'm not sure whether the authors expected applications to use some
out-of-band knowledge or just expected apps to do their own manual
calibration, but either way it doesn't seem like an ideal way to
define UST for Unix.

>
>> > * When I say that I want timestamps in the timescale of
>> >   g_get_monotonic_time(), it's not that I'm particularly concerned about
>> >   monotonicity - the important aspect is that the timestamps
>> >   can be correlated with system time. I think as long as we're doing
>> >   about as good a job as possible at converting presentation timestamps
>> >   to a useful timescale, that's good enough, and there is little value
>> >   in the raw timestamps beyond that.
>>
>> I still have some doubts about this approach of promising a mapping of
>> all driver timestamps to the g_get_monotonic_time() timeline. I think
>> maybe a partial mapping to only guarantee scale/units could suffice.
>> These are some of the reasons I have doubts:
>>
>> - g_get_monotonic_time() has inconsistent semantics across platforms
>> (uses non-monotonic gettimeofday() on OS X and on Windows has a very
>> low resolution of around 10-16ms) so it generally doesn't seem like an
>> ideal choice as a canonical timeline.
>
> Being cross platform is hard - there are all sorts of constraints fixed
> on us by trying to find things that work on different platforms.
> Accepting constraints that go beyond this - like considering the current
> implementation of g_get_monotonic_time() as immutable really brings us
> to the point of impossibility. As I said in an earlier email, both the
> Mac and Windows have timescales that they report graphics timings in
> that would be more suitable for g_get_monotonic_time() than the current
> implementation.

g_get_monotonic_time() isn't immutable, but I also don't know of any
plans to change it. Cogl and Clutter are used on Windows and, for
example, it would be pretty easy to add support for WGL_sync_control
to Cogl, but if we have to map into the g_get_monotonic_time()
timeline then that would seem pretty pointless due to the very low
resolution. A relatively simple change to Cogl would then be blocked
on updating GLib. It may turn out to be straightforward to update
GLib to use QueryPerformanceCounter, or maybe there are legitimate
reasons not to do that (if there are concerns about
QueryPerformanceCounter support on some systems); I'm not sure.

If I were more convinced that mapping to the g_get_monotonic_time()
timeline was extremely valuable then that would potentially outweigh
this purely hypothetical concern, but I think that for many uses of
presentation timestamps a guaranteed scale is sufficient, and if we
look at the details of more specific problems like a/v
synchronization I think we can find a better approach there too.

>
>> - my reading of the GLX and WGL specs leads me to believe that we
>> don't have a way to randomly access UST values; the UST values we can
>> query are meant to correspond to the start of the most recent vblank
>> period. This seems to conflict with your approach to mapping which
>> relies on being able to use a correlation of "now" to offset/map a
>> given ust value.
>> - Even if glXGetSyncValues does let us randomly access the UST values
>> then we can introduce pretty large errors during correlation caused
>> by round tripping to the X server
>> - Also related to this: EGL doesn't yet have much precedent with
>> regards to exposing UST timestamps. If for example a standalone
>> extension were written to expose SwapComplete timestamps, it might
>> have no reason to also define an API for random access of UST
>> values, and then we wouldn't be able to correlate with
>> g_get_monotonic_time() as we do with GLX.
>
> I reread the specs and read the implementation and I would agree that
> you are right that with GLX we don't have an easy query
>
>> - the potential for error may be even worse whenever the
>> g_get_monotonic_time timescale is used as a third intermediary to
>> correlate graphics timestamps with another sub-system's timestamps
>
> It seems unrealistic to me that we'd export unidentified arbitrary
> timestamps and then applications would figure out how to correlate them
> with some other system. You've demonstrated that it's hard even without
> an abstraction layer like Cogl in the middle.
>
>> - I can see that most application animations and display
>> synchronization can be handled without needing system time
>> correlation, they only need guaranteed units, so why not handle
>> specific issues such as a/v and input synchronization on a case by
>> case basis
>>
>> - having to rely on heuristics to figure out what time source the
>> driver is using on linux seems fragile (e.g. I'd imagine
>> CLOCK_MONOTONIC_RAW could be mistaken for CLOCK_MONOTONIC and then
>> later on if the clocks diverge that could lead to a large error in
>> mapping)
>>
>> - We can't make any assumptions about the scale of UST values. I
>> believe the GLX and WGL sync_control specs were designed so that
>> drivers could report rdtsc CPU counters for UST values and to map
>> these into the g_get_monotonic_time() timescale we would need to
>> empirically determine the frequency of the UST counter.
>
> I'm really not sure what kind of applications you are thinking about
> that don't need system time correlation. Many applications don't need
> presentation timestamps *at all*. For those that do, A/V synchronization
> is likely the most common case. There is certainly fragility and
> complexity in the mapping of UST values onto system time, but Cogl
> seems like the right place to bite the bullet and encapsulate that
> fragility.

I'm just thinking of typical applications that need to drive
tweening/ease-in/out style animations that should complete within a
given duration. Basically most applications doing something a bit
fancy with their UI would fall into this category. This kind of
animation can certainly benefit from tracking presentation times so
as to predict when frames will become visible to a user. For example,
Clutter's current approach of using g_source_get_time() to drive
animations means that in the common case where _swap_buffers won't
block for the first swap - when there is a back buffer free to start
the next frame - Clutter can end up drawing two frames in quick
succession using two timestamps that are very close together, even
though those frames will likely be presented ~16ms apart in the end.
Looking at recent, historic presentation times would give one simple
way of predicting when a frame will become visible and thus how far
to progress animations.
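
To make that concrete, the kind of calculation I have in mind only
needs timestamps with a known scale, not any absolute correlation (a
sketch; the helper name is hypothetical):

/* Progress animations to the time the frame being drawn is expected
 * to become visible, instead of to "now". Only relative arithmetic
 * on presentation timestamps is needed, so any monotonic timeline
 * with a known scale will do. */
static int64_t
predict_presentation_time_ns (int64_t last_presentation_time_ns,
                              int64_t refresh_interval_ns)
{
  /* naive: assume we hit the refresh following the last presented
   * frame; a real implementation would also account for frames
   * already in flight */
  return last_presentation_time_ns + refresh_interval_ns;
}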

A/V synchronization seems like a much more specialized problem in
comparison, so I wouldn't have considered it the common case, though
it's certainly an important use case. This is also where I find the
term "system time" most misleading, and I think it might be clearer
to be more explicit and refer to a "media time" or "a/v time", since
conceptually there is no implied relationship between a/v timestamps
and, say, g_get_monotonic_time(). You are faced with basically the
same problem of having to map between a/v time and
g_get_monotonic_time() as with mapping from UST to
g_get_monotonic_time(). Using GStreamer as an example, you have a
GstClock which is just another monotonic clock with an unknown base.
GStreamer is also designed so the GstClock implementation can be
replaced, but notably it does provide API to query the current time,
which could be used for offset correlation.

For the problem of correlating A/V, the g_get_monotonic_time()
timeline is a middle man. Assuming you are using GStreamer, I expect
what you want in the end is a mapping to a GstClock timeline. I
wonder if adding a cogl_gst_frame_info_get_presentation_time() would
be more convenient for you? The function could take a GstClock
pointer so it can query a timestamp for doing the mapping.
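
Purely hypothetically, the internals could look something like this
(gst_clock_get_time() is real GStreamer API; the two _cogl_* helpers
are stand-ins for however we'd sample the frame timestamp's native
time source):

#include <gst/gst.h>

GstClockTime
cogl_gst_frame_info_get_presentation_time (CoglFrameInfo *info,
                                           GstClock *clock)
{
  /* sample the UST time source and the GstClock as close together
   * as possible so the offset correlation is accurate */
  int64_t native_now_ns = _cogl_query_native_time_ns ();  /* hypothetical */
  GstClockTime gst_now = gst_clock_get_time (clock);
  int64_t presentation_ns =
    _cogl_frame_info_get_native_time_ns (info);           /* hypothetical */

  /* shift the presentation time into the GstClock timeline */
  return gst_now - (GstClockTime) (native_now_ns - presentation_ns);
}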

>
>> Even with the recent change to the drm drivers the scale has changed
>> from microseconds to nanoseconds.
>
> Can you give me a code reference for that? I'm not finding that change
> in the DRM driver sources.

It looks like I jumped the gun here. I was thinking of commit
c61eef726a78ae77b6ce223d01ea2130f465fe5c, which makes the DRM drivers
query CLOCK_MONOTONIC time instead of gettimeofday for vblank events.
I was assuming that since gettimeofday reports a timeval in
microseconds and clock_gettime reports a timespec in nanoseconds, the
vblank events would now be reporting timestamps in nanoseconds. On
closer inspection though, the DRM interface uses a timeval to report
the time, not a uint64 like I'd imagined, so I think it's actually
reporting CLOCK_MONOTONIC times but in microseconds instead of
nanoseconds.

Something to consider though is that if the Nvidia driver were to
report CLOCK_MONOTONIC timestamps on Linux, it might report those in
nanoseconds, so our heuristics in Cogl for detecting CLOCK_MONOTONIC
may need updating to consider both scales.
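
Roughly, the detection would need to try both interpretations,
something like this (a sketch, not the actual Cogl code):

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Guess whether a driver UST value looks like CLOCK_MONOTONIC in
 * microseconds or in nanoseconds by comparing it against a fresh
 * CLOCK_MONOTONIC sample in both scales. */
static bool
ust_looks_like_monotonic (int64_t ust, bool *is_nanoseconds)
{
  struct timespec ts;
  int64_t now_ns, now_us;

  clock_gettime (CLOCK_MONOTONIC, &ts);
  now_ns = (int64_t) ts.tv_sec * 1000000000 + ts.tv_nsec;
  now_us = now_ns / 1000;

  /* allow generous slop between when the driver sampled the UST and
   * when we sample the clock here */
  if (llabs (ust - now_us) < 1000000)      /* within ~1s, microseconds */
    {
      *is_nanoseconds = false;
      return true;
    }
  if (llabs (ust - now_ns) < 1000000000)   /* within ~1s, nanoseconds */
    {
      *is_nanoseconds = true;
      return true;
    }

  return false;
}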

>
>> I would suggest that if we aren't sure what time source the driver is
>> using then we should not attempt to do any kind of mapping.
>
> I'm fine with that - basically I'll do whatever I need to do to get that
> knowledge working on the platforms that I care about (which is basically
> Linux with open source or NVIDIA drivers), and where I don't care, I
> don't care.

It could be good to get some input from Nvidia about how they report
UST values from their driver.

>
>> > * If we start having other times involved, such as the frame
>> >   time, or perhaps in the future the predicted presentation time
>> >   (I ended up needing to add this in GTK+), then I think the idea of
>> >   parallel API's to either get a raw presentation timestamp or one
>> >   in the timescale of g_get_monotonic_time() would be quite clunky.
>> >
>> >   To avoid a build-time dependency on GLib, what makes sense to me is to
>> >   return timestamps in terms of g_get_monotonic_time() if built against
>> >   GLib and in some arbitrary timescale otherwise.
>>
>> With my current doubts and concerns about the idea of mapping to the
>> g_get_monotonic_time() timescale I think we should constrain ourselves
>> to only guarantee the scale of the presentation timestamps to being in
>> nanoseconds, and possibly monotonic. I say nanoseconds since this is
>> consistent with how EGL defines UST values in khrplatform.h and having
>> a high precision might be useful in the future for profiling if
>> drivers enable tracing the micro progression of a frame through the
>> GPU using the same timeline.
>>
>> If we do find a way to address those concerns then I think we can
>> consider adding a parallel api later with a _glib namespace but I
>> struggle to see how this mapping can avoid reducing the quality of the
>> timing information so even if Cogl is built with a glib dependency I'd
>> like to keep access to the more pristine (and possibly significantly
>> more accurate) data.
>
> I'm not so fine with the lack of absolute time correlation. It seems
> silly to me to have reverse-engineering code in *both* Mutter and COGL,
> which is what I'd have to do.
>
> Any chance we can make COGL (on Linux) always return a value based
> on CLOCK_MONOTONIC? We can worry about other platforms at some other
> time.

It would be good to hear what you think about having a
cogl_gst_frame_info_get_presentation_time() instead.

Although promising CLOCK_MONOTONIC clarifies the cross-platform
issues compared to g_get_monotonic_time(), if there are any Linux
drivers that use CLOCK_MONOTONIC_RAW, for example, this could go
quite badly wrong. At least if we constrain the points where we do
offset mapping to the times when we really need it (such as for a/v
synchronization) then we minimize the impact if the mapping can't be
done accurately or if it breaks monotonicity.

>
> In terms of nanoseconds vs. microseconds - don't care too much.
>
>>  Ok so assuming that the baseline is with my proposed patches sent to
>> the list so far applied on top of your original patches, I currently
>> think these are the next steps:
>>
>> - Rename from SwapInfo to FrameInfo - since you pointed out that
>> "frame" is more in line with gtk and "swap" isn't really meaningful if
>> we have use cases for getting information before swapping (I have a
>> patch for this I can send out)
>> - cogl_frame_info_get_refresh_interval should be added back - since
>> you pointed out some platforms may not let us associate an output with
>> the frame info (I have a patch for this)
>> - Rework the UST mapping to only map into nanoseconds and not attempt
>> any mapping if we haven't identified the time source. (I have a patch
>> for this)
>> - Rework cogl_onscreen_add_swap_complete_callback to be named
>> cogl_onscreen_add_frame_callback as discussed above and update the
>> compatibility shim for cogl_onscreen_add_swap_buffers_callback (I have
>> a patch for this)
>> - Write some good gtk-doc documentation for the cogl_output api since
>> it will almost certainly be made public (It would be good if you could
>> look at this if possible?)
>> - Review the cogl-output.c code, since your original patches didn't
>> include the implementation of CoglOutput, only the header
>
> I'll go through your patches and review them and write docs for CoglOutput
> - I pushed a branch adding the missing

Thanks.

It seems like the main issue we still have some disagreement about is
the g_get_monotonic_time() mapping, but I don't think this has to
block landing anything at this stage.

Adding a cogl_gst_frame_info_get_presentation_time() or a
cogl_glib_frame_info_get_presentation_time() function would be two
potential solutions. Before the 1.14 release there is also the
possibility of committing to stricter timeline guarantees, which
wouldn't be incompatible with more conservative scale guarantees to
start with.

kind regards,
- Robert

>
> - Owen
>
>

