[Cogl] [PATCH 3/3] Add CoglFrameTimings

Owen Taylor otaylor at redhat.com
Mon Jan 21 14:06:20 PST 2013


On Fri, 2013-01-11 at 16:36 +0000, Robert Bragg wrote:
> Ok, let's try to pick this up again now that we're all back from holiday...

[...]

> >   If you look at:
> >   http://owtaylor.files.wordpress.com/2012/11/tweaking-compositor-timing-busy-large.png
> >
> >   the arrow from the compositor to the application represents the time
> >   when the application can meaningfully start the next frame. If the
> >   application starts drawing the next frame before this, then you won't
> >   be throttled to the compositor's drawing, so you may be drawing
> >   multiple frames per compositor frame, and you may also be competing
> >   with the compositor for GPU resources.
> >
> >   This arrow is probably the best analog of "swap complete" in the
> >   composited case - and being notified of this is certainly something that
> >   a toolkit (like Clutter) written on top of Cogl needs to know about. But
> >   the time that "presentation" occurs is later - and the compositor needs
> >   to send the application a separate message (not shown in the diagram)
> >   when that happens.
> 
> To me, the existing semantics of SwapComplete imply that the buffer
> has hit the screen and is visible to the user, so if we were to keep
> the "swap complete" nomenclature it seems like it should correspond
> to the second arrow.

The terminology might read better that way, but in terms of expected
application behavior, and hence compatibility, the first arrow is what
corresponds to the current situation - if you wait for the image to
actually be on screen before you draw the next frame, you'll likely be
running at half the frame rate, since by the time the image is on
screen you've typically already missed the window for getting a new
frame into the compositor's next redraw. SwapComplete as hooked up to
intel_swap_event is about throttling.

> Something that your diagram doesn't capture - so I wonder if you've
> considered it - is the possibility that the compositor could choose to
> withhold its end-of-frame notification until after the presentation
> notification. One reason this might be done is that the compositor is
> throttling specific applications (ones that don't have focus, for
> example) as a way to ensure the GPU isn't overloaded and to maintain
> the interactivity of the compositor itself and of the client with
> focus. This just means that our API/implementation shouldn't assume
> that each frame progresses to the point of presentation, which is the
> end of the line.

This is not allowed in the proposed window manager specification -

 _NET_WM_FRAME_TIMINGS

 This message provides information about the timing of a previous 
 frame;  it is sent subsequent to the _NET_WM_FRAME_DRAWN message for
 the frame once the window manager has obtained all available timing
 information.

This doesn't mean that the window manager can't throttle; it just means
that if it throttles, it also throttles the _NET_WM_FRAME_TIMINGS
message. That's how I was thinking of it for Cogl as well - not a
message at unthrottled time and a message at presentation time, but
a message at unthrottled time, and a message when Cogl has finished
gathering timing information for the frame.

> >   This idea - that the frame proceeds through several stages before it is
> >   presented and that there is a "presentation time" - drives several aspects of my
> >   API design - the idea that there is a separate notification when the
> >   frame data is complete, and the idea that you can get frame data before
> >   it is complete.
> 
> Neil's suggestion of us having one mechanism to handle the
> notifications of FrameInfo progression sounds like it could be a good
> way to go here and for the cogl-1.14 branch the old api could be
> layered on top of this for compatibility.
> 
> We can add a cogl_onscreen_add_frame_callback() function which takes a
> callback like:
> 
> void (* CoglFrameCallback) (CoglOnscreen *onscreen,
>                             CoglFrameEvent event,
>                             CoglFrameInfo *info,
>                             void *user_data);
> 
> And define COGL_FRAME_EVENT_SYNC and COGL_FRAME_EVENT_PRESENTED as
> initial events corresponding to the stages discussed above.

Sounds OK, though we have to be clear whether we want "PRESENTED" or
"COMPLETE" - I think "COMPLETE" generalizes better for the future,
when we add new types of frame statistics.
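
To illustrate, here's roughly how I'd expect a toolkit to consume
these events - just a sketch, using my suggested COMPLETE name; the
exact registration arguments and the schedule_redraw() /
process_frame_timings() helpers are placeholders:

  static void
  frame_cb (CoglOnscreen   *onscreen,
            CoglFrameEvent  event,
            CoglFrameInfo  *info,
            void           *user_data)
  {
    switch (event)
      {
      case COGL_FRAME_EVENT_SYNC:
        /* The compositor can accept a new frame; starting to draw now
         * won't outrun it. This is the first arrow in the diagram. */
        schedule_redraw (user_data);
        break;
      case COGL_FRAME_EVENT_COMPLETE:
        /* Cogl has finished gathering timing information for this
         * frame; presentation time etc. can now be read from info. */
        process_frame_timings (user_data, info);
        break;
      }
  }

  cogl_onscreen_add_frame_callback (onscreen, frame_cb, toolkit_data);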

> > * Even though what I need right now for Mutter is reasonably minimal -
> >   the reason I'm making an attempt to push back and argue for something
> >   that is close to the GTK+ API is that Clutter will eventually want
> >   to have the full set of capabilities that GTK+ has, such as running
> >   under a compositor and accurately reporting latency for Audio/Video
> >   synchronization.
> >
> >   And there's very little difference between Clutter and GTK+ 3.8 in
> >   being frame-driven - they work the same way - so the same API should
> >   work for both.
> >
> >   I think it's considerably better if we can just export the Cogl
> >   facilities for frame timing reporting rather than creating a new
> >   set of API's in Clutter.
> >
> > * Presentation times that are uncorrelated with the system time are not
> >   particularly useful - they perhaps could be used to detect frame
> >   drops after the fact, but that's the only thing I can think of.
> >   Presentation times that can be correlated with the system time, on
> >   the other hand, allow for A/V synchronization among other things.
> 
> I'm not sure whether "correlation" to you implies a correlated scale
> and a correlated absolute position, but I think a correlated scale is
> the main requirement for being useful to applications.
> 
> For animation purposes the absolute system time often doesn't matter;
> what matters, I think, is that you have good enough resolution, that
> you know the timeline units, and that you know whether it is monotonic
> or not. Animations can usually be progressed relative to a base/start
> timestamp, so the calculations are only relative and it doesn't matter
> what timeline you use. It's important that the application/toolkit
> consistently use the same timeline for driving animations, but for
> Clutter, for example, which progresses its animations in a single step
> as part of rendering a frame, that's quite straightforward to
> guarantee.

For me, it's definitely essential to have a correlated scale *and* a
correlated absolute position. My main interest is Audio-Video
synchronization, and for that, the absolute position is needed.
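
As a purely illustrative example of why: with presentation times on
the g_get_monotonic_time() timeline, a video player can measure how
far the displayed frame is from its audio clock and compensate. None
of the names below are proposed API:

  /* both values in microseconds on the g_get_monotonic_time() timeline */
  int64_t presentation_time = get_frame_presentation_time (frame);
  int64_t audio_time = audio_clock_to_monotonic (player);
  /* positive drift means the video is running behind the audio */
  int64_t drift = presentation_time - audio_time;
  audio_player_compensate (player, drift);

If the presentation time lives on some uncorrelated timeline, the
subtraction above is meaningless - which is why a correlated scale
alone isn't enough for A/V synchronization.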

> The main difficulty I see with passing on UST values from OpenGL as
> presentation times is that OpenGL doesn't even guarantee what units
> the timestamps have (I'm guessing to allow raw rdtsc counters to be
> used), and I would much rather be able to pass on timestamps with a
> guaranteed timescale. I'd also like to guarantee that the timestamps
> are monotonic, which UST values are meant to be, except for the fact
> that until recently drm-based drivers reported gettimeofday
> timestamps.

The fact that the OML spec doesn't even define a scale makes me wonder
if the authors of the specification expected that application authors
would use out-of-band knowledge when using the UST...  There are some
things that you can do without correlated absolute position (like
measure jitter in latency as a quality measure), but I can't think of
anything you can do without the scale.

> > * When I say that I want timestamps in the timescale of
> >   g_get_monotonic_time(), it's not that I'm particularly concerned about
> >   monotonicity - the important aspect is that the timestamps
> >   can be correlated with system time. I think as long as we're doing
> >   about as good a job as possible at converting presentation timestamps
> >   to a useful timescale, that's good enough, and there is little value
> >   in the raw timestamps beyond that.
> 
> I still have some doubts about this approach of promising a mapping of
> all driver timestamps to the g_get_monotonic_time() timeline. I think
> maybe a partial mapping to only guarantee scale/units could suffice.
> These are some of the reasons I have doubts:
> 
> - g_get_monotonic_time() has inconsistent semantics across platforms
> (it uses non-monotonic gettimeofday() on OS X, and on Windows it has a
> very low resolution of around 10-16ms), so it generally doesn't seem
> like an ideal choice as a canonical timeline.

Being cross-platform is hard - there are all sorts of constraints
forced on us by trying to find things that work on different platforms.
Accepting constraints that go beyond this - like considering the
current implementation of g_get_monotonic_time() as immutable - really
brings us to the point of impossibility. As I said in an earlier email,
both the Mac and Windows have timescales in which they report graphics
timings that would be more suitable for g_get_monotonic_time() than the
current implementation.

> - My reading of the GLX and WGL specs leads me to believe that we
> don't have a way to randomly access UST values; the UST values we can
> query are meant to correspond to the start of the most recent vblank
> period. This seems to conflict with your approach to mapping, which
> relies on being able to use a correlation of "now" to offset/map a
> given UST value.
> - Even if glXGetSyncValues does let us randomly access the UST values,
> we can introduce pretty large errors during correlation caused by
> round-tripping to the X server.
> - Also related to this: EGL doesn't yet have much precedent with
> regard to exposing UST timestamps. If, for example, a standalone
> extension were written to expose SwapComplete timestamps, it might
> have no reason to also define an API for random access of UST values,
> and then we wouldn't be able to correlate with g_get_monotonic_time()
> as we do with GLX.

I reread the specs and read the implementation, and I would agree that
you are right: with GLX we don't have an easy way to query the UST for
an arbitrary point in time.
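
For reference, the correlation my patches attempt looks roughly like
this (a sketch, with the error sources you point out marked in
comments):

  int64_t ust, msc, sbc;
  /* glXGetSyncValuesOML() reports the UST of the most recent vblank,
   * not of "now", and costs a round trip to the X server - both of
   * which put error into the computed offset */
  glXGetSyncValuesOML (dpy, drawable, &ust, &msc, &sbc);
  int64_t offset = g_get_monotonic_time () - ust; /* assumes UST in µs */

  /* later: map a presentation UST onto the system timeline */
  int64_t presentation_mono = presentation_ust + offset;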

> - The potential for error may be even worse whenever the
> g_get_monotonic_time() timescale is used as a third intermediary to
> correlate graphics timestamps with another subsystem's timestamps.

It seems unrealistic to me that we'd export unidentified arbitrary
timestamps and then applications would figure out how to correlate them
with some other system. You've demonstrated that it's hard even without
an abstraction layer like Cogl in the middle.

> - I can see that most application animations and display
> synchronization can be handled without needing system time
> correlation - they only need guaranteed units - so why not handle
> specific issues such as A/V and input synchronization on a
> case-by-case basis?
>
> - Having to rely on heuristics to figure out what time source the
> driver is using on Linux seems fragile (e.g. I'd imagine
> CLOCK_MONOTONIC_RAW could be mistaken for CLOCK_MONOTONIC, and if the
> clocks later diverge that could lead to a large error in the mapping).
>
> - We can't make any assumptions about the scale of UST values. I
> believe the GLX and WGL sync_control specs were designed so that
> drivers could report rdtsc CPU counters for UST values, and to map
> these into the g_get_monotonic_time() timescale we would need to
> empirically determine the frequency of the UST counter.

I'm really not sure what kind of applications you are thinking about
that don't need system time correlation. Many applications don't need
presentation timestamps *at all*. For those that do, A/V synchronization
is likely the most common case. There is certainly fragility and
complexity in the mapping of UST values onto system time, but Cogl
seems like the right place to bite the bullet and encapsulate that
fragility.
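
For what it's worth, a heuristic of the kind you're describing might
look something like the following - a sketch, not code from any patch;
it guesses the time source by comparing a UST sample against a
candidate clock:

  #include <glib.h>
  #include <stdint.h>
  #include <stdlib.h>
  #include <time.h>

  static gboolean
  ust_looks_like_monotonic_us (int64_t ust)
  {
    struct timespec ts;
    clock_gettime (CLOCK_MONOTONIC, &ts);
    int64_t mono_us = (int64_t) ts.tv_sec * G_USEC_PER_SEC
                      + ts.tv_nsec / 1000;
    /* if the sample is within a second of CLOCK_MONOTONIC interpreted
     * as microseconds, assume that's the driver's clock */
    return llabs (mono_us - ust) < G_USEC_PER_SEC;
  }

And yes, while CLOCK_MONOTONIC_RAW and CLOCK_MONOTONIC are still close
together this will misidentify one as the other - that fragility is
exactly the sort of thing I'd want contained inside Cogl.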

> Even with the recent change to the drm drivers the scale has changed 
> from microseconds to nanoseconds.

Can you give me a code reference for that? I'm not finding that change
in the DRM driver sources.

> I would suggest that if we aren't sure what timesource the driver is
> using then we should not attempt to do any kind of mapping.

I'm fine with that - basically I'll do whatever I need to do to get that
knowledge working on the platforms that I care about (which is basically
Linux with open source or NVIDIA drivers), and where I don't care, I
don't care.

> > * If we start having other times involved, such as the frame
> >   time, or perhaps in the future the predicted presentation time
> >   (I ended up needing to add this in GTK+), then I think the idea of
> >   parallel APIs to either get a raw presentation timestamp or one
> >   in the timescale of g_get_monotonic_time() would be quite clunky.
> >
> >   To avoid a build-time dependency on GLib, what makes sense to me is to
> >   return timestamps in terms of g_get_monotonic_time() if built against
> >   GLib and in some arbitrary timescale otherwise.
> 
> With my current doubts and concerns about the idea of mapping to the
> g_get_monotonic_time() timescale, I think we should constrain
> ourselves to only guaranteeing that the scale of the presentation
> timestamps is nanoseconds, and possibly that they are monotonic. I
> say nanoseconds since this is consistent with how EGL defines UST
> values in khrplatform.h, and having high precision might be useful in
> the future for profiling, if drivers enable tracing the micro
> progression of a frame through the GPU using the same timeline.
>
> If we do find a way to address those concerns then I think we can
> consider adding a parallel API later with a _glib namespace, but I
> struggle to see how this mapping can avoid reducing the quality of
> the timing information, so even if Cogl is built with a glib
> dependency I'd like to keep access to the more pristine (and possibly
> significantly more accurate) data.

I'm not so fine with the lack of absolute time correlation. It seems
silly to me to have reverse-engineering code in *both* Mutter and COGL,
which is what I'd have to do.

Any chance we can make COGL (on Linux) always return a value based
on CLOCK_MONOTONIC? We can worry about other platforms at some other
time.

In terms of nanoseconds vs. microseconds - don't care too much. 

> Ok, so assuming that the baseline is my proposed patches sent to the
> list so far, applied on top of your original patches, I currently
> think these are the next steps:
> 
> - Rename from SwapInfo to FrameInfo - since you pointed out that
> "frame" is more in line with gtk and "swap" isn't really meaningful if
> we have use cases for getting information before swapping (I have a
> patch for this I can send out)
> - cogl_frame_info_get_refresh_interval should be added back - since
> you pointed out some platforms may not let us associate an output with
> the frame info (I have a patch for this)
> - Rework the UST mapping to only map into nanoseconds and not attempt
> any mapping if we haven't identified the time source. (I have a patch
> for this)
> - Rework cogl_onscreen_add_swap_complete_callback to be named
> cogl_onscreen_add_frame_callback as discussed above and update the
> compatibility shim for cogl_onscreen_add_swap_buffers_callback (I have
> a patch for this)
> - Write some good gtk-doc documentation for the cogl_output API,
> since it will almost certainly be made public (it would be good if
> you could look at this if possible)
> - Review the cogl-output.c code, since your original patches didn't
> include the implementation of CoglOutput, only the header

I'll go through your patches and review them and write docs for
CoglOutput - I pushed a branch adding the missing CoglOutput
implementation.

- Owen



