[Cogl] [PATCH 3/3] Add CoglFrameTimings

Robert Bragg robert at sixbynine.org
Thu Jan 24 12:30:06 PST 2013


For reference I have pushed a rebased wip/rib/frame-synchronization
branch (based on the cogl-1.14 branch) which includes my latest patch
that replaces the _PRESENTED event with the _COMPLETE event as
discussed.

There is also a tentative patch on the branch to make the CoglOutput
api public by exposing a cogl_renderer_foreach_output() function and
an update to cogl-info to dump the data.

kind regards,
- Robert

On Thu, Jan 24, 2013 at 7:59 PM, Robert Bragg <robert at sixbynine.org> wrote:
> On Mon, Jan 21, 2013 at 10:06 PM, Owen Taylor <otaylor at redhat.com> wrote:
>> On Fri, 2013-01-11 at 16:36 +0000, Robert Bragg wrote:
>>> Ok, let's try to pick this up again now that we're all back from holiday...
>>
>> [...]
>>
>>> >   If you look at:
>>> >   http://owtaylor.files.wordpress.com/2012/11/tweaking-compositor-timing-busy-large.png
>>> >
>>> >   the arrow from the compositor to the application represents the time
>>> >   when the application can meaningfully start the next frame. If the
>>> >   application starts drawing the next frame before this, then you won't be
>>> >   throttled to the compositor's drawing, so you may be drawing multiple
>>> >   frames per compositor frame, and you may also be competing with the
>>> >   compositor for GPU resources.
>>> >
>>> >   This arrow is probably the best analog of "swap complete" in the
>>> >   composited case - and being notified of this is certainly something that
>>> >   a toolkit (like Clutter) written on top of Cogl needs to know about. But
>>> >   the time that "presentation" occurs is later - and the compositor needs
>>> >   to send the application a separate message (not shown in the diagram)
>>> >   when that happens.
>>>
>>> To me the existing semantics for SwapComplete entail that the buffer
>>> has hit the screen and is visible to the user, so if we were to keep
>>> the "swap complete" nomenclature it seems like it should be for the
>>> second arrow.
>>
>> The terminology might seem best that way, but in terms of expected
>> application behavior and hence compatibility, the first arrow is what
>> corresponds to the current situation - if you wait for the image to
>> actually be on screen before you draw the next frame, you'll likely
>> be running at half the frame rate. SwapComplete as hooked up to
>> intel_swap_event is about throttling.
>
> Agreed, our use case for the SwapComplete events has been throttling,
> though at the Cogl level the intention was just to pass on the events
> from X directly, and the semantics of those are simply to relay when a
> frame has been completed.
>
> I had a feeling that I'd made Clutter build on top of that so, for
> example, it wouldn't throttle itself on the SwapComplete events in the
> case where there are currently no events pending (since in that case
> it implies that there is a back-buffer free and Clutter would be free
> to go ahead and render another frame), but from a quick check it
> doesn't look like Clutter does that.
>
> Since it looks like we're going to deprecate this interface anyway it
> doesn't make much odds which way we manage the compatibility. Given
> how Clutter basically just blindly throttles to the SwapComplete
> events it would probably work out slightly better for Clutter to
> forward _FRAME_SYNC events as swap buffer callbacks, though I do think
> that changes the intended semantics of the Cogl interface.
>
> My initial concern was that Clutter did have a smarter interpretation
> of the semantics of the swap buffer callbacks, and didn't always wait
> for a callback before drawing a new frame, in which case changing the
> Cogl semantics would have confused Clutter.
>
>>
>>> Something your diagram doesn't capture - so I wonder if you've
>>> considered it - is the possibility that the compositor could choose
>>> to withhold its end-of-frame notification until after the
>>> presentation notification. One reason it might do this is to throttle
>>> specific applications (that don't have focus, for example) as a way
>>> to ensure the GPU isn't overloaded and to maintain the interactivity
>>> of the compositor itself and of the client with focus. This just
>>> means that our api/implementation shouldn't assume that each frame
>>> progresses until the point of presentation, which is the end of the
>>> line.
>>
>> This is not allowed in the proposed window manager specification -
>>
>>  _NET_WM_FRAME_TIMINGS
>>
>>  This message provides information about the timing of a previous
>>  frame;  it is sent subsequent to the _NET_WM_FRAME_DRAWN message for
>>  the frame once the window manager has obtained all available timing
>>  information.
>>
>> this doesn't mean that the window manager can't throttle, it just means
>> that if it throttles it also throttles the _NET_WM_FRAME_TIMINGS
>> message. That's how I was thinking of it for Cogl as well - not a
>> message at unthrottled time and message at presentation time, but
>> a message at unthrottled time, and a message when Cogl has finished
>> gathering timing information for the frame.
>
> Yeah, it's been nagging me that having the more explicit _PRESENTED
> event means that a client wanting to wait as long as possible before
> collecting stats would need to manually keep track of which per-frame
> events are still outstanding. Although I think it's useful to
> distinguish the _SYNC event so that apps are notified asap after the
> compositor has unthrottled them, I don't see the same being true of
> presentation events, and having a more general "_COMPLETED" event
> instead could be more convenient.
>
>>
>>> >   This idea - that the frame proceeds through several stages before it is
>>> >   presented and that there is a "presentation time" - drives several aspects of my
>>> >   API design - the idea that there is a separate notification when the
>>> >   frame data is complete, and the idea that you can get frame data before
>>> >   it is complete.
>>>
>>> Neil's suggestion of us having one mechanism to handle the
>>> notifications of FrameInfo progression sounds like it could be a good
>>> way to go here and for the cogl-1.14 branch the old api could be
>>> layered on top of this for compatibility.
>>>
>>> We can add a cogl_onscreen_add_frame_callback() function which takes a
>>> callback like:
>>>
>>> void (* CoglFrameCallback) (CoglOnscreen *onscreen,
>>>                             CoglFrameEvent event,
>>>                             CoglFrameInfo *info,
>>>                             void *user_data);
>>>
>>> And define COGL_FRAME_EVENT_SYNC and COGL_FRAME_EVENT_PRESENTED as
>>> initial events corresponding to the stages discussed above.
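>>>
>>> For illustration, registering and handling these events might look
>>> something like this (a rough, untested sketch using the names
>>> proposed above; schedule_redraw() and record_presentation_time() are
>>> made-up application helpers, and the exact signature of the add
>>> function is still open):
>>>
>>> static void
>>> frame_event_cb (CoglOnscreen *onscreen,
>>>                 CoglFrameEvent event,
>>>                 CoglFrameInfo *info,
>>>                 void *user_data)
>>> {
>>>   switch (event)
>>>     {
>>>     case COGL_FRAME_EVENT_SYNC:
>>>       /* the compositor has un-throttled us; it's now reasonable to
>>>        * start drawing the next frame */
>>>       schedule_redraw (user_data);
>>>       break;
>>>     case COGL_FRAME_EVENT_PRESENTED:
>>>       /* timing information for this frame is now available */
>>>       record_presentation_time (info, user_data);
>>>       break;
>>>     }
>>> }
>>>
>>> cogl_onscreen_add_frame_callback (onscreen, frame_event_cb, app_data);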
>>
>> Sounds OK, though we have to be clear if we want "PRESENTED" or
>> "COMPLETE" - I think "COMPLETE" is more general for the future - for
>> adding new types of frame statistics.
>
> Right, let's go with the "COMPLETE" event, which should be more
> convenient for apps too.
>
>>
>>> > * Even though what I need right now for Mutter is reasonably minimal -
>>> >   the reason I'm making an attempt to push back and argue for something
>>> >   that is close to the GTK+ API is that Clutter will eventually want
>>> >   to have the full set of capabilities that GTK+ has, such as running
>>> >   under a compositor and accurately reporting latency for Audio/Video
>>> >   synchronization.
>>> >
>>> >   And there's very little difference between Clutter and GTK+ 3.8 in
>>> >   being frame-driven - they work the same way - so the same API should
>>> >   work for both.
>>> >
>>> >   I think it's considerably better if we can just export the Cogl
>>> >   facilities for frame timing reporting rather than creating a new
>>> >   set of API's in Clutter.
>>> >
>>> > * Presentation times that are uncorrelated with the system time are not
>>> >   particularly useful - they perhaps could be used to detect frame
>>> >   drops after the fact, but that's the only thing I can think of.
>>> >   Presentation times that can be correlated with the system time, on
>>> >   the other hand, allow for A/V synchronization among other things.
>>>
>>> I'm not sure whether correlation to you implies a correlated scale
>>> and a correlated absolute position, but I think that a correlated
>>> scale is the main requirement for being useful to applications.
>>>
>>> For animation purposes the absolute system time often doesn't
>>> matter; what matters, I think, is that you have good enough
>>> resolution, that you know the timeline units and that you know
>>> whether it is monotonic or not. Animations can usually be progressed
>>> relative to a base/start timestamp, so the calculations are only
>>> relative and it doesn't matter what timeline you use. It's important
>>> that the application/toolkit be designed to consistently use the same
>>> timeline for driving animations, but for Clutter, for example, which
>>> progresses its animations in a single step as part of rendering a
>>> frame, that's quite straightforward to guarantee.
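>>>
>>> In other words, progressing an animation is only ever a relative
>>> calculation, something along the lines of (a sketch; the anim fields
>>> are made up):
>>>
>>> double progress =
>>>   (double) (frame_time - anim->start_time) / anim->duration;
>>>
>>> which only requires that frame_time and anim->start_time come from
>>> the same timeline with known units; which timeline that is doesn't
>>> matter.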
>>
>> For me, it's definitely essential to have a correlated scale *and* a
>> correlated absolute position. My main interest is Audio-Video
>> synchronization, and for that, the absolute position is needed.
>
> I can see that you want to correlate an absolute position in this
> case, but I think it's helpful to clarify that it's the absolute
> position with respect to your media layer's time source. I think this
> discussion would be clearer if we refrained from using the term
> "system time", with its suggestion that there is some particular
> canonical time source. A system can support numerous time sources.
>
>>
>>> The main difficulty I see with passing on UST values from OpenGL as
>>> presentation times is that OpenGL doesn't even guarantee what units
>>> the timestamps have (I'm guessing to allow raw rdtsc counters to be
>>> used), and I would much rather be able to pass on timestamps with a
>>> guaranteed timescale. I'd also like to guarantee that the timestamps
>>> are monotonic, which UST values are meant to be except for the fact
>>> that until recently DRM-based drivers reported gettimeofday
>>> timestamps.
>>
>> The fact that the OML spec doesn't even define a scale makes me wonder
>> if the authors of the specification expected that application authors
>> would use out-of-band knowledge when using the UST...  There are some
>> things that you can do without correlated absolute position (like
>> measure jitter in latency as a quality measure), but I can't think of
>> anything you can do without the scale.
>
> I think the authors had a Windows background. On Windows they have a
> QueryPerformanceCounter function which I think is typically just a
> thin wrapper around the rdtsc instruction. They also have a
> QueryPerformanceFrequency API to be able to map rdtsc values into a
> known scale. I would strongly expect the WGL sync_control extension to
> report QueryPerformanceCounter values, and the GLX sync_control spec
> is documented as being based on the WGL spec.
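>
> (For example, mapping a counter value into microseconds on Windows is
> just something like the following - the split division is only there
> to avoid overflowing 64 bits:
>
>   LARGE_INTEGER freq, count;
>   QueryPerformanceFrequency (&freq);
>   QueryPerformanceCounter (&count);
>   int64_t usec = (count.QuadPart / freq.QuadPart) * 1000000 +
>                  (count.QuadPart % freq.QuadPart) * 1000000 / freq.QuadPart;
>
> which is presumably the kind of mapping the WGL spec's authors had in
> mind.)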
>
> I suppose technically GLX applications are required to use a
> calibration with a known delay to empirically determine the frequency,
> or to use out-of-band knowledge to determine it.
>
> I'm not sure if the authors expected applications to use some
> out-of-band knowledge or if they just expected apps to do their own
> manual calibration, but either way it doesn't seem like an ideal way
> to define UST for Unix.
>
>>
>>> > * When I say that I want timestamps in the timescale of
>>> >   g_get_monotonic_time(), it's not that I'm particularly concerned about
>>> >   monotonicity - the important aspect is that the timestamps
>>> >   can be correlated with system time. I think as long as we're doing
>>> >   about as good a job as possible at converting presentation timestamps
>>> >   to a useful timescale, that's good enough, and there is little value
>>> >   in the raw timestamps beyond that.
>>>
>>> I still have some doubts about this approach of promising a mapping of
>>> all driver timestamps to the g_get_monotonic_time() timeline. I think
>>> maybe a partial mapping to only guarantee scale/units could suffice.
>>> These are some of the reasons I have doubts:
>>>
>>> - g_get_monotonic_time() has inconsistent semantics across platforms
>>> (it uses non-monotonic gettimeofday() on OS X, and on Windows it has
>>> a very low resolution of around 10-16ms) so it generally doesn't seem
>>> like an ideal choice as a canonical timeline.
>>
>> Being cross platform is hard - there are all sorts of constraints
>> imposed on us by trying to find things that work on different
>> platforms. Accepting constraints that go beyond this - like
>> considering the current implementation of g_get_monotonic_time() as
>> immutable - really brings us to the point of impossibility. As I said
>> in an earlier email, both the Mac and Windows have timescales in which
>> they report graphics timings that would be more suitable for
>> g_get_monotonic_time() than the current implementation.
>
> g_get_monotonic_time() isn't immutable, but I also don't know of any
> plans to change it. Cogl and Clutter are used on Windows and, for
> example, it would be pretty easy to add support for WGL_sync_control
> to Cogl, but if we have to map into the g_get_monotonic_time()
> timeline then that would seem pretty pointless due to the very low
> resolution. That would then make a relatively simple change to Cogl
> block on updating glib. It may turn out to be straightforward to
> update glib to use QueryPerformanceCounter, or maybe there are
> legitimate reasons not to do that (if there are concerns about
> QueryPerformanceCounter support on some systems); I'm not sure.
>
> If I were more convinced that mapping to the g_get_monotonic_time()
> timeline was extremely valuable then that would potentially outweigh
> this purely hypothetical concern, but I think that for many uses of
> presentation timestamps a guaranteed scale is sufficient and I think
> if we look at the details of more specific problems like a/v
> synchronization we can find a better approach there too.
>
>>
>>> - my reading of the GLX and WGL specs leads me to believe that we
>>> don't have a way to randomly access UST values; the UST values we can
>>> query are meant to correspond to the start of the most recent vblank
>>> period. This seems to conflict with your approach to mapping, which
>>> relies on being able to use a correlation of "now" to offset/map a
>>> given UST value.
>>> - Even if glXGetSyncValues does let us randomly access UST values,
>>> we can introduce pretty large errors during correlation caused by
>>> round-tripping to the X server
>>> - Also related to this: EGL doesn't yet have much precedent with
>>> regard to exposing UST timestamps. If, for example, a standalone
>>> extension were written to expose SwapComplete timestamps, it might
>>> have no reason to also define an API for random access of UST values,
>>> and then we wouldn't be able to correlate with g_get_monotonic_time()
>>> as we do with GLX.
>>
>> I reread the specs and read the implementation and I would agree that
>> you are right that with GLX we don't have an easy query
>>
>>> - the potential for error may be even worse whenever the
>>> g_get_monotonic_time timescale is used as a third intermediary to
>>> correlate graphics timestamps with another sub-system's timestamps
>>
>> It seems unrealistic to me that we'd export unidentified arbitrary
>> timestamps and then applications would figure out how to correlate them
>> with some other system. You've demonstrated that it's hard even without
>> an abstraction layer like Cogl in the middle.
>>
>>> - I can see that most application animations and display
>>> synchronization can be handled without needing system time
>>> correlation - they only need guaranteed units - so why not handle
>>> specific issues such as a/v and input synchronization on a
>>> case-by-case basis
>>>
>>> - having to rely on heuristics to figure out what time source the
>>> driver is using on Linux seems fragile (e.g. I'd imagine
>>> CLOCK_MONOTONIC_RAW could be mistaken for CLOCK_MONOTONIC and then
>>> later on if the clocks diverge that could lead to a large error in
>>> mapping)
>>>
>>> - We can't make any assumptions about the scale of UST values. I
>>> believe the GLX and WGL sync_control specs were designed so that
>>> drivers could report rdtsc CPU counters for UST values and to map
>>> these into the g_get_monotonic_time() timescale we would need to
>>> empirically determine the frequency of the UST counter.
>>
>> I'm really not sure what kind of applications you are thinking about
>> that don't need system time correlation. Many applications don't need
>> presentation timestamps *at all*. For those that do, A/V synchronization
>> is likely the most common case. There is certainly fragility and
>> complexity in the mapping of UST values onto system time, but Cogl
>> seems like the right place to bite the bullet and encapsulate that
>> fragility.
>
> I'm just thinking of typical applications that need to drive
> tweening/ease-in/out style animations that should complete within a
> given duration. Basically most applications doing something a bit fancy with
> their UI would fall into this category. This kind of animation can
> certainly benefit from tracking presentation times so as to predict
> when frames will become visible to a user. For example Clutter's
> current approach of using g_source_get_time() to drive animations
> means that in the common case where _swap_buffers won't block for the
> first swap - when there is a back buffer free to start the next frame
> - then Clutter can end up drawing 2 frames in quick succession using
> two timestamps that are very close together even though those frames
> will likely be presented ~16ms apart in the end. Looking at
> recent, historic presentation times would give one simple way of
> predicting when a frame will become visible and thus how far to
> progress animations.
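>
> To make that concrete, the kind of prediction I have in mind is
> something like this rough sketch (the helper is hypothetical and
> assumes all timestamps are on the same timeline):
>
> /* predict when the frame we're about to draw will be presented,
>  * given the last known presentation time and refresh interval */
> static int64_t
> predict_presentation_time (int64_t last_presentation_time,
>                            int64_t refresh_interval,
>                            int64_t now)
> {
>   int64_t next = last_presentation_time;
>
>   if (refresh_interval <= 0)
>     return now;
>
>   while (next <= now)
>     next += refresh_interval;
>
>   return next;
> }
>
> and animations would then be progressed to the predicted time rather
> than to whatever time the frame happened to be drawn at. Again this
> only needs a consistent scale, not correlation with any particular
> absolute timeline.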
>
> A/V synchronization seems like a much more specialized problem in
> comparison, so I wouldn't have considered it the common case, though
> it's certainly an important use case. This is also where I find the
> term "system time" most misleading, and I think it might be clearer to
> be more explicit and refer to a "media time" or "a/v time", since
> conceptually there is no implied relationship between a/v timestamps
> and, say, g_get_monotonic_time(). You are faced with basically the
> same problem of having to map between a/v time and
> g_get_monotonic_time() as with mapping from UST to
> g_get_monotonic_time(). Using GStreamer as an example, you have a
> GstClock which is just another monotonic clock with an unknown base.
> GStreamer is also designed so the GstClock implementation can be
> replaced, but notably it does provide API to query the current time,
> which could be used for offset correlation.
>
> For the problem of correlating A/V, the g_get_monotonic_time()
> timeline is a middle man. Assuming you are using GStreamer, I expect
> what you want in the end is a mapping to a GstClock timeline. I wonder
> if adding a cogl_gst_frame_info_get_presentation_time() would be more
> convenient for you? The function could take a GstClock pointer so it
> can query a timestamp for doing the mapping.
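>
> Roughly what I'm picturing (hypothetical, just to show the shape of
> the API) is:
>
>   GstClockTime
>   cogl_gst_frame_info_get_presentation_time (CoglFrameInfo *info,
>                                              GstClock *clock);
>
> where the implementation would sample gst_clock_get_time (clock) and
> the UST clock as close together as possible and use that offset to
> map the frame's presentation timestamp onto the GstClock timeline.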
>
>>
>>> Even with the recent change to the drm drivers the scale has changed
>>> from microseconds to nanoseconds.
>>
>> Can you give me a code reference for that? I'm not finding that change
>> in the DRM driver sources.
>
> It looks like I jumped the gun here. I was thinking about commit
> c61eef726a78ae77b6ce223d01ea2130f465fe5c which makes the DRM drivers
> query CLOCK_MONOTONIC time instead of gettimeofday for vblank events.
> I was assuming that since gettimeofday reports a timeval in
> microseconds and clock_gettime reports a timespec in nanoseconds, the
> vblank events would now be reporting timestamps in nanoseconds. On
> closer inspection, though, the DRM interface uses a timeval to report
> the time, not a uint64 like I'd imagined, so I think it's actually
> reporting CLOCK_MONOTONIC times but in microseconds instead of
> nanoseconds.
>
> Something to consider, though, is that if the NVIDIA driver were to
> report CLOCK_MONOTONIC timestamps on Linux then it might report those
> in nanoseconds, so our heuristics in Cogl for detecting CLOCK_MONOTONIC
> may need updating to consider both cases.
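>
> i.e. the detection would need to become something along these lines
> (a rough sketch; ust is the driver timestamp being probed, and the
> tolerance and result values are made up):
>
>   struct timespec ts;
>   int64_t mono_ns, delta_us, delta_ns;
>   const int64_t tolerance_us = 1000000; /* 1 second, made up */
>
>   clock_gettime (CLOCK_MONOTONIC, &ts);
>   mono_ns = (int64_t) ts.tv_sec * 1000000000 + ts.tv_nsec;
>
>   delta_us = llabs (mono_ns / 1000 - ust);
>   delta_ns = llabs (mono_ns - ust);
>
>   if (delta_us < tolerance_us)
>     time_source = UST_IS_MONOTONIC_USECS;  /* microseconds */
>   else if (delta_ns < tolerance_us * 1000)
>     time_source = UST_IS_MONOTONIC_NSECS;  /* nanoseconds */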
>
>>
>>> I would suggest that if we aren't sure what time source the driver
>>> is using then we should not attempt to do any kind of mapping.
>>
>> I'm fine with that - basically I'll do whatever I need to do to get that
>> knowledge working on the platforms that I care about (which is basically
>> Linux with open source or NVIDIA drivers), and where I don't care, I
>> don't care.
>
> It could be good to get some input from NVIDIA about how they report
> UST values from their driver.
>
>>
>>> > * If we start having other times involved, such as the frame
>>> >   time, or perhaps in the future the predicted presentation time
>>> >   (I ended up needing to add this in GTK+), then I think the idea of
>>> >   parallel API's to either get a raw presentation timestamp or one
>>> >   in the timescale of g_get_monotonic_time() would be quite clunky.
>>> >
>>> >   To avoid a build-time dependency on GLib, what makes sense to me is to
>>> >   return timestamps in terms of g_get_monotonic_time() if built against
>>> >   GLib and in some arbitrary timescale otherwise.
>>>
>>> With my current doubts and concerns about the idea of mapping to the
>>> g_get_monotonic_time() timescale, I think we should constrain
>>> ourselves to only guaranteeing that the scale of the presentation
>>> timestamps is nanoseconds, and possibly that they are monotonic. I
>>> say nanoseconds since this is
>>> consistent with how EGL defines UST values in khrplatform.h and having
>>> a high precision might be useful in the future for profiling if
>>> drivers enable tracing the micro progression of a frame through the
>>> GPU using the same timeline.
>>>
>>> If we do find a way to address those concerns then I think we can
>>> consider adding a parallel API later with a _glib namespace, but I
>>> struggle to see how this mapping can avoid reducing the quality of
>>> the timing information, so even if Cogl is built with a glib
>>> dependency I'd like to keep access to the more pristine (and possibly
>>> significantly more accurate) data.
>>
>> I'm not so fine with the lack of absolute time correlation. It seems
>> silly to me to have reverse-engineering code in *both* Mutter and COGL,
>> which is what I'd have to do.
>>
>> Any chance we can make COGL (on Linux) always return a value based
>> on CLOCK_MONOTONIC? We can worry about other platforms at some other
>> time.
>
> It would be good to hear what you think about having a
> cogl_gst_frame_info_get_presentation_time() instead.
>
> Although promising CLOCK_MONOTONIC clarifies the cross-platform issues
> when compared to g_get_monotonic_time(), if there are any Linux drivers
> that use CLOCK_MONOTONIC_RAW, for example, this could go quite badly
> wrong. At least if we constrain the points where we do offset mapping
> to times when we really need it (such as for a/v synchronization) then
> we minimize the impact if the mapping can't be done accurately or if
> it breaks monotonicity.
>
>>
>> In terms of nanoseconds vs. microseconds - don't care too much.
>>
>>>  Ok, so assuming that the baseline is my proposed patches sent to
>>> the list so far, applied on top of your original patches, I currently
>>> think these are the next steps:
>>>
>>> - Rename from SwapInfo to FrameInfo - since you pointed out that
>>> "frame" is more in line with gtk and "swap" isn't really meaningful if
>>> we have use cases for getting information before swapping (I have a
>>> patch for this I can send out)
>>> - cogl_frame_info_get_refresh_interval should be added back - since
>>> you pointed out some platforms may not let us associate an output with
>>> the frame info (I have a patch for this)
>>> - Rework the UST mapping to only map into nanoseconds and not attempt
>>> any mapping if we haven't identified the time source. (I have a patch
>>> for this)
>>> - Rework cogl_onscreen_add_swap_complete_callback to be named
>>> cogl_onscreen_add_frame_callback as discussed above and update the
>>> compatibility shim for cogl_onscreen_add_swap_buffers_callback (I have
>>> a patch for this)
>>> - Write some good gtk-doc documentation for the cogl_output api since
>>> it will almost certainly be made public (It would be good if you could
>>> look at this if possible?)
>>> - Review the cogl-output.c code, since your original patches didn't
>>> include the implementation of CoglOutput on the header
>>
>> I'll go through your patches and review them and write docs for
>> CoglOutput - I pushed a branch adding the missing
>
> thanks.
>
> It seems like the main issue we still have some disagreements about is
> the g_get_monotonic_time() mapping, but I don't think this has to
> block landing anything at this stage.
>
> Adding a cogl_gst_frame_info_get_presentation_time() or a
> cogl_glib_frame_info_get_presentation_time() function are two
> potential solutions. Before the 1.14 release there is also the
> possibility of committing to stricter timeline guarantees, which
> wouldn't be incompatible with more conservative scale guarantees to
> start with.
>
> kind regards,
> - Robert
>
>>
>> - Owen
>>
>>

