[RFC v2] Wayland presentation extension (video protocol)
jason at jlekstrand.net
Sat Feb 8 13:23:29 PST 2014
First off, I think you've done a great job overall. I think it will both
cover most cases and work well. I've got a few comments below.
On Thu, Jan 30, 2014 at 9:35 AM, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> it's time for a take two on the Wayland presentation extension.
> 1. Introduction
> The v1 proposal is here:
> In v2 the basic idea is the same: you can queue frames with a
> target presentation time, and you can get accurate presentation
> feedback. All the details are new, though. The re-design started
> from the wish to handle resizing better, preferably without
> clearing the buffer queue.
> All the changed details are probably too much to describe here,
> so it is maybe better to look at this as a new proposal. It
> still does build on Frederic's work, and everyone who commented
> on it. Special thanks to Axel Davy for his counter-proposal and
> fighting with me on IRC. :-)
> Some highlights:
> - Accurate presentation feedback is possible also without
> - You can queue also EGL-based rendering, and get presentation
> feedback if you want. Also EGL can do this internally, too, as
> long as EGL and the app do not try to use queueing at the same time.
> - More detailed presentation feedback to better allow predicting
> future display refreshes.
> - If wl_viewport is used, neither video resolution changes nor
> surface (window) size changes alone require clearing the queue.
> Video can continue playing even during resizes.
> The protocol interfaces are arranged as
> global.method(wl_surface, ...)
> just for brevity. We could as well do the factory approach:
> o = global.get_presentation(wl_surface)
> Or if we wanted to make it a mandatory part of the Wayland core
> protocol, we could just extend wl_surface itself:
> and put the clock_id event in wl_compositor. That all is still
> open and fairly uninteresting, so let's concentrate on the other
> The proposal refers to wl_viewport.set_source and
> wl_viewport.destination requests, which do not yet exist in the
> scaler protocol extension. These are just the wl_viewport.set
> arguments split into separate src and dst requests.
> Here is the new proposal, some design rationale follows. Please,
> do ask why something is designed like it is if it puzzles you. I
> have a load of notes I couldn't clean up for this email. This
> does not even intend to completely solve all XWayland needs, but
> for everything native on Wayland I hope it is sufficient.
> 2. The protocol specification
> <?xml version="1.0" encoding="UTF-8"?>
> <protocol name="presentation_timing">
> Copyright © 2013-2014 Collabora, Ltd.
> Permission to use, copy, modify, distribute, and sell this
> software and its documentation for any purpose is hereby granted
> without fee, provided that the above copyright notice appear in
> all copies and that both that copyright notice and this permission
> notice appear in supporting documentation, and that the name of
> the copyright holders not be used in advertising or publicity
> pertaining to distribution of the software without specific,
> written prior permission. The copyright holders make no
> representations about the suitability of this software for any
> purpose. It is provided "as is" without express or implied
> THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
> SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
> FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
> SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
> AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
> ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
> THIS SOFTWARE.
> <interface name="presentation" version="1">
> <description summary="timed presentation related wl_surface requests">
> The main features of this interface are accurate presentation
> timing feedback, and queued wl_surface content updates to ensure
> smooth video playback while maintaining audio/video
> synchronization. Some features use the concept of a presentation
> clock, which is defined in presentation.clock_id event.
> Requests 'feedback' and 'queue' can be regarded as additional
> wl_surface methods. They are part of the double-buffered
> surface state update mechanism, where other requests first set
> up the state and then wl_surface.commit atomically applies the
> state into use. In other words, wl_surface.commit submits a
> content update.
> Interface wl_surface has requests to set surface related state
> and buffer related state, because there is no separate interface
> for buffer state alone. Queueing requires separating the surface
> from buffer state, and buffer state can be queued while surface
> state cannot.
> Buffer state includes the wl_buffer from wl_surface.attach, the
> state assigned by wl_surface requests frame,
> set_buffer_transform and set_buffer_scale, and any
> buffer-related state from extensions, for instance
> wl_viewport.set_source. This state is inherent to the buffer
> and the content update, rather than the surface.
> Surface state includes all other state associated with
> wl_surfaces, like the x,y arguments of wl_surface.attach, input
> and opaque regions, damage, and extension state like
> wl_viewport.destination. In general, anything expressed in
> surface local coordinates is better as surface state.
> The standard way of posting new content to a surface using the
> wl_surface requests damage, attach, and commit is called
> immediate content submission. This happens when a
> presentation.queue request has not been sent since the last
> The new way of posting a content update is a queued content
> update submission. This happens on a wl_surface.commit when a
> presentation.queue request has been sent since the last
> Queued content updates do not get applied immediately in the
> compositor but are pushed to a queue on receiving the
> wl_surface.commit. The queue is ordered by the submission target
> timestamp. Each item in the queue contains the wl_buffer, the
> target timestamp, and all the buffer state as defined above. All
> the queued state is taken from the pending wl_surface state at
> the time of the commit, exactly like an immediate commit would
> have taken it.
> For instance on a queueing commit, the pending buffer is queued
> and no buffer is pending afterwards. The stored values of the
> x,y parameters of wl_surface.attach are reset to zero, but they
> also are not queued; queued content updates do not carry the
> attach offsets. All other surface state (that is not queued),
> e.g. damage, is not applied nor reset.
> Issuing a queueing commit without a wl_surface.attach is
> undefined. However, queueing a commit with explicitly attached
> NULL wl_buffer works; when and if the content update is
> executed, the surface content is removed as defined for
> If a queued content update has been submitted, and the wl_buffer
> used in the update is destroyed before the wl_buffer.release
> event, the results are undefined. The compositor may or may not
> have executed the update, therefore the surface contents become
> undefined as explained in wl_surface.attach. Whether any
> presentation feedback or frame callbacks occur is undefined.
> For each surface, the compositor maintains an association to a
> single output that is considered as the main output for the
> surface. Queued content updates are synchronized to the
> surface's main output, to provide a consistent and meaningful
> definition of the moment the update is displayed to the user.
> When a compositor updates an output, it processes only the
> queues of the surfaces whose main output is the one being
> updated. The queues of other surfaces, even if they are part of
> the redrawing, are not processed at that time.
> When a compositor chooses to update an output, it must predict
> the presentation clock value when the display update will occur.
> For the definition of the moment of display update, see
> presentation_feedback.presented. Therefore if the prediction is
> absolutely perfect, presentation_feedback.presented will carry
> the same clock value.
> For each surface with queued content updates and matching main
> output, the compositor picks the update with the highest
> timestamp no later than a half frame period after the predicted
> presentation time. The intent is to pick the content update
> whose target timestamp as rounded to the output refresh period
> granularity matches the same display update as the compositor is
> targeting, while not displaying any content update more than a
I'm not really following 100% here. It's not your fault, this is just a
terribly awkward sort of thing to try and put into English. It sounds to
me like the following: If P0 is the time of the next present and P1 is the
time of the one after that, you look for the largest timestamp less than
the average of P0 and P1. Is this correct? Why go for the average? The
client is going to have to adjust anyway.
> half frame period too early. If all the updates in the queue are
> already late, the highest timestamp update is taken regardless
> of how late it is. Once an update in a queue has been chosen,
> all remaining updates with an earlier timestamp in the queue are
Ok, I think what you are saying works. Again, it's difficult to parse but
these things always are.
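
A minimal sketch of the selection rule as I read it, in Python. The
function name, the integer-nanosecond timestamps and the returned tuple
are assumptions made for illustration only; none of them come from the
proposal, and timestamps in the queue are assumed to be distinct.

```python
def pick_update(queue, predicted_ns, refresh_ns):
    """Pick the queued update to show at one output refresh.

    queue        -- target timestamps in nanoseconds, in any order
    predicted_ns -- compositor's predicted presentation time
    refresh_ns   -- output refresh period

    Returns (chosen, discarded, remaining). 'chosen' is None when every
    queued update targets a later refresh.
    """
    # An update is eligible if its target is no later than half a frame
    # period after the predicted presentation time.
    cutoff = predicted_ns + refresh_ns // 2
    eligible = [t for t in queue if t <= cutoff]
    if not eligible:
        return None, [], sorted(queue)

    # Highest eligible timestamp wins, however late it already is.
    chosen = max(eligible)
    # Everything earlier than the chosen update is dropped from the queue.
    discarded = sorted(t for t in queue if t < chosen)
    remaining = sorted(t for t in queue if t > chosen)
    return chosen, discarded, remaining
```

With a 16 ns "refresh period" and a predicted time of 116,
pick_update([100, 116, 133, 150], 116, 16) picks 116, discards 100, and
leaves 133 and 150 queued. The cutoff is the same point as the average of
P0 and P1 above.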
> The compositor applies the chosen update to the wl_surface,
> regardless of possible wl_subsurface.set_sync mode. This allows
> e.g. a video to continue running in a sub-surface also during
> window resizing. It is assumed that buffer state updates do not
> cause visual disruption to the window like surface state updates
> can. Support for wl_viewport is needed for glitch-free resizing
> if the resizing involves changing the (sub-)surface size.
> When the chosen update is applied, the associated frame
> callbacks are sent. Damage for the whole surface is assumed,
> as damage is not explicitly queued with buffer state.
> When the final realized presentation time is available, e.g.
> after a framebuffer flip completes, the requested
> presentation_feedback.presented events are sent. The final
> presentation time can differ from the compositor's predicted
> display update time and the update's target time, especially
> when the compositor misses its target vertical blanking period.
> When updates from the queue are discarded, the
> presentation_feedback.discarded event is delivered if feedback
> was requested. Also the associated frame callbacks are sent.
> An immediate content update with an attach request automatically
> discards the whole queue just before the update gets applied. If
> wl_surface.attach has not been sent for an immediate content
> submission, the queue is not discarded, and the content update
> applies only the surface state, but no buffer state.
> If a wl_surface has queued content updates when it is destroyed,
> the whole queue is implicitly discarded as if
> presentation.discard_queue was sent immediately prior to the
> destroy request.
> <request name="destroy" type="destructor">
> <description summary="unbind from the presentation interface">
> Informs the server that the client will not be using this
> protocol object anymore. This does not affect any content
> update queues nor existing objects created by this interface.
> <request name="feedback">
> <description summary="request presentation feedback information">
> With this request, presentation feedback will be provided for
> the current content submission of the given surface. A new
> presentation_feedback object is created, and that object will
> deliver the information once. The object is tied to this
> content submission only. Multiple presentation_feedback objects
> may be created for the same submission, and they will all
> return the same information.
> For details on what information is returned, see
> presentation_feedback interface.
> <arg name="surface" type="object" interface="wl_surface"
> summary="target surface"/>
> <arg name="callback" type="new_id" interface="presentation_feedback"
> summary="new feedback object"/>
> <request name="queue">
> <description summary="queue the buffer instead of immediate
> This request changes the behaviour of the very next
> wl_surface.commit of the given wl_surface and that commit
> only. Instead of immediately applying the pending wl_surface
> state as defined in wl_surface.commit, the commit will queue a
> new content update, using the pending buffer state only.
> For a more detailed description and what is buffer state, see
> the documentation for presentation interface.
> The value of the target timestamp is in the presentation clock
> domain, see presentation.clock_id.
> If queue request has already been sent for the unfinished content
> update submission on the given wl_surface, a new queue request
> will override the previous one.
> <arg name="surface" type="object" interface="wl_surface"
> summary="target surface"/>
> <arg name="tv_sec" type="uint"
> summary="seconds part of the target timestamp"/>
> <arg name="tv_nsec" type="uint"
> summary="nanoseconds part of the target timestamp"/>
> <request name="discard_queue">
> <description summary="discard the whole queue of the given surface">
> This request discards the whole remaining content update queue
> of the given wl_surface. Once the compositor has processed
> this request, no more queued updates will happen on the
> surface until the client queues new updates. Discard_queue is
> processed immediately when the compositor dispatches the
> A client can issue a wl_display.sync after this, and once the
> sync returns, the client has received all
> presentation_feedback.discarded events resulting from the
> discard_queue. However, presentation_feedback.presented
> events may arrive later if the compositor executed a queued
> content update before the discard_queue.
> <arg name="surface" type="object" interface="wl_surface"
> summary="target surface"/>
> <event name="clock_id">
> <description summary="clock ID for timestamps">
> This event tells the client in which clock domain the
> compositor interprets the timestamps used by the presentation
> extension. This clock is called the presentation clock.
> The compositor sends this event when the client binds to the
> presentation interface. The presentation clock does not change
> during the lifetime of the client connection.
> The clock identifier is platform dependent. Clients must be
> able to query the current clock value directly, not by asking
> the compositor.
> On Linux/glibc, the identifier value is one of the clockid_t
> values accepted by clock_gettime(). clock_gettime() is defined
> by POSIX.1-2001.
> Compositors should prefer a clock that does not jump and is
> not slewed e.g. by NTP. The absolute value of the clock is
> irrelevant. Precision of one millisecond or better is
> Timestamps in this clock domain are expressed as tv_sec,
> tv_nsec pairs, each component being an unsigned 32-bit value.
> Whole seconds are in tv_sec, and the additional fractional
> part in tv_nsec as nanoseconds. Hence, for valid timestamps
> tv_nsec must be in [0, 999999999].
> Note, that clock_id applies only to the presentation clock,
> and implies nothing about e.g. the timestamps used in the
> Wayland core protocol input events.
> <arg name="clk_id" type="uint" summary="platform clock identifier"/>
> <interface name="presentation_feedback" version="1">
> <description summary="presentation time feedback event">
> A presentation_feedback object returns the feedback information
> about a wl_surface content update becoming visible to the user.
> One object corresponds to one content update submission
> (wl_surface.commit), queued or immediate. There are two possible
> outcomes: the content update may be presented to the user, in
> which case the presentation timestamp is delivered. Otherwise,
> the content update is discarded, and the user never had a chance
> to see it before it was superseded or the surface was destroyed.
> Once a presentation_feedback object has delivered an event, it
> becomes inert, and should be destroyed by the client.
> <request name="destroy" type="destructor">
> <description summary="destroy presentation feedback object">
> The object is destroyed. If a feedback event had not been
> delivered yet, it is cancelled.
> <event name="sync_output">
> <description summary="presentation synchronized to this output">
> As presentation can be synchronized to only one output at a
> time, this event tells which output it was. This event is only
> sent prior to the presented event.
> As clients may bind to the same global wl_output multiple
> times, this event is sent for each bound instance that matches
> the synchronized output. If a client has not bound to the
> right wl_output global at all, this event is not sent.
> <arg name="output" type="object" interface="wl_output"
> summary="presentation output"/>
> <event name="presented">
> <description summary="the content update was displayed">
> The associated content update was displayed to the user at the
> indicated time (tv_sec, tv_nsec). For the interpretation of the
> timestamp, see presentation.clock_id event.
> The timestamp corresponds to the time when the content update
> turned into light the first time on the surface's main output.
> Compositors may approximate this from the framebuffer flip
> completion events from the system, and the latency of the
> physical display path if known.
> This event is preceded by all related sync_output events
> telling which output's refresh cycle the feedback corresponds
> to, i.e. the main output for the surface. Compositors are
> recommended to choose the output containing the largest
> part of the wl_surface, or keeping the output they previously
> chose. Having a stable presentation output association helps
> clients to predict future output refreshes (vblank).
> Argument 'refresh' gives the compositor's prediction of how
> many nanoseconds after tv_sec, tv_nsec the very next output
> refresh may occur. This is to further aid clients in
> predicting future refreshes, i.e., estimating the timestamps
> targeting the next few vblanks. If such prediction cannot
> usefully be done, the argument is zero.
> The 64-bit value combined from seq_hi and seq_lo is the value
> of the output's vertical retrace counter when the content
> update was first scanned out to the display. This value must
> be compatible with the definition of MSC in
> GLX_OML_sync_control specification. Note, that if the display
> path has a non-zero latency, the time instant specified by
> this counter may differ from the timestamp's.
> If the output does not have a constant refresh rate, explicit
> video mode switches excluded, then the refresh argument must
> be zero.
> If the output does not have a concept of vertical retrace or a
> refresh cycle, or the output device is self-refreshing without
> a way to query the refresh count, then the arguments seq_hi
> and seq_lo must be zero.
> <arg name="tv_sec" type="uint"
> summary="seconds part of the presentation timestamp"/>
> <arg name="tv_nsec" type="uint"
> summary="nanoseconds part of the presentation timestamp"/>
> <arg name="refresh" type="uint" summary="nanoseconds till next
> <arg name="seq_hi" type="uint"
> summary="high 32 bits of refresh counter"/>
> <arg name="seq_lo" type="uint"
> summary="low 32 bits of refresh counter"/>
> <event name="discarded">
> <description summary="the content update was not displayed">
> The content update was never displayed to the user.
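
A rough sketch of how a client might consume the arguments of the
presented event, in Python. The helper names are hypothetical. Treating
'refresh' as the spacing between all of the next few refreshes is an
assumption that only makes sense for a constant refresh rate; the text
above only promises a prediction for the very next refresh.

```python
NSEC_PER_SEC = 1_000_000_000

def to_ns(tv_sec, tv_nsec):
    """Combine a (tv_sec, tv_nsec) pair into nanoseconds."""
    if not 0 <= tv_nsec < NSEC_PER_SEC:
        raise ValueError("tv_nsec must be in [0, 999999999]")
    return tv_sec * NSEC_PER_SEC + tv_nsec

def combine_seq(seq_hi, seq_lo):
    """Build the 64-bit refresh counter from its two 32-bit halves."""
    return (seq_hi << 32) | seq_lo

def predict_refreshes(tv_sec, tv_nsec, refresh, count):
    """Estimate the timestamps (ns) of the next 'count' refreshes.

    Returns an empty list when refresh is zero, i.e. when the compositor
    could not make a useful prediction.
    """
    if refresh == 0:
        return []
    presented = to_ns(tv_sec, tv_nsec)
    # Assumption: the refresh period stays constant over 'count' frames.
    return [presented + k * refresh for k in range(1, count + 1)]
```

The predicted values could then serve as target timestamps for
presentation.queue.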
> 3. Why UST all the way?
> 3.1. UST and MSC pros and cons
> Unadjusted System Time (UST) and graphics Media Stream Counter
> (MSC) are defined by the GLX_OML_sync_control specification. UST
> is basically a stable wall clock with a tick rate close to the
> "universal true time", i.e. the real time, while MSC is a frame
> or refresh cycle counter and not a clock.
> Should we use UST or MSC, or maybe allow both, for queueing
> buffers and measuring the presentation time?
> MSC pro: Is what all graphics systems that I know of seem to be
> using currently, and matches how past and most of current
> hardware works.
> UST con: Would be a new concept to adapt to for frame counter
> based algorithms. For actual hardware operations, needs to be
> converted to MSC in most cases.
> MSC con: tick rate depends on the output device currently in use
> for the window, and can also change with video mode switches.
> UST pro: tick rate is guaranteed to be constant.
> MSC con: for an output device that is not based on periodic
> refresh cycles, e.g. on-demand refresh or variable rate, it has
> no predictable correspondence to the "universal true time".
> UST pro: once you establish the relationship to "universal true
> time", it holds practically indefinitely. This means you can
> reliably relate UST to other proper clocks and maintain e.g.
> audio/video sync.
> MSC pro: corresponds exactly to when a monitor refreshes,
> provided that the monitor uses periodic refreshing. The MSC
> increment between two consecutive vblanks is 1.
> UST con: to hit a vblank, you have to estimate and compute the
> right UST value based on feedback data. The UST increment
> between two consecutive vblanks may not be an integer
> MSC con: an application cannot read the current MSC value on its
> own, it needs to ask the display server about it, which is a
> protocol roundtrip to another process. Determining current MSC
> also involves determining which window or output it should
> relate to.
> UST pro: an application can query the current UST value directly
> with a system call (clock_gettime).
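
To illustrate the last point, a small Python sketch of a client reading
the clock itself and building a target timestamp. CLOCK_MONOTONIC is
only a stand-in for whatever clockid_t the clock_id event would deliver,
and the helper names are made up for this example.

```python
import time

NSEC_PER_SEC = 1_000_000_000
UINT32_MAX = 0xFFFFFFFF

def split_ns(total_ns):
    """Split nanoseconds into a (tv_sec, tv_nsec) pair of uint32 values."""
    tv_sec, tv_nsec = divmod(total_ns, NSEC_PER_SEC)
    if not 0 <= tv_sec <= UINT32_MAX:
        raise ValueError("tv_sec does not fit in an unsigned 32-bit value")
    return tv_sec, tv_nsec

def target_timestamp(delay_ns, clk_id=time.CLOCK_MONOTONIC):
    """Return a target 'delay_ns' from now, read directly from the clock.

    No round trip to the compositor is involved; clk_id stands in for
    the value received in the clock_id event (an assumption here).
    """
    now_ns = time.clock_gettime_ns(clk_id)
    return split_ns(now_ns + delay_ns)
```

The resulting pair has the same shape as the tv_sec and tv_nsec
arguments of the queue request.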
> 3.2. Conclusions:
> MSC alone cannot be used to achieve reliable A/V sync,
> because its relationship to any other clock (e.g. audio clock)
> can change: MSC may suddenly start to tick faster or slower.
> Only UST is reliably synchronizable to other clocks, therefore
> UST should be the "common language" in the Wayland protocol.
> Using MSC would always require some context, like the window or
> Clients need to estimate the UST values when vblanks
> happen, so that they can schedule presentation for certain
> monitor refresh cycles and to reduce jitter and latency. Also
> compositors need to convert presentation UST timestamps into
> hardware MSC values or equivalent, since the update can only
> happen during vblanks. However, this applies only to periodic
> scanout style hardware, and not to variable refresh rate or
> on-demand monitors. Therefore UST, while being more complicated
> to use, is the more future-proof concept.
> If the compositor can affect when the monitor is
> refreshed, using an MSC as presentation target time would not
> give any clue of when the presentation should actually happen,
> making variable refresh rate display hardware have basically no
> 4. Bits and pieces
> 4.1. Damage
> Do not queue damage, because damage is in surface coordinates,
> and syncing it from the queue is hard. Do not queue anything
> that is in surface coordinates. Doing so would require
> discarding the whole queue whenever the surface size changes.
> The design explicitly allows changing surface and buffer sizes
> asynchronously, if wl_viewport is available.
> 4.2. Moment of presentation
> Presentation time is defined as "turns into light", because
> modern TVs may add significant latency between a pixel going onto
> the wire and that pixel turning into light. This should
> not conflict with the definition in GLX_OML_sync_control,
> which was presumably written in the era of CRT monitors, when the
> latency for turn-to-light was insignificant.
> 4.3. Swap buffer count
> SBC does not require any protocol nor server side support,
> because the client is in complete control of the swaps to the
> wl_surface or gets the needed feedback via presentation_feedback
> and frame callbacks, and no other client can access the same
> 4.4. Intentional tearing
> Implementing GLX_EXT_swap_control_tear would require the Wayland
> compositor to cause tearing on purpose. Hence it is not
> considered here.
> 4.5. The frame callback and swap interval
> The frame callback needs to be with the buffer state, so it gets
> queued. If a client makes e.g. EGL's commits queued, EGL may
> still rely on frame callbacks for blocking apps properly, and
> that is related to presenting the buffer, not just the very next
> output refresh. EGL may also internally use queueing and
> feedback to implement swap interval > 1.
Doesn't this mean that you need eglSwapInterval(0) if you're queueing?
This is probably the case anyway, but it might be worth noting explicitly.
I think what you're doing with frame callbacks is sane, but I'm not sure.
> 4.6. Interlacing
> Supporting interlaced material and displays is punted for a
> later extension. Presumably the protocol supporting interlaced
> content would be as simple as having an extra wl_surface-like
> request to say on which of the two fields the content should be
> displayed first. The field designation would be an additional
> restriction on when a content update should initially hit the
> screen. I.e. if both field and target timestamp are given, both
> conditions must pass. This means that giving a field may delay
> the presentation by one output refresh cycle, assuming the
> output scans out alternating fields. Additionally there should
> be an extension to inform the client, which field the top-most
> scanline of the buffer will hit, or equivalent information. This
> assumes that the even scanlines in a buffer correspond to one
> field, and the odd scanlines correspond to the other field,
> regardless of how these terms are defined.
> 5. X11 Present and XWayland
> Comparison between X11 Present and Wayland with presentation
> extension or rather how to map one to the other. This is
> supposed to provide some faith on how Present could be supported
> on XWayland.
> The fundamental difference between X11 Present and Wayland
> (without XWayland specific extensions) is that Present supports
> scheduled copy operations, which in the pathological cases
> cannot easily be done in advance. Wayland requires complete
> buffers, but Present may imply blits as a part of posting window
> content to display.
> The workhorse of X11 Present is the PresentPixmap request. Its
> arguments with their likely corresponding Wayland concepts are:
> - window: wl_surface
> - pixmap: wl_buffer?
> - serial: a new presentation_feedback object
> - valid-area: N/A, always assumed to be the whole buffer. If not
> the whole buffer, the X server must create a copy.
> - update-area: wl_surface.damage
> - x-off, y-off: N/A, non-zero values imply a copy
> - wait-fence: N/A, Wayland does not have an explicit rendering
> fence protocol. It is assumed that the underlying driver stack
> prevents reading from a buffer whose rendering is still on-going
> or pending.
> - idle-fence/PresentIdleNotify: wl_buffer.release, but since
> wl_buffer.release is poorly suitable for hardware buffers
> (implies CPU/GPU synchronization), we may want real sync objects
> that can be off-loaded to GPUs, especially with dma-buf and
> - target-msc: presentation.queue with UST target time, requires
> computing the desired UST from the given MSC and historical
> feedback data
> - divisor: ignored
> - remainder: ignored
> - target-crtc: N/A
> - options:
> * PresentOptionAsync: ignored, presentation is always synced to
> * PresentOptionCopy: N/A, must be implemented as a copy in the X
> * PresentOptionUST: presentation.queue with UST target time
> PresentPixmap corresponds to wl_surface.commit.
> There are differences between Wayland and X11 when the target
> timestamp is already in the past when the server processes it.
> Unsure if queueing from XWayland xserver works out.
> PresentNotifyMSC would require an xwayland extension to get a callback
> event when the output's MSC reaches a certain value, or if the
> xwayland-xserver can estimate the UST from the target-msc it can simply
> use a timer. But probably wants the extension, really.
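
A sketch of the MSC-to-UST estimate mentioned here and under target-msc,
in Python. It assumes a constant refresh rate between the feedback
samples and the target, which the proposal itself says cannot be taken
for granted. The function names and the (msc, ust_ns) sample tuples are
hypothetical.

```python
def estimate_period_ns(sample_a, sample_b):
    """Estimate the refresh period from two (msc, ust_ns) feedback samples."""
    msc_a, ust_a = sample_a
    msc_b, ust_b = sample_b
    if msc_b == msc_a:
        raise ValueError("samples must come from different refreshes")
    return (ust_b - ust_a) // (msc_b - msc_a)

def estimate_ust_for_msc(target_msc, last_msc, last_ust_ns, period_ns):
    """Extrapolate the UST at which target_msc should occur.

    Valid only while the output keeps refreshing at period_ns.
    """
    return last_ust_ns + (target_msc - last_msc) * period_ns
```

The estimate could feed either presentation.queue or a timer. It goes
stale after a mode switch or on a variable refresh rate output.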
> - PresentCapabilityAsync is never true
> - PresentCapabilityFence is missing Wayland protocol
> - PresentCapabilityUST implies that the display can refresh on-demand;
> enable this always so that X11 apps would prefer UST timestamps instead
> of MSC? Or add xwayland extension to ask the compositor?
> PresentCompleteNotify as reply to PresentPixmap:
> both presentation_feedback.presented/discarded and wl_buffer.release to
> determine 'mode'
> - PresentCompleteModeSkip if discarded and release
> - PresentCompleteModeFlip if presented and no release
> - PresentCompleteModeCopy if presented and release
> - discarded and no release should not happen if xwayland does not submit
> the same wl_buffer to multiple surfaces or queue multiple times.
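
The mapping in the list above, written out as a small Python function.
The function and its boolean arguments are illustrative only; the mode
names are the ones listed.

```python
def complete_mode(presented, released):
    """Map feedback and release state to a PresentCompleteNotify mode.

    presented -- True for presentation_feedback.presented,
                 False for presentation_feedback.discarded
    released  -- True if wl_buffer.release has been received
    """
    if presented:
        if released:
            return "PresentCompleteModeCopy"
        return "PresentCompleteModeFlip"
    if released:
        return "PresentCompleteModeSkip"
    # Discarded without release: per the list above this should not happen
    # unless the same wl_buffer was submitted or queued more than once.
    raise ValueError("discarded and no release should not happen")
```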
> triggered by wl_buffer.release with 'idle-fence' as None, or if using a
> fence, then wl_buffer.release signals the fence while PresentIdleNotify
> is delivered ahead of time.
My one latent concern is that I still don't think we're entirely handling
the case that QtQuick wants. What they want is to do their rendering a few
frames in advance in case of CPU/GPU jitter. Technically, this extension
handles this by the client simply doing a good job of guessing presentation
times on a one-per-frame basis. However, it doesn't allow for any damage
tracking. In the case of QtQuick they want a linear queue of buffers where
no buffer ever gets skipped. In this case, you could do damage tracking by
allowing it to accumulate from one frame to another and you get all of the
damage-tracking advantages that you had before. I'm not sure how much this
matters, but it might be worth thinking about it.
Hope that helps,