[RFC v2] Wayland presentation extension (video protocol)
ppaalanen at gmail.com
Thu Jan 30 07:35:17 PST 2014
It's time for take two on the Wayland presentation extension.
The v1 proposal is here:
In v2 the basic idea is the same: you can queue frames with a
target presentation time, and you can get accurate presentation
feedback. All the details are new, though. The re-design started
from the wish to handle resizing better, preferably without
clearing the buffer queue.
All the changed details are probably too much to describe here,
so it is maybe better to look at this as a new proposal. It
still does build on Frederic's work, and everyone who commented
on it. Special thanks to Axel Davy for his counter-proposal and
fighting with me on IRC. :-)
- Accurate presentation feedback is possible also without
- You can also queue EGL-based rendering, and get presentation
feedback if you want. EGL can also do this internally, as long
as EGL and the app do not try to use queueing at the same time.
- More detailed presentation feedback to better allow predicting
future display refreshes.
- If wl_viewport is used, neither video resolution changes nor
surface (window) size changes alone require clearing the queue.
Video can continue playing even during resizes.
The protocol interfaces are arranged as
just for brevity. We could as well do the factory approach:
o = global.get_presentation(wl_surface)
Or if we wanted to make it a mandatory part of the Wayland core
protocol, we could just extend wl_surface itself:
and put the clock_id event in wl_compositor. That all is still
open and fairly uninteresting, so let's concentrate on the other
The proposal refers to wl_viewport.set_source and
wl_viewport.destination requests, which do not yet exist in the
scaler protocol extension. These are just the wl_viewport.set
arguments split into separate src and dst requests.
Here is the new proposal; some design rationale follows. Please
do ask why something is designed the way it is if it puzzles you. I
have a load of notes I couldn't clean up for this email. This
does not even intend to completely solve all XWayland needs, but
for everything native on Wayland I hope it is sufficient.
2. The protocol specification
<?xml version="1.0" encoding="UTF-8"?>
Copyright © 2013-2014 Collabora, Ltd.
Permission to use, copy, modify, distribute, and sell this
software and its documentation for any purpose is hereby granted
without fee, provided that the above copyright notice appear in
all copies and that both that copyright notice and this permission
notice appear in supporting documentation, and that the name of
the copyright holders not be used in advertising or publicity
pertaining to distribution of the software without specific,
written prior permission. The copyright holders make no
representations about the suitability of this software for any
purpose. It is provided "as is" without express or implied
THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
<interface name="presentation" version="1">
<description summary="timed presentation related wl_surface requests">
The main features of this interface are accurate presentation
timing feedback, and queued wl_surface content updates to ensure
smooth video playback while maintaining audio/video
synchronization. Some features use the concept of a presentation
clock, which is defined in the presentation.clock_id event.
Requests 'feedback' and 'queue' can be regarded as additional
wl_surface methods. They are part of the double-buffered
surface state update mechanism, where other requests first set
up the state and then wl_surface.commit atomically applies the
state into use. In other words, wl_surface.commit submits a
Interface wl_surface has requests to set surface related state
and buffer related state, because there is no separate interface
for buffer state alone. Queueing requires separating the surface
from buffer state, and buffer state can be queued while surface
Buffer state includes the wl_buffer from wl_surface.attach, the
state assigned by wl_surface requests frame,
set_buffer_transform and set_buffer_scale, and any
buffer-related state from extensions, for instance
wl_viewport.set_source. This state is inherent to the buffer
and the content update, rather than the surface.
Surface state includes all other state associated with
wl_surfaces, like the x,y arguments of wl_surface.attach, input
and opaque regions, damage, and extension state like
wl_viewport.destination. In general, anything expressed in
surface local coordinates is better as surface state.
The standard way of posting new content to a surface using the
wl_surface requests damage, attach, and commit is called
immediate content submission. This happens when a
presentation.queue request has not been sent since the last
The new way of posting a content update is a queued content
update submission. This happens on a wl_surface.commit when a
presentation.queue request has been sent since the last
Queued content updates do not get applied immediately in the
compositor but are pushed to a queue on receiving the
wl_surface.commit. The queue is ordered by the submission target
timestamp. Each item in the queue contains the wl_buffer, the
target timestamp, and all the buffer state as defined above. All
the queued state is taken from the pending wl_surface state at
the time of the commit, exactly like an immediate commit would
have taken it.
For instance on a queueing commit, the pending buffer is queued
and no buffer is pending afterwards. The stored values of the
x,y parameters of wl_surface.attach are reset to zero, but they
also are not queued; queued content updates do not carry the
attach offsets. All other surface state (that is not queued),
e.g. damage, is not applied nor reset.
Issuing a queueing commit without a wl_surface.attach is
undefined. However, queueing a commit with explicitly attached
NULL wl_buffer works; when and if the content update is
executed, the surface content is removed as defined for
If a queued content update has been submitted, and the wl_buffer
used in the update is destroyed before the wl_buffer.release
event, the results are undefined. The compositor may or may not
have executed the update, therefore the surface contents become
undefined as explained in wl_surface.attach. Whether any
presentation feedback or frame callbacks occur is undefined.
For each surface, the compositor maintains an association to a
single output that is considered as the main output for the
surface. Queued content updates are synchronized to the
surface's main output, to provide a consistent and meaningful
definition of the moment the update is displayed to the user.
When a compositor updates an output, it processes only the
queues of the surfaces whose main output is the one being
updated. The queues of other surfaces, even if they are part of
the redrawing, are not processed at that time.
When a compositor chooses to update an output, it must predict
the presentation clock value when the display update will occur.
For the definition of the moment of display update, see
presentation_feedback.presented. Therefore if the prediction is
absolutely perfect, presentation_feedback.presented will carry
the same clock value.
For each surface with queued content updates and matching main
output, the compositor picks the update with the highest
timestamp no later than a half frame period after the predicted
presentation time. The intent is to pick the content update
whose target timestamp as rounded to the output refresh period
granularity matches the same display update as the compositor is
targeting, while not displaying any content update more than a
half frame period too early. If all the updates in the queue are
already late, the highest timestamp update is taken regardless
of how late it is. Once an update in a queue has been chosen,
all remaining updates with an earlier timestamp in the queue are
The compositor applies the chosen update to the wl_surface,
regardless of possible wl_subsurface.set_sync mode. This allows
e.g. a video to continue running in a sub-surface also during
window resizing. It is assumed that buffer state updates do not
cause visual disruption to the window like surface state updates
can. Support for wl_viewport is needed for glitch-free resizing
if the resizing involves changing the (sub-)surface size.
When the chosen update is applied, the associated frame
callbacks are sent. Damage for the whole surface is assumed,
as damage is not explicitly queued with buffer state.
When the final realized presentation time is available, e.g.
after a framebuffer flip completes, the requested
presentation_feedback.presented events are sent. The final
presentation time can differ from the compositor's predicted
display update time and the update's target time, especially
when the compositor misses its target vertical blanking period.
When updates from the queue are discarded, the
presentation_feedback.discarded event is delivered if feedback
was requested. Also the associated frame callbacks are sent.
An immediate content update with an attach request automatically
discards the whole queue just before the update gets applied. If
wl_surface.attach has not been sent for an immediate content
submission, the queue is not discarded, and the content update
applies only the surface state, but no buffer state.
If a wl_surface has queued content updates when it is destroyed,
the whole queue is implicitly discarded as if
presentation.discard_queue was sent immediately prior to the
<request name="destroy" type="destructor">
<description summary="unbind from the presentation interface">
Informs the server that the client will not be using this
protocol object anymore. This does not affect any content
update queues nor existing objects created by this interface.
<description summary="request presentation feedback information">
With this request, presentation feedback will be provided for
the current content submission of the given surface. A new
presentation_feedback object is created, and that object will
deliver the information once. The object is tied to this
content submission only. Multiple presentation_feedback objects
may be created for the same submission, and they will all
return the same information.
For details on what information is returned, see
<arg name="surface" type="object" interface="wl_surface"
<arg name="callback" type="new_id" interface="presentation_feedback"
summary="new feedback object"/>
<description summary="queue the buffer instead of immediate presentation">
This request changes the behaviour of the very next
wl_surface.commit of the given wl_surface and that commit
only. Instead of immediately applying the pending wl_surface
state as defined in wl_surface.commit, the commit will queue a
new content update, using the pending buffer state only.
For a more detailed description and the definition of buffer
state, see the documentation of the presentation interface.
The value of the target timestamp is in the presentation clock
domain, see presentation.clock_id.
If a queue request has already been sent for the unfinished content
update submission on the given wl_surface, a new queue request
will override the previous one.
<arg name="surface" type="object" interface="wl_surface"
<arg name="tv_sec" type="uint"
summary="seconds part of the target timestamp"/>
<arg name="tv_nsec" type="uint"
summary="nanoseconds part of the target timestamp"/>
<description summary="discard the whole queue of the given surface">
This request discards the whole remaining content update queue
of the given wl_surface. Once the compositor has processed
this request, no more queued updates will happen on the
surface until the client queues new updates. Discard_queue is
processed immediately when the compositor dispatches the
A client can issue a wl_display.sync after this, and once the
sync returns, the client has received all
presentation_feedback.discarded events resulting from the
discard_queue. However, presentation_feedback.presented
events may arrive later if the compositor executed a queued
content update before the discard_queue.
<arg name="surface" type="object" interface="wl_surface"
<description summary="clock ID for timestamps">
This event tells the client in which clock domain the
compositor interprets the timestamps used by the presentation
extension. This clock is called the presentation clock.
The compositor sends this event when the client binds to the
presentation interface. The presentation clock does not change
during the lifetime of the client connection.
The clock identifier is platform dependent. Clients must be
able to query the current clock value directly, not by asking
On Linux/glibc, the identifier value is one of the clockid_t
values accepted by clock_gettime(). clock_gettime() is defined
Compositors should prefer a clock that does not jump and is
not slewed e.g. by NTP. The absolute value of the clock is
irrelevant. Precision of one millisecond or better is
Timestamps in this clock domain are expressed as tv_sec,
tv_nsec pairs, each component being an unsigned 32-bit value.
Whole seconds are in tv_sec, and the additional fractional
part in tv_nsec as nanoseconds. Hence, for valid timestamps
tv_nsec must be in [0, 999999999].
Note that clock_id applies only to the presentation clock,
and implies nothing about e.g. the timestamps used in the
Wayland core protocol input events.
<arg name="clk_id" type="uint" summary="platform clock identifier"/>
<interface name="presentation_feedback" version="1">
<description summary="presentation time feedback event">
A presentation_feedback object returns the feedback information
about a wl_surface content update becoming visible to the user.
One object corresponds to one content update submission
(wl_surface.commit), queued or immediate. There are two possible
outcomes: the content update may be presented to the user, in
which case the presentation timestamp is delivered. Otherwise,
the content update is discarded, and the user never had a chance
to see it before it was superseded or the surface was destroyed.
Once a presentation_feedback object has delivered an event, it
becomes inert, and should be destroyed by the client.
<request name="destroy" type="destructor">
<description summary="destroy presentation feedback object">
The object is destroyed. If a feedback event had not been
delivered yet, it is cancelled.
<description summary="presentation synchronized to this output">
As presentation can be synchronized to only one output at a
time, this event tells which output it was. This event is only
sent prior to the presented event.
As clients may bind to the same global wl_output multiple
times, this event is sent for each bound instance that matches
the synchronized output. If a client has not bound to the
right wl_output global at all, this event is not sent.
<arg name="output" type="object" interface="wl_output"
<description summary="the content update was displayed">
The associated content update was displayed to the user at the
indicated time (tv_sec, tv_nsec). For the interpretation of the
timestamp, see presentation.clock_id event.
The timestamp corresponds to the time when the content update
turned into light the first time on the surface's main output.
Compositors may approximate this from the framebuffer flip
completion events from the system, and the latency of the
physical display path if known.
This event is preceded by all related sync_output events
telling which output's refresh cycle the feedback corresponds
to, i.e. the main output for the surface. Compositors are
recommended to choose the output containing the largest
part of the wl_surface, or to keep the output they previously
chose. Having a stable presentation output association helps
clients to predict future output refreshes (vblank).
Argument 'refresh' gives the compositor's prediction of how
many nanoseconds after tv_sec, tv_nsec the very next output
refresh may occur. This is to further aid clients in
predicting future refreshes, i.e., estimating the timestamps
targeting the next few vblanks. If such prediction cannot
usefully be done, the argument is zero.
The 64-bit value combined from seq_hi and seq_lo is the value
of the output's vertical retrace counter when the content
update was first scanned out to the display. This value must
be compatible with the definition of MSC in the
GLX_OML_sync_control specification. Note that if the display
path has a non-zero latency, the time instant specified by
this counter may differ from the timestamp's.
If the output does not have a constant refresh rate, explicit
video mode switches excluded, then the refresh argument must
If the output does not have a concept of vertical retrace or a
refresh cycle, or the output device is self-refreshing without
a way to query the refresh count, then the arguments seq_hi
and seq_lo must be zero.
<arg name="tv_sec" type="uint"
summary="seconds part of the presentation timestamp"/>
<arg name="tv_nsec" type="uint"
summary="nanoseconds part of the presentation timestamp"/>
<arg name="refresh" type="uint" summary="nanoseconds till next refresh"/>
<arg name="seq_hi" type="uint"
summary="high 32 bits of refresh counter"/>
<arg name="seq_lo" type="uint"
summary="low 32 bits of refresh counter"/>
<description summary="the content update was not displayed">
The content update was never displayed to the user.
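
To make the queue processing rule from the presentation interface
description more concrete, here is a minimal sketch of how a
compositor might pick the update for one output repaint. It
assumes the queue is kept sorted by ascending target time and that
timestamps have been flattened to nanoseconds. The struct and
function names are made up for this example and are not part of
the proposal.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queue entry; only the target time matters here. */
struct queued_update {
	uint64_t target_ns; /* target time, presentation clock, in ns */
};

/*
 * Pick the update to apply for a display update predicted to happen
 * at predicted_ns on an output with refresh period refresh_ns.
 *
 * The queue must be sorted by ascending target_ns. Returns the index
 * of the chosen update, or -1 if no update is due yet. All entries
 * before the returned index are the ones to be discarded.
 */
int
pick_update(const struct queued_update *queue, size_t count,
	    uint64_t predicted_ns, uint64_t refresh_ns)
{
	/* Nothing may be shown more than half a period too early. */
	uint64_t cutoff_ns = predicted_ns + refresh_ns / 2;
	int chosen = -1;
	size_t i;

	for (i = 0; i < count; i++) {
		if (queue[i].target_ns > cutoff_ns)
			break;
		/* Highest timestamp so far that is not too early. */
		chosen = (int)i;
	}

	return chosen;
}
```

Note that the "everything is already late" case needs no special
handling here: the loop simply runs to the end of the queue and
returns the last entry.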
3. Why UST all the way?
3.1. UST and MSC pros and cons
Unadjusted System Time (UST) and graphics Media Stream Counter
(MSC) are defined by the GLX_OML_sync_control specification. UST
is basically a stable wall clock with a tick rate close to the
"universal true time", i.e. the real time, while MSC is a frame
or refresh cycle counter and not a clock.
Should we use UST or MSC, or maybe allow both, for queueing
buffers and measuring the presentation time?
MSC pro: Is what all graphics systems that I know of seem to be
using currently, and matches how past and most of current
UST con: Would be a new concept to adapt to for frame counter
based algorithms. For actual hardware operations, needs to be
converted to MSC in most cases.
MSC con: tick rate depends on the output device currently in use
for the window, and can also change with video mode switches.
UST pro: tick rate is guaranteed to be constant.
MSC con: for an output device that is not based on periodic
refresh cycles, e.g. on-demand refresh or variable rate, it has
no predictable correspondence to the "universal true time".
UST pro: once you establish the relationship to "universal true
time", it holds practically indefinitely. This means you can
reliably relate UST to other proper clocks and maintain e.g.
MSC pro: corresponds exactly to when a monitor refreshes,
provided that the monitor uses periodic refreshing. The MSC
increment between two consecutive vblanks is 1.
UST con: to hit a vblank, you have to estimate and compute the
right UST value based on feedback data. The UST increment
between two consecutive vblanks may not be an integer
MSC con: an application cannot read the current MSC value on its
own, it needs to ask the display server about it, which is a
protocol roundtrip to another process. Determining current MSC
also involves determining which window or output it should
UST pro: an application can query the current UST value directly
with a system call (clock_gettime).
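
As a rough illustration of that direct query on Linux/glibc, the
sketch below reads the clock named by the clk_id argument of
presentation.clock_id and stores it in the tv_sec, tv_nsec form
used by the proposal. It assumes clk_id can be cast to clockid_t,
as the clock_id documentation says for Linux/glibc. The struct and
function names are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* A timestamp as two unsigned 32-bit components, as on the wire. */
struct presentation_time {
	uint32_t tv_sec;
	uint32_t tv_nsec;
};

/*
 * Read the current value of the clock identified by clk_id.
 * Returns 0 on success and -1 if clock_gettime() fails.
 */
int
presentation_clock_now(uint32_t clk_id, struct presentation_time *out)
{
	struct timespec ts;

	if (clock_gettime((clockid_t)clk_id, &ts) < 0)
		return -1;

	/* The proposal carries seconds as 32 bits, so truncate. */
	out->tv_sec = (uint32_t)ts.tv_sec;
	out->tv_nsec = (uint32_t)ts.tv_nsec;

	return 0;
}
```

No round trip to the compositor is involved; the only thing the
client needs from the server is the clock identifier itself.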
MSC alone cannot be used to achieve reliable A/V sync,
because its relationship to any other clock (e.g. audio clock)
can change: MSC may suddenly start to tick faster or slower.
Only UST is reliably synchronizable to other clocks, therefore
UST should be the "common language" in the Wayland protocol.
Using MSC would always require some context, like the window or
Clients need to estimate the UST values when vblanks
happen, so that they can schedule presentation for certain
monitor refresh cycles and to reduce jitter and latency. Also
compositors need to convert presentation UST timestamps into
hardware MSC values or equivalent, since the update can only
happen during vblanks. However, this applies only to periodic
scanout style hardware, and not to variable refresh rate or
on-demand monitors. Therefore UST, while being more complicated
to use, is the more future-proof concept.
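
The estimation a client has to do could look roughly like the
sketch below. It extrapolates from the last
presentation_feedback.presented event, assuming the 'refresh'
argument can be used as a constant refresh period for the near
future. That is only an assumption that holds for periodic
scanout. All names are hypothetical.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Arguments of one presented event, as named in the proposal. */
struct presented_info {
	uint32_t tv_sec;
	uint32_t tv_nsec;
	uint32_t refresh; /* ns until the next refresh, 0 if unknown */
};

/* Flatten the tv_sec, tv_nsec pair into nanoseconds. */
uint64_t
presented_to_ns(const struct presented_info *p)
{
	return (uint64_t)p->tv_sec * NSEC_PER_SEC + p->tv_nsec;
}

/*
 * Estimate the time of the first refresh at or after want_ns that
 * comes after the refresh reported in 'last'. If no prediction is
 * possible (refresh == 0), want_ns is returned unchanged.
 */
uint64_t
estimate_vblank_ns(const struct presented_info *last, uint64_t want_ns)
{
	uint64_t base = presented_to_ns(last);
	uint64_t n;

	if (last->refresh == 0)
		return want_ns;

	/* Wanted time already passed: aim for the very next refresh. */
	if (want_ns <= base)
		return base + last->refresh;

	/* Number of refresh periods, rounded up. */
	n = (want_ns - base + last->refresh - 1) / last->refresh;

	return base + n * last->refresh;
}
```

The result would be used as the target time of a
presentation.queue request. The error of such extrapolation grows
with the distance from the last feedback, so a client would want to
redo it whenever new feedback arrives.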
If the compositor can affect when the monitor is
refreshed, using an MSC as presentation target time would not
give any clue of when the presentation should actually happen,
making variable refresh rate display hardware have basically no
4. Bits and pieces
Do not queue damage, because damage is in surface coordinates,
and syncing it from the queue is hard. Do not queue anything
that is in surface coordinates. Doing so would require
discarding the whole queue whenever the surface size changes.
The design explicitly allows changing surface and buffer sizes
asynchronously, if wl_viewport is available.
4.2. Moment of presentation
Presentation time is defined as "turns into light", because
modern TVs may add significant latency between a pixel going
onto the wire and that pixel turning into light. This should
not conflict with the definition in GLX_OML_sync_control,
which was presumably written in the era of CRT monitors, when
the turn-to-light latency was insignificant.
4.3. Swap buffer count
SBC does not require any protocol nor server side support,
because the client is in complete control of the swaps to the
wl_surface or gets the needed feedback via presentation_feedback
and frame callbacks, and no other client can access the same
4.4. Intentional tearing
Implementing GLX_EXT_swap_control_tear would require the Wayland
compositor to cause tearing on purpose. Hence it is not
4.5. The frame callback and swap interval
The frame callback needs to be with the buffer state, so it gets
queued. If a client makes e.g. EGL's commits queued, EGL may
still rely on frame callbacks for blocking apps properly, and
that is related to presenting the buffer, not just the very next
output refresh. EGL may also internally use queueing and
feedback to implement swap interval > 1.
Supporting interlaced material and displays is punted to a
later extension. Presumably the protocol supporting interlaced
content would be as simple as having an extra wl_surface-like
request to say on which of the two fields the content should be
displayed first. The field designation would be an additional
restriction on when a content update should initially hit the
screen. I.e. if both field and target timestamp are given, both
conditions must pass. This means that giving a field may delay
the presentation by one output refresh cycle, assuming the
output scans out alternating fields. Additionally there should
be an extension to inform the client which field the top-most
scanline of the buffer will hit, or equivalent information. This
assumes that the even scanlines in a buffer correspond to one
field, and the odd scanlines correspond to the other field,
regardless of how these terms are defined.
5. X11 Present and XWayland
This is a comparison between X11 Present and Wayland with the
presentation extension, or rather a look at how to map one to the
other. It is supposed to provide some confidence in how Present
could be supported
The fundamental difference between X11 Present and Wayland
(without XWayland specific extensions) is that Present supports
scheduled copy operations, which in the pathological cases
cannot easily be done in advance. Wayland requires complete
buffers, but Present may imply blits as a part of posting window
content to display.
The workhorse of X11 Present is the PresentPixmap request. Its
arguments with their likely corresponding Wayland concepts are:
- window: wl_surface
- pixmap: wl_buffer?
- serial: a new presentation_feedback object
- valid-area: N/A, always assumed to be the whole buffer. If not
the whole buffer, the X server must create a copy.
- update-area: wl_surface.damage
- x-off, y-off: N/A, non-zero values imply a copy
- wait-fence: N/A, Wayland does not have an explicit rendering
fence protocol. It is assumed that the underlying driver stack
prevents reading from a buffer whose rendering is still on-going
- idle-fence/PresentIdleNotify: wl_buffer.release, but since
wl_buffer.release is poorly suited to hardware buffers
(implies CPU/GPU synchronization), we may want real sync objects
that can be off-loaded to GPUs, especially with dma-buf and
- target-msc: presentation.queue with UST target time, requires
computing the desired UST from the given MSC and historical
- divisor: ignored
- remainder: ignored
- target-crtc: N/A
* PresentOptionAsync: ignored, presentation is always synced to
* PresentOptionCopy: N/A, must be implemented as a copy in the X
* PresentOptionUST: presentation.queue with UST target time
PresentPixmap corresponds to wl_surface.commit.
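
For the target-msc item above, the conversion in the XWayland
server might be sketched as follows. This assumes the last
presentation_feedback.presented event for the window is at hand,
and that its 'refresh' argument is usable as the refresh period.
The function names are invented for illustration, and nothing here
is taken from an existing XWayland implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Combine the seq_hi and seq_lo arguments into one 64-bit MSC. */
uint64_t
msc_from_seq(uint32_t seq_hi, uint32_t seq_lo)
{
	return ((uint64_t)seq_hi << 32) | seq_lo;
}

/*
 * Estimate the UST (in ns) at which the output reaches target_msc,
 * given the UST and MSC of the last presented event and the refresh
 * period. Returns false if no estimate can be made.
 */
bool
ust_for_target_msc(uint64_t last_ust_ns, uint64_t last_msc,
		   uint32_t refresh_ns, uint64_t target_msc,
		   uint64_t *ust_out)
{
	/* Zero means the compositor could not predict the refresh. */
	if (refresh_ns == 0)
		return false;

	/* Target already reached: present as soon as possible. */
	if (target_msc <= last_msc) {
		*ust_out = last_ust_ns;
		return true;
	}

	*ust_out = last_ust_ns + (target_msc - last_msc) * refresh_ns;
	return true;
}
```

The estimate is only as good as the assumption of a constant
refresh period, which is exactly the weakness of MSC discussed in
section 3.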
There are differences between Wayland and X11 when the target
timestamp is already in the past when the server processes it.
Unsure if queueing from XWayland xserver works out.
PresentNotifyMSC would require an xwayland extension to get a callback
event when the output's MSC reaches a certain value, or if the
xwayland-xserver can estimate the UST from the target-msc it can simply
use a timer. But probably wants the extension, really.
- PresentCapabilityAsync is never true
- PresentCapabilityFence is missing Wayland protocol
- PresentCapabilityUST implies that the display can refresh on-demand;
enable this always so that X11 apps would prefer UST timestamps instead
of MSC? Or add xwayland extension to ask the compositor?
PresentCompleteNotify as reply to PresentPixmap:
both presentation_feedback.presented/discarded and wl_buffer.release to
- PresentCompleteModeSkip if discarded and release
- PresentCompleteModeFlip if presented and no release
- PresentCompleteModeCopy if presented and release
- discarded and no release should not happen if xwayland does not submit
the same wl_buffer to multiple surfaces or queue multiple times.
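
Written out as code, the mapping listed above is a small decision
table. The enum below is a local stand-in for illustration only
and deliberately does not use the real X11 Present constants or
their values.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the PresentCompleteMode* values. */
enum complete_mode {
	COMPLETE_MODE_SKIP,
	COMPLETE_MODE_FLIP,
	COMPLETE_MODE_COPY,
	COMPLETE_MODE_UNEXPECTED
};

/*
 * presented: true if presentation_feedback.presented was received,
 *            false if presentation_feedback.discarded was received.
 * released:  true if wl_buffer.release has been received.
 */
enum complete_mode
complete_mode_from_feedback(bool presented, bool released)
{
	if (presented)
		return released ? COMPLETE_MODE_COPY : COMPLETE_MODE_FLIP;

	/* Discarded without release should not normally happen. */
	return released ? COMPLETE_MODE_SKIP : COMPLETE_MODE_UNEXPECTED;
}
```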
triggered by wl_buffer.release with 'idle-fence' as None, or if using a
fence, then wl_buffer.release signals the fence while PresentIdleNotify
is delivered ahead of time.