Core protocol change; [RFC v2] Wayland presentation extension

Tue Feb 18 02:09:42 PST 2014

Hi Jason,

thanks for the reply. I think what I wrote there read more like
flamebait than a solution. After sleeping on it and discussing with
Axel, I think changing the existing core protocol behaviour is not
appropriate, and we need to find another way around this.

More below.

On Mon, 17 Feb 2014 10:30:54 -0600
Jason Ekstrand <jason at jlekstrand.net> wrote:

> On Feb 17, 2014 2:35 AM, "Pekka Paalanen" <ppaalanen at gmail.com> wrote:
> >
> > Hi,
> >
> > there is one important thing in the below spec I really need to
> > highlight! See further below.
> >
> >
> > On Thu, 30 Jan 2014 17:35:17 +0200
> > Pekka Paalanen <ppaalanen at gmail.com> wrote:
> >
> > > Hi,
> > >
> > > it's time for a take two on the Wayland presentation extension.
> > >
> > >
> > >               1. Introduction
> > >
> > > The v1 proposal is here:
> > >
> > > http://lists.freedesktop.org/archives/wayland-devel/2013-October/011496.html
> > >
> > > In v2 the basic idea is the same: you can queue frames with a
> > > target presentation time, and you can get accurate presentation
> > > feedback. All the details are new, though. The re-design started
> > > from the wish to handle resizing better, preferably without
> > > clearing the buffer queue.
> > >
> > > All the changed details are probably too much to describe here,
> > > so it is maybe better to look at this as a new proposal. It
> > > still does build on Frederic's work, and everyone who commented
> > > on it. Special thanks to Axel Davy for his counter-proposal and
> > > fighting with me on IRC. :-)
> > >
> > > Some highlights:
> > >
> > > - Accurate presentation feedback is possible also without
> > >   queueing.
> > >
> > > - You can queue also EGL-based rendering, and get presentation
> > >   feedback if you want. EGL can also do this internally, as long
> > >   as EGL and the app do not try to use queueing at the same time.
> > >
> > > - More detailed presentation feedback to better allow predicting
> > >   future display refreshes.
> > >
> > > - If wl_viewport is used, neither video resolution changes nor
> > >   surface (window) size changes alone require clearing the queue.
> > >   Video can continue playing even during resizes.
> > >
> > > The protocol interfaces are arranged as
> > >
> > >       global.method(wl_surface, ...)
> > >
> > > just for brevity. We could as well do the factory approach:
> > >
> > >       o = global.get_presentation(wl_surface)
> > >       o.method(...)
> > >
> > > Or if we wanted to make it a mandatory part of the Wayland core
> > > protocol, we could just extend wl_surface itself:
> > >
> > >       wl_surface.method(...)
> > >
> > > and put the clock_id event in wl_compositor. That all is still
> > > open and fairly uninteresting, so let's concentrate on the other
> > > details.
> > >
> > > The proposal refers to wl_viewport.set_source and
> > > wl_viewport.destination requests, which do not yet exist in the
> > > scaler protocol extension. These are just the wl_viewport.set
> > > arguments split into separate src and dst requests.
> > >
> > > Here is the new proposal, some design rationale follows. Please,
> > > do ask why something is designed like it is if it puzzles you. I
> > > have a load of notes I couldn't clean up for this email. This
> > > does not even intend to completely solve all XWayland needs, but
> > > for everything native on Wayland I hope it is sufficient.
> > >
> > >
> > >               2. The protocol specification
> > >
> > > <?xml version="1.0" encoding="UTF-8"?>
> > > <protocol name="presentation_timing">
> > >
> > >   <copyright>
> > >     Copyright © 2013-2014 Collabora, Ltd.
> > >
> > >     Permission to use, copy, modify, distribute, and sell this
> > >     software and its documentation for any purpose is hereby granted
> > >     without fee, provided that the above copyright notice appear in
> > >     all copies and that both that copyright notice and this permission
> > >     notice appear in supporting documentation, and that the name of
> > >     the copyright holders not be used in advertising or publicity
> > >     pertaining to distribution of the software without specific,
> > >     written prior permission.  The copyright holders make no
> > >     representations about the suitability of this software for any
> > >     purpose.  It is provided "as is" without express or implied
> > >     warranty.
> > >
> > >     THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
> > >     SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
> > >     FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
> > >     SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
> > >     WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
> > >     AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
> > >     ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
> > >     THIS SOFTWARE.
> > >   </copyright>
> > >
> > >   <interface name="presentation" version="1">
> > >     <description summary="timed presentation related wl_surface requests">

Since this spec doc is a wall of text, I'll add some headings here to
make it more readable. I have these headings in my WIP branch as xml
comments. The WIP xml file is here:
http://cgit.collabora.com/git/user/pq/weston.git/tree/protocol/presentation_timing.xml?h=presentation-WIP

	Introduction

> > >       The main features of this interface are accurate presentation
> > >       timing feedback, and queued wl_surface content updates to ensure
> > >       smooth video playback while maintaining audio/video
> > >       synchronization. Some features use the concept of a presentation
> > >       clock, which is defined in presentation.clock_id event.
> > >
> > >       Requests 'feedback' and 'queue' can be regarded as additional
> > >       wl_surface methods. They are part of the double-buffered
> > >       surface state update mechanism, where other requests first set
> > >       up the state and then wl_surface.commit atomically applies the
> > >       state into use. In other words, wl_surface.commit submits a
> > >       content update.
> > >

	Surface state vs. buffer state

> > >       Interface wl_surface has requests to set surface related state
> > >       and buffer related state, because there is no separate interface
> > >       for buffer state alone. Queueing requires separating the surface
> > >       from buffer state, and buffer state can be queued while surface
> > >       state cannot.
> > >
> > >       Buffer state includes the wl_buffer from wl_surface.attach, the
> > >       state assigned by wl_surface requests frame,
> > >       set_buffer_transform and set_buffer_scale, and any
> > >       buffer-related state from extensions, for instance
> > >       wl_viewport.set_source. This state is inherent to the buffer
> > >       and the content update, rather than the surface.
> > >
> > >       Surface state includes all other state associated with
> > >       wl_surfaces, like the x,y arguments of wl_surface.attach, input
> > >       and opaque regions, damage, and extension state like
> > >       wl_viewport.destination. In general, anything expressed in
> > >       surface local coordinates is better as surface state.
> > >

	Posting: immediate and queued

> > >       The standard way of posting new content to a surface using the
> > >       wl_surface requests damage, attach, and commit is called
> > >       immediate content submission. This happens when a
> > >       presentation.queue request has not been sent since the last
> > >       wl_surface.commit.
> > >
> > >       The new way of posting a content update is a queued content
> > >       update submission. This happens on a wl_surface.commit when a
> > >       presentation.queue request has been sent since the last
> > >       wl_surface.commit.
> > >

	Queueing updates

> > >       Queued content updates do not get applied immediately in the
> > >       compositor but are pushed to a queue on receiving the
> > >       wl_surface.commit. The queue is ordered by the submission target
> > >       timestamp. Each item in the queue contains the wl_buffer, the
> > >       target timestamp, and all the buffer state as defined above. All
> > >       the queued state is taken from the pending wl_surface state at
> > >       the time of the commit, exactly like an immediate commit would
> > >       have taken it.
> > >
> > >       For instance on a queueing commit, the pending buffer is queued
> > >       and no buffer is pending afterwards. The stored values of the
> > >       x,y parameters of wl_surface.attach are reset to zero, but they
> > >       also are not queued; queued content updates do not carry the
> > >       attach offsets. All other surface state (that is not queued),
> > >       e.g. damage, is not applied nor reset.
> > >
> > >       Issuing a queueing commit without a wl_surface.attach is
> > >       undefined. However, queueing a commit with explicitly attached
> > >       NULL wl_buffer works; when and if the content update is
> > >       executed, the surface content is removed as defined for
> > >       wl_surface.attach.
> > >
> > >       If a queued content update has been submitted, and the wl_buffer
> > >       used in the update is destroyed before the wl_buffer.release
> > >       event, the results are undefined. The compositor may or may not
> > >       have executed the update, therefore the surface contents become
> > >       undefined as explained in wl_surface.attach. Whether any
> > >       presentation feedback or frame callbacks occur is undefined.
> > >
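
To illustrate the ordering rule above, here is a minimal C sketch of a
per-surface queue kept sorted by target timestamp. It is not taken from
Weston or from the proposal: the names queued_update, update_queue and
queue_insert, the fixed capacity, and the choice to keep submission
order for equal timestamps are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compositor-side record of one queued content update. */
struct queued_update {
	uint64_t target_nsec; /* target time on the presentation clock */
	int buffer_id;        /* stand-in for the wl_buffer and buffer state */
};

#define QUEUE_MAX 16 /* arbitrary capacity, assumed for the sketch */

struct update_queue {
	struct queued_update items[QUEUE_MAX];
	size_t count;
};

/* Insert an update, keeping items sorted by ascending target time.
 * Returns 0 on success, -1 if the queue is full. */
static int
queue_insert(struct update_queue *q, struct queued_update u)
{
	size_t i;

	assert(q != NULL);
	if (q->count >= QUEUE_MAX)
		return -1;

	/* Shift later targets up by one slot. Equal targets are not
	 * shifted, so they stay in submission order (an assumption). */
	i = q->count;
	while (i > 0 && q->items[i - 1].target_nsec > u.target_nsec) {
		q->items[i] = q->items[i - 1];
		i--;
	}
	q->items[i] = u;
	q->count++;
	return 0;
}
```

A queueing wl_surface.commit would call queue_insert() with the pending
buffer state, instead of applying that state to the surface.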

	Compositor prepares to repaint

> > >       For each surface, the compositor maintains an association to a
> > >       single output that is considered as the main output for the
> > >       surface. Queued content updates are synchronized to the
> > >       surface's main output, to provide a consistent and meaningful
> > >       definition of the moment the update is displayed to the user.
> > >       When a compositor updates an output, it processes only the
> > >       queues of the surfaces whose main output is the one being
> > >       updated. The queues of other surfaces, even if they are part of
> > >       the redrawing, are not processed at that time.
> > >
> > >       When a compositor chooses to update an output, it must predict
> > >       the presentation clock value when the display update will occur.
> > >       For the definition of the moment of display update, see
> > >       presentation_feedback.presented. Therefore if the prediction is
> > >       absolutely perfect, presentation_feedback.presented will carry
> > >       the same clock value.
> > >

	Picking an update from a queue

> > >       For each surface with queued content updates and matching main
> > >       output, the compositor picks the update with the highest
> > >       timestamp no later than a half frame period after the predicted
> > >       presentation time. The intent is to pick the content update
> > >       whose target timestamp as rounded to the output refresh period
> > >       granularity matches the same display update as the compositor is
> > >       targeting, while not displaying any content update more than a
> > >       half frame period too early. If all the updates in the queue are
> > >       already late, the highest timestamp update is taken regardless
> > >       of how late it is. Once an update in a queue has been chosen,
> > >       all remaining updates with an earlier timestamp in the queue are
> > >       discarded.
> > >
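
The selection rule above could be sketched in C roughly as follows.
This is only an illustration under assumptions: pick_update and its
parameters are hypothetical names, the queue is represented as a plain
array of target timestamps already sorted in ascending order, and a
target exactly half a frame period after the prediction is treated as
"no later than".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of the update to apply from targets[], which must be
 * sorted ascending, or -1 if no update is due for this repaint.
 * All entries before the returned index would then be discarded. */
static int
pick_update(const uint64_t *targets, size_t count,
	    uint64_t predicted_nsec, uint64_t refresh_nsec)
{
	/* Latest target still allowed: half a frame period after the
	 * predicted presentation time. */
	uint64_t limit = predicted_nsec + refresh_nsec / 2;
	int chosen = -1;
	size_t i;

	assert(targets != NULL || count == 0);
	for (i = 0; i < count; i++) {
		if (targets[i] > limit)
			break; /* this and all later updates are too early */
		chosen = (int)i; /* highest timestamp seen within the limit */
	}
	return chosen;
}
```

Note that the "all updates are already late" case needs no special
handling here: every target is below the limit, so the loop ends on the
last, i.e. highest timestamp, entry.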

	Applying queued updates

> > >       The compositor applies the chosen update to the wl_surface,
> > >       regardless of possible wl_subsurface.set_sync mode. This allows
> > >       e.g. a video to continue running in a sub-surface also during
> > >       window resizing. It is assumed that buffer state updates do not
> > >       cause visual disruption to the window like surface state updates
> > >       can. Support for wl_viewport is needed for glitch-free resizing
> > >       if the resizing involves changing the (sub-)surface size.
> > >
> > >       When the chosen update is applied, the associated frame
> > >       callbacks are sent. Damage for the whole surface is assumed,
> > >       as damage is not explicitly queued with buffer state.
> > >

	Completing presentation

> > >       When the final realized presentation time is available, e.g.
> > >       after a framebuffer flip completes, the requested
> > >       presentation_feedback.presented events are sent. The final
> > >       presentation time can differ from the compositor's predicted
> > >       display update time and the update's target time, especially
> > >       when the compositor misses its target vertical blanking period.
> > >

	Discarding updates

> > >       When updates from the queue are discarded, the
> > >       presentation_feedback.discarded event is delivered if feedback
> > >       was requested. Also the associated frame callbacks are sent.
> > >
> > >       An immediate content update with an attach request automatically
> > >       discards the whole queue just before the update gets applied. If
> > >       wl_surface.attach has not been sent for an immediate content
> > >       submission, the queue is not discarded, and the content update
> > >       applies only the surface state, but no buffer state.
> >
> > If you read the above paragraph carefully, you see that the last
> > sentence CHANGES EXISTING WAYLAND CORE PROTOCOL BEHAVIOUR.
> >
> > The change is very subtle. It means, that without a wl_surface.attach,
> > the buffer state is no longer applied on commit at all! To recap, the
> > buffer state is:
> > - frame callbacks (!)
> > - set_buffer_transform
> > - set_buffer_scale
> > - the src_* arguments of wl_viewport.set
> >
> > The reason is explained in my recent email:
> >
> > http://lists.freedesktop.org/archives/wayland-devel/2014-February/013293.html
> >
> > An immediate commit without an attach should not apply any buffer
> > state, because previous queueing of frames may have left buffer state
> > that is incorrect for the currently showing buffer. Immediate commits
> > without attach are used to update surface (and shell!) state, and
> > applying incorrect buffer state could cause a visual glitch.
> >
> > We could claim, that this change in the core protocol exists only if
> > the presentation extension is advertised by the server, but that would
> > cause a lot more work to fix clients that get bit by this change, rather
> > than fix the clients to always attach a wl_buffer when they want to
> > change buffer state, even if it is the same buffer they just attached
> > and committed already.
> >
> > Therefore I would like to bring the concepts of surface state and
> > buffer state to the core protocol, and have the core protocol define
> > that buffer state is applied only if there is an attach.
> >
> > In the past, we already changed the wl_surface.attach semantics to not
> > re-attach the "current" buffer again, when there is a wl_surface.commit.
> > The practical consequence of that was that a commit without an attach
> > cannot cause any wl_buffer on this surface to become reserved and
> > readable by the server, and hence no (new) wl_buffer.release would be
> > posted either.
> >
> > That means that clients already need to re-attach the same wl_buffer
> > again, if they changed the buffer contents and want to show the new
> > image. I think this should mitigate the impact of the core protocol
> > change.
> >
> > I guess the only interesting case is the frame callback, and whether
> > anyone (ab)uses it without an attach.
> 
> Someone does abuse it right now: Weston.  Inside the Weston Wayland
> backend, every time we want a redraw we frame+commit,  wait for the frame
> callback, and then repaint.  The actual reason for this is to maintain the
> frame callbacks inside of the nested weston, so that can be changed.  Also,
> if we change weston to take a more timing-based approach to repaint
> scheduling (as you have mentioned in the past) that would also somewhat
> mitigate this problem.

Aha, cascading, I didn't realize that before.

> One of the reasons why this is specifically a problem is because it means
> that EGL clients can't update any surface state without either a full
> repaint or a new EGL extension.  The obvious answer is "just re-send the
> old wl_buffer".  However, EGL clients don't have access to it and can't
> re-send.  They can only repaint and have EGL send a new buffer.  Therefore,
> if we want to allow for updating surface state without repainting, we need
> a way to say "just use the previous one"

I suspect a slight misunderstanding here. You can always change surface
state with an immediate commit, even without an attach. Changing buffer
state would require an attach. As buffer state is intimately tied to
the buffer contents, I can't see a reason to be able to change buffer
state without redrawing it.

> Another thought is what Axel Davy brought up on IRC this morning, that of
> swapInterval > 1.  However, here it isn't a problem because eglSwapInterval
> is handled by the EGL implementation and it does have access to the
> previously submitted wl_buffer so it can resubmit.

I'm not sure resubmitting from inside EGL is appropriate... it depends
on when you do it, and whether the application has had a chance to push
new pending state to the surface. An app certainly does not expect e.g.
glClear to cause a wl_surface.commit.

Using queueing for swapinterval > 1 would be more reliable, IMO. If an
EGL implementation decides to support swapinterval > 1 in the first
place.

> Also (again, Axel pointed this out) there is a race here between the
> wl_surface.attach and the wl_buffer.release.  However, if we specify that
> wl_buffer.release gets called once for each wl_surface.attach this can be
> handled by simple client-side reference-counting.  Still kind of a pain
> though.

Yeah, this problem was not introduced by queueing either. You can create
the same problem without the presentation extension, too. If a client
attaches a wl_buffer first to one wl_surface, then to another
wl_surface, and then receives wl_buffer.release, it is ambiguous
whether the release is a "reply" to only the first attach, or both. The
server might repaint in between the attaches and send the release.

I suspect we should start a new thread about this.

Once for each attach (attach+commit)... I like that idea and see it is
necessary, but it's a behavioral change to the existing core protocol, I
believe.
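
If we did specify exactly one wl_buffer.release per attach+commit, the
client-side reference counting Jason mentions could look roughly like
this C sketch. It assumes that proposed rule, which is not what the
core protocol says today, and the names client_buffer,
client_buffer_committed, client_buffer_released and
client_buffer_is_free are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical client-side bookkeeping for one wl_buffer. */
struct client_buffer {
	int busy_count; /* attach+commit pairs not yet answered by a release */
};

/* Call after each wl_surface.attach + wl_surface.commit of this buffer,
 * on any surface. */
static void
client_buffer_committed(struct client_buffer *b)
{
	assert(b != NULL);
	b->busy_count++;
}

/* Call from the wl_buffer.release event handler. */
static void
client_buffer_released(struct client_buffer *b)
{
	assert(b != NULL && b->busy_count > 0);
	b->busy_count--;
}

/* The client may write to the buffer only when every attach+commit
 * has been matched by a release. */
static bool
client_buffer_is_free(const struct client_buffer *b)
{
	assert(b != NULL);
	return b->busy_count == 0;
}
```

With the two-surface example above, the count goes to two after the
second attach+commit, so a single release no longer looks like
permission to reuse the buffer.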

> Also on IRC, you brought up the idea of changing the frame callback to be
> more of a "you should redraw now" and less of a "your buffer just got put
> onscreen".  Obviously, this breaks anything (such as Weston's
> wayland-backend.so) that uses the frame callback as anything other than a
> throttling hint.  My feeling is that changing the frame callback to be more
> of a "you should re-draw now" is probably ok as long as we do two things:
> a) Figure out how many clients actually abuse it and how much damage
> changing it will cause.  b) guarantee that any buffer committed after the
> frame callback has been sent will be presented strictly later than the one
> associated with the frame callback.  Without this basic guarantee, it would
> again become useless.

The very first time I ever saw wl_surface.frame, I immediately
misunderstood its purpose. I thought it was sent when the update hit
the screen, not when the compositor is _starting_ to repaint with the
update. Since then I have added wording to the core protocol spec that I
no longer consider right: that a frame callback is guaranteed to be
triggered even without any content or state updates. Does the frame
callback mean "the compositor is repainting" or "the content update is
being processed for screen now" / "your last frame is being used, now
is a good time to prepare the next one"? We suggest avoiding frame
callbacks when the surface is not visible, which breaks the "the
compositor is repainting" interpretation. I have a history of
misunderstanding the frame callback's purpose.

What is the frame callback supposed to mean? Don't think about how it
is implemented in Weston, but what it is supposed to convey? What
implications are deliberate and what are just side-effects of the
current implementation in Weston?

Anyway, the current "convenient" wording in the core spec ties up the
immediate commit behaviour.

Back to the problem at hand, how should the frame request be handled in
the various cases? Let me try to clarify the different cases where it
occurs.

commit
- immediate commit without an attach
- apply surface state, but not buffer state
- current spec says frame callback is triggered by the next compositor
  repaint; frame callback applied

attach+commit
- immediate commit with an attach
- apply surface and buffer state
- frame callback triggered by the next compositor repaint, because that
  is when the new surface content is applied, too; frame callback
  applied

queue+commit
- queued commit without an attach
- presentation extension specifies this case as undefined behaviour
- should we have some defined behaviour instead?

attach+queue+commit
- queued commit with an attach
- queues buffer state, does not modify current surface or buffer state
- resets dx,dy from attach to zero?
- creates a queued update in the per-surface queue
- Does frame callback get queued, left as pending, or applied?
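
As a rough C sketch, the four cases above are just the combinations of
two flags that a compositor would track per surface between commits.
The type and function names here (commit_kind, pending_state,
classify_commit) are hypothetical and not from Weston or the proposal;
the sketch only names the cases, it does not decide what each one does
with the frame callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum commit_kind {
	COMMIT_IMMEDIATE_NO_ATTACH, /* commit */
	COMMIT_IMMEDIATE_ATTACH,    /* attach+commit */
	COMMIT_QUEUED_NO_ATTACH,    /* queue+commit, undefined in the proposal */
	COMMIT_QUEUED_ATTACH,       /* attach+queue+commit */
};

/* Hypothetical per-surface flags, set by requests received since the
 * last wl_surface.commit and cleared by the commit. */
struct pending_state {
	bool attach_sent; /* wl_surface.attach was sent */
	bool queue_sent;  /* presentation.queue was sent */
};

/* Decide which of the four cases a wl_surface.commit falls into. */
static enum commit_kind
classify_commit(const struct pending_state *p)
{
	assert(p != NULL);
	if (p->queue_sent)
		return p->attach_sent ? COMMIT_QUEUED_ATTACH
				      : COMMIT_QUEUED_NO_ATTACH;
	return p->attach_sent ? COMMIT_IMMEDIATE_ATTACH
			      : COMMIT_IMMEDIATE_NO_ATTACH;
}
```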

The core protocol specification explicitly requires the 'commit' case
as described above, so I cannot say that frame callbacks are buffer
state. That needs fixing in my proposal.

The remaining question is, what to do in the 'attach+queue+commit'
case. I think applying would make the least sense, because a queueing
commit is not expected to change the current state immediately.

If the frame callback is left as pending, there is no easy alternative
mechanism to achieve the equivalent of queueing it. But could there
even be a need for such?

If the frame callback is queued, it will get triggered when the queued
update is applied, i.e. in the same fashion as when an immediate update
is applied. If a client wants the immediate frame callback, it can do a
frame+commit after the queued commit.

What are the consequences of choosing either behaviour?

If eglSwapBuffers for whatever reason wants to have the immediate frame
callback while we specify it can be queued, EGL will always need to
commit twice. However, I cannot see why it would want that. Doing N
times frame+commit to implement swapinterval N is bad: the application
may push new pending state and does not expect random commits to occur
behind its back. Unless EGL stalls inside eglSwapBuffers call until it
has sent the last commit it needs, which would probably be bad for
performance.

But if frame callbacks can be queued, and EGL uses queueing to
implement swapinterval > 1, then EGL could simply use the frame
callback like it uses it for swapinterval=1. (Is EGL allowed to use
queueing? Should applications expect explicit queueing to work with
EGL?)

My opinion at the moment is that frame callbacks should be queueable,
but they are not buffer state because of the immediate commit without
attach case. With this, the changes to the core protocol would be
limited to explicitly disabling the change of e.g. buffer_scale without
attaching a wl_buffer.

Thanks,
pq