[RFC v2] Wayland presentation extension (video protocol)

Mon Feb 10 01:53:39 PST 2014

On Sat, 8 Feb 2014 15:23:29 -0600
Jason Ekstrand <jason at jlekstrand.net> wrote:

> Pekka,
> First off, I think you've done a great job over-all.  I think it will both
> cover most cases and work well  I've got a few comments below.

Thank you for the review. :-)
Replies below.

> On Thu, Jan 30, 2014 at 9:35 AM, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> 
> > Hi,
> >
> > it's time for a take two on the Wayland presentation extension.
> >
> >
> >                 1. Introduction
> >
> > The v1 proposal is here:
> >
> > http://lists.freedesktop.org/archives/wayland-devel/2013-October/011496.html
> >
> > In v2 the basic idea is the same: you can queue frames with a
> > target presentation time, and you can get accurate presentation
> > feedback. All the details are new, though. The re-design started
> > from the wish to handle resizing better, preferably without
> > clearing the buffer queue.
> >
> > All the changed details are probably too much to describe here,
> > so it is maybe better to look at this as a new proposal. It
> > still does build on Frederic's work, and everyone who commented
> > on it. Special thanks to Axel Davy for his counter-proposal and
> > fighting with me on IRC. :-)
> >
> > Some highlights:
> >
> > - Accurate presentation feedback is possible also without
> >   queueing.
> >
> > - You can queue also EGL-based rendering, and get presentation
> >   feedback if you want. Also EGL can do this internally, too, as
> >   long as EGL and the app do not try to use queueing at the same time.
> >
> > - More detailed presentation feedback to better allow predicting
> >   future display refreshes.
> >
> > - If wl_viewport is used, neither video resolution changes nor
> >   surface (window) size changes alone require clearing the queue.
> >   Video can continue playing even during resizes.
...
> >   <interface name="presentation" version="1">
> >     <description summary="timed presentation related wl_surface requests">
> >       The main features of this interface are accurate presentation
> >       timing feedback, and queued wl_surface content updates to ensure
> >       smooth video playback while maintaining audio/video
> >       synchronization. Some features use the concept of a presentation
> >       clock, which is defined in presentation.clock_id event.
> >
> >       Requests 'feedback' and 'queue' can be regarded as additional
> >       wl_surface methods. They are part of the double-buffered
> >       surface state update mechanism, where other requests first set
> >       up the state and then wl_surface.commit atomically applies the
> >       state into use. In other words, wl_surface.commit submits a
> >       content update.
> >
> >       Interface wl_surface has requests to set surface related state
> >       and buffer related state, because there is no separate interface
> >       for buffer state alone. Queueing requires separating the surface
> >       from buffer state, and buffer state can be queued while surface
> >       state cannot.
> >
> >       Buffer state includes the wl_buffer from wl_surface.attach, the
> >       state assigned by wl_surface requests frame,
> >       set_buffer_transform and set_buffer_scale, and any
> >       buffer-related state from extensions, for instance
> >       wl_viewport.set_source. This state is inherent to the buffer
> >       and the content update, rather than the surface.
> >
> >       Surface state includes all other state associated with
> >       wl_surfaces, like the x,y arguments of wl_surface.attach, input
> >       and opaque regions, damage, and extension state like
> >       wl_viewport.destination. In general, anything expressed in
> >       surface local coordinates is better as surface state.
> >
> >       The standard way of posting new content to a surface using the
> >       wl_surface requests damage, attach, and commit is called
> >       immediate content submission. This happens when a
> >       presentation.queue request has not been sent since the last
> >       wl_surface.commit.
> >
> >       The new way of posting a content update is a queued content
> >       update submission. This happens on a wl_surface.commit when a
> >       presentation.queue request has been sent since the last
> >       wl_surface.commit.
> >
> >       Queued content updates do not get applied immediately in the
> >       compositor but are pushed to a queue on receiving the
> >       wl_surface.commit. The queue is ordered by the submission target
> >       timestamp. Each item in the queue contains the wl_buffer, the
> >       target timestamp, and all the buffer state as defined above. All
> >       the queued state is taken from the pending wl_surface state at
> >       the time of the commit, exactly like an immediate commit would
> >       have taken it.
> >
> >       For instance on a queueing commit, the pending buffer is queued
> >       and no buffer is pending afterwards. The stored values of the
> >       x,y parameters of wl_surface.attach are reset to zero, but they
> >       also are not queued; queued content updates do not carry the
> >       attach offsets. All other surface state (that is not queued),
> >       e.g. damage, is not applied nor reset.
> >
> >       Issuing a queueing commit without a wl_surface.attach is
> >       undefined. However, queueing a commit with explicitly attached
> >       NULL wl_buffer works; when and if the content update is
> >       executed, the surface content is removed as defined for
> >       wl_surface.attach.
> >
> >       If a queued content update has been submitted, and the wl_buffer
> >       used in the update is destroyed before the wl_buffer.release
> >       event, the results are undefined. The compositor may or may not
> >       have executed the update, therefore the surface contents become
> >       undefined as explained in wl_surface.attach. Whether any
> >       presentation feedback or frame callbacks occur is undefined.
> >
> >       For each surface, the compositor maintains an association to a
> >       single output that is considered as the main output for the
> >       surface. Queued content updates are synchronized to the
> >       surface's main output, to provide a consistent and meaningful
> >       definition of the moment the update is displayed to the user.
> >       When a compositor updates an output, it processes only the
> >       queues of the surfaces whose main output is the one being
> >       updated. The queues of other surfaces, even if they are part of
> >       the redrawing, are not processed at that time.
> >
> >       When a compositor chooses to update an output, it must predict
> >       the presentation clock value when the display update will occur.
> >       For the definition of the moment of display update, see
> >       presentation_feedback.presented. Therefore if the prediction is
> >       absolutely perfect, presentation_feedback.presented will carry
> >       the same clock value.
> >
> >       For each surface with queued content updates and matching main
> >       output, the compositor picks the update with the highest
> >       timestamp no later than a half frame period after the predicted
> >       presentation time. The intent is to pick the content update
> >       whose target timestamp as rounded to the output refresh period
> >       granularity matches the same display update as the compositor is
> >       targeting, while not displaying any content update more than a
> >
> 
> I'm not really following 100% here. It's not your fault, this is just a
> terribly awkward sort of thing to try and put into English.  It sounds to
> me like the following: If P0 is the time of the next present and P1 is the
> time of the one after that, you look for the largest thing less than the
> average of P1 and P2.  Is this correct?  Why go for the average?  The
> client is going to have to adjust anyway.
> 
> 
> >       half frame period too early. If all the updates in the queue are
> >       already late, the highest timestamp update is taken regardless
> >       of how late it is. Once an update in a queue has been chosen,
> >       all remaining updates with an earlier timestamp in the queue are
> >       discarded.
> >
> 
> Ok, I think what you are saying works.  Again, it's difficult to parse but
> these things always are.
> 

Yes, it is hard to write a generic algorithm in English. Axel did a
nice job clarifying it. I hope I can improve on the language after I
have actually implemented this and any possible changes we need to this.

Also, the inline documentation in the XML file is getting a bit out of
hand, lacking in expressional power. I would have liked to use
sub-headings, the algorithm could use pseudo-code, etc, but they just
don't really exist here. Yet, I want these things to be part of the
protocol spec, so the semantics of the protocol get properly defined.

> >         4.5. The frame callback and swap interval
> >
> > The frame callback needs to be with the buffer state, so it gets
> > queued. If a client makes e.g. EGL's commits queued, EGL may
> > still rely on frame callbacks for blocking apps properly, and
> > that is related to presenting the buffer, not just the very next
> > output refresh. EGL may also internally use queueing and
> > feedback to implement swap interval > 1.
> >
> 
> Doesn't this mean that you need eglSwapInterval(0) if you're queueing?
> This is probably the case anyway, but it might be worth noting explicitly.
> I think what you're doing with frame callbacks is sane, but I'm not sure.

Yeah, swapinterval zero is needed indeed. Personally I would be more
worried about whether an EGL implementation agrees to allocate new
buffers if the app is queueing in advance. I suspect queueing many
frames in advance won't work with EGL in practice.

But you can still queue a frame at a time, that might be enough for
e.g. GL-based video players under good conditions. That might not need
swapinterval zero, either.

> My one latent concern is that I still don't think we're entirely handling
> the case that QtQuick wants.  What they want is to do their rendering a few
> frames in advance in case of CPU/GPU jitter.  Technically, this extension
> handles this by the client simply doing a good job of guessing presentation
> times on a one-per-frame baseis.  However, it doesn't allow for any damage
> tracking.  In the case of QtQuick they want a linear queue of buffers where
> no buffer ever gets skipped.  In this case, you could do damage tracking by
> allowing it to accumulate from one frame to another and you get all of the
> damage-tracking advantages that you had before.  I'm not sure how much this
> matters, but it might be worth thinking about it.

Does it really want to display *every* frame regardless of time? It
doesn't matter that if a deadline is missed, the animation slows down
rather than jumps to keep up with intended velocity?

Axel has a good point, cannot this be just done client side and
immediate updates based on frame callbacks?

If there is a problem in using frame callbacks for that, that is more
likely a problem in the compositor's frame scheduling than the protocol.

The problem with damage tracking, why I did not take damage as queued
state, is that it is given in surface coordinates. This will become a
problem during resizes, where the surface size changes, and wl_viewport
is used to decouple the content from the surface space.

If we queue damage, we basically need to queue also surface resizes.
Without wl_viewport this is what happens automatically, as surface size
is taken from the buffer size.

However, in the proposed design, the purpose of wl_viewport is to
decouple the surface size from buffer size, so that they can change
independently. The use case is live video: if you resize the window,
you don't want to redo the video frames, because that would likely
cause a glitch. Also if the video resolution changes on the fly, by e.g.
stream quality control, you don't need to do anything extra to keep the
window at the old size. Damage is a property of the content update,
yes, but we have it defined in surface coordinates, so when surface and
buffer sizes change asynchronously, the damage region would be
incorrect.

The downside is indeed that we lose damage information for queued
buffers. This is a deliberate design choice, since the extension was
designed primarily for video where usually the whole surface gets
damaged.

But, I guess we could add another request, presentation.damage, to give
the damage region in buffer coordinates. Would it be worth it?

Other solutions do not come into my mind at the moment.

Thanks,
pq