[RFC v2] Wayland presentation extension (video protocol)

Fri Jan 31 05:29:46 PST 2014

On Thu, 30 Jan 2014 17:35:17 +0200
Pekka Paalanen <ppaalanen at gmail.com> wrote:

> The v1 proposal is here:
> http://lists.freedesktop.org/archives/wayland-devel/2013-October/011496.html
> 
> In v2 the basic idea is the same: you can queue frames with a
> target presentation time, and you can get accurate presentation
> feedback. All the details are new, though. The re-design started
> from the wish to handle resizing better, preferably without
> clearing the buffer queue.

...

>   <interface name="presentation_feedback" version="1">
>     <description summary="presentation time feedback event">
>       A presentation_feedback object returns the feedback information
>       about a wl_surface content update becoming visible to the user.
>       One object corresponds to one content update submission
>       (wl_surface.commit), queued or immediate. There are two possible
>       outcomes: the content update may be presented to the user, in
>       which case the presentation timestamp is delivered. Otherwise,
>       the content update is discarded, and the user never had a chance
>       to see it before it was superseded or the surface was destroyed.
> 
>       Once a presentation_feedback object has delivered an event, it
>       becomes inert, and should be destroyed by the client.
>     </description>
> 
>     <request name="destroy" type="destructor">
>       <description summary="destroy presentation feedback object">
>         The object is destroyed. If a feedback event had not been
>         delivered yet, it is cancelled.
>       </description>
>     </request>
> 
>     <event name="sync_output">
>       <description summary="presentation synchronized to this output">
>         As presentation can be synchronized to only one output at a
>         time, this event tells which output it was. This event is only
>         sent prior to the presented event.
> 
>         As clients may bind to the same global wl_output multiple
>         times, this event is sent for each bound instance that matches
>         the synchronized output. If a client has not bound to the
>         right wl_output global at all, this event is not sent.
>       </description>
> 
>       <arg name="output" type="object" interface="wl_output"
>            summary="presentation output"/>
>     </event>
> 
>     <event name="presented">
>       <description summary="the content update was displayed">
>         The associated content update was displayed to the user at the
>         indicated time (tv_sec, tv_nsec). For the interpretation of
>         the timestamp, see the presentation.clock_id event.
> 
>         The timestamp corresponds to the time when the content update
>         turned into light the first time on the surface's main output.
>         Compositors may approximate this from the framebuffer flip
>         completion events from the system, and the latency of the
>         physical display path if known.
> 
>         This event is preceded by all related sync_output events
>         telling which output's refresh cycle the feedback corresponds
>         to, i.e. the main output for the surface. Compositors are
>         recommended to choose the output containing the largest
>         part of the wl_surface, or to keep the output they previously
>         chose. Having a stable presentation output association helps
>         clients to predict future output refreshes (vblank).
> 
>         Argument 'refresh' gives the compositor's prediction of how
>         many nanoseconds after tv_sec, tv_nsec the very next output
>         refresh may occur. This is to further aid clients in
>         predicting future refreshes, i.e., estimating the timestamps
>         targeting the next few vblanks. If such prediction cannot
>         usefully be done, the argument is zero.
> 
>         The 64-bit value combined from seq_hi and seq_lo is the value
>         of the output's vertical retrace counter when the content
>         update was first scanned out to the display. This value must
>         be compatible with the definition of MSC in
>         GLX_OML_sync_control specification. Note that if the display
>         path has a non-zero latency, the time instant specified by
>         this counter may differ from the timestamp's.
> 
>         If the output does not have a constant refresh rate, explicit
>         video mode switches excluded, then the refresh argument must
>         be zero.
> 
>         If the output does not have a concept of vertical retrace or a
>         refresh cycle, or the output device is self-refreshing without
>         a way to query the refresh count, then the arguments seq_hi
>         and seq_lo must be zero.
>       </description>
> 
>       <arg name="tv_sec" type="uint"
>            summary="seconds part of the presentation timestamp"/>
>       <arg name="tv_nsec" type="uint"
>            summary="nanoseconds part of the presentation timestamp"/>
>       <arg name="refresh" type="uint"
>            summary="nanoseconds till next refresh"/>
>       <arg name="seq_hi" type="uint"
>            summary="high 32 bits of refresh counter"/>
>       <arg name="seq_lo" type="uint"
>            summary="low 32 bits of refresh counter"/>
>     </event>
> 
>     <event name="discarded">
>       <description summary="the content update was not displayed">
>         The content update was never displayed to the user.
>       </description>
>     </event>
>   </interface>
> 
> </protocol>
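
For illustration, a client might decode the presented event arguments roughly as below. This is a C sketch, not part of the proposal: the helper names are hypothetical, and predicting refreshes beyond the very next one assumes a constant refresh rate, so that 'refresh' also equals the refresh period.

```c
#include <stdint.h>

/* Combine the two 32-bit halves of the refresh counter (MSC) into the
 * 64-bit value described for presentation_feedback.presented. */
static uint64_t feedback_msc(uint32_t seq_hi, uint32_t seq_lo)
{
	return ((uint64_t)seq_hi << 32) | seq_lo;
}

/* Convert (tv_sec, tv_nsec) to nanoseconds. Which clock this value is
 * on is announced separately by the presentation.clock_id event. */
static uint64_t feedback_time_ns(uint32_t tv_sec, uint32_t tv_nsec)
{
	return (uint64_t)tv_sec * UINT64_C(1000000000) + tv_nsec;
}

/* Estimate the time of the k-th refresh after the presented one
 * (k >= 1). Assumes a constant refresh rate, so 'refresh' (nanoseconds
 * until the very next refresh) is also the period. Returns 0 when
 * refresh == 0, i.e. the compositor could not make a prediction. */
static uint64_t predict_refresh_ns(uint64_t presented_ns, uint32_t refresh,
				   unsigned int k)
{
	if (refresh == 0)
		return 0;
	return presented_ns + (uint64_t)k * refresh;
}
```

With refresh = 16666667, for example, predict_refresh_ns(t, refresh, 2) would give the estimated time of the refresh after next.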

No-one else asked anything yet, so let me. ;-)

When a client starts an animation or video playback from scratch, that
is, after a long pause or the first time on the wl_surface, it has to
present at least one frame before it can get any feedback. Because you
want to start playback ASAP and keep it streaming, you have to post
another frame too, before you have had time to receive the feedback and
use it to adjust the submission target times. IOW, the first few frames
have to be scheduled blind: the only information you have is the *set*
of outputs the wl_surface is on, and the outputs' default refresh
rates.
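
A blind schedule of that kind could look roughly like the C sketch below. Everything here is an assumption for illustration: the helper names are hypothetical, the refresh rates are taken as reported by wl_output.mode (in mHz), and picking the output with the shortest period is only one possible guess when the surface spans several outputs.

```c
#include <stddef.h>
#include <stdint.h>

/* wl_output.mode reports the refresh rate in mHz; convert it to a
 * period in nanoseconds (rounded down). 0 means unknown. */
static uint64_t period_ns_from_mhz(uint32_t refresh_mhz)
{
	if (refresh_mhz == 0)
		return 0;
	return UINT64_C(1000000000000) / refresh_mhz;
}

/* Hypothetical policy: the surface may be on several outputs, so guess
 * the one with the shortest period. Returns 0 if no rate is known. */
static uint64_t guess_period_ns(const uint32_t *refresh_mhz, size_t n)
{
	uint64_t best = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t p = period_ns_from_mhz(refresh_mhz[i]);

		if (p != 0 && (best == 0 || p < best))
			best = p;
	}
	return best;
}

/* Blind schedule: without any feedback, assume the first usable
 * refresh is one period after now_ns and target one frame per
 * refresh. */
static void blind_targets(uint64_t now_ns, uint64_t period_ns,
			  uint64_t *targets, size_t n)
{
	for (size_t i = 0; i < n; i++)
		targets[i] = now_ns + (uint64_t)(i + 1) * period_ns;
}
```

Any phase error between now_ns and the real refresh cycle stays in these targets until the first feedback arrives, which is exactly the gap discussed here.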

Could we somehow calibrate the client's scheduling loop earlier?

We might add another request to the presentation interface:

    <request name="preroll_feedback">
      <arg name="surface" type="object" interface="wl_surface"
           summary="target surface"/>
      <arg name="callback" type="new_id" interface="presentation_feedback"
           summary="new feedback object"/>
    </request>

Yeah, this is a second request that creates a presentation_feedback
object, but since both factory requests are in the same interface,
there should not be versioning issues.

Preroll_feedback would create a new presentation_feedback object, which
the compositor would trigger immediately, providing the timestamp and
MSC of the latest known display refresh (equivalent to vblank), the
surface's current main output, and the output's current refresh period
length. If the compositor cannot implement this, it could just reply
with presentation_feedback.discarded.

Obviously using this request requires a roundtrip to get the reply from
the compositor, but having the compositor send a continuous stream of
"vblank events" would be much, much worse.

Once a client gets the reply for preroll_feedback, it then reads the
current UST with clock_gettime() and decides which coming display
refresh it will target. Playback should then start more accurately in
sync with the display refresh cycle than it would when scheduled blind.
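
The client-side decision after the preroll reply could be sketched in C as follows, assuming the reply carries the last refresh timestamp, its MSC and the refresh period, and that the clock to read is the one announced by presentation.clock_id. The rendering/compositing margin and all function names are hypothetical, and the extrapolation assumes a constant refresh period.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Refresh the client intends to target: expected time and MSC. */
struct target {
	uint64_t time_ns;
	uint64_t msc;
};

/* Read the current UST from the compositor's clock, e.g.
 * CLOCK_MONOTONIC if that is what presentation.clock_id announced.
 * Returns 0 on failure. */
static uint64_t read_clock_ns(clockid_t clk)
{
	struct timespec ts;

	if (clock_gettime(clk, &ts) != 0)
		return 0;
	return (uint64_t)ts.tv_sec * UINT64_C(1000000000) + (uint64_t)ts.tv_nsec;
}

/* Pick the first refresh at least margin_ns after now_ns, extrapolating
 * from the last known refresh (vblank_ns, vblank_msc). */
static struct target pick_target(uint64_t vblank_ns, uint64_t vblank_msc,
				 uint64_t period_ns, uint64_t now_ns,
				 uint64_t margin_ns)
{
	uint64_t earliest = now_ns + margin_ns;
	uint64_t k = 1; /* k >= 1: never target the refresh already reported */
	struct target t;

	assert(period_ns > 0);
	if (earliest > vblank_ns) /* ceil((earliest - vblank_ns) / period) */
		k = (earliest - vblank_ns + period_ns - 1) / period_ns;

	t.time_ns = vblank_ns + k * period_ns;
	t.msc = vblank_msc + k;
	return t;
}
```

A call such as pick_target(vblank_ns, msc, period_ns, read_clock_ns(CLOCK_MONOTONIC), 2000000) would leave a 2 ms margin; the right margin depends on the client's rendering time and the compositor's latency.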

However, there are more downsides than just the roundtrip. Even if the
compositor replies immediately, you probably end up starting the
playback a frame later than if you just started blind. If the
compositor cannot access the requested information directly, e.g. it
has not flipped the framebuffer for some time and the drivers don't
support the query, the compositor might need to wait for the next
vblank before it can reply. Or the compositor might reply with data so
old that it is no longer useful. Any delay would void the benefit
of achieving early sync.

A more fundamental problem is that the wl_surface must be mapped
before this request makes sense. If a surface is not mapped, it does
not have size or position, and the compositor does not know which
output it would be syncing to. Hence it cannot reply.
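
On the compositor side, the reply logic for preroll_feedback, including the unmapped-surface case, might reduce to something like this C sketch. The structures, field names and the maximum data age are hypothetical; answering with discarded instead of waiting for the next vblank is one possible policy, not something the proposal mandates.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-output state kept by the compositor. */
struct output_state {
	bool have_vblank;        /* is the last refresh known? */
	uint64_t last_vblank_ns; /* timestamp of the last refresh */
	uint64_t last_msc;       /* refresh counter at that refresh */
	uint32_t refresh_ns;     /* 0 if the rate is not constant */
};

/* Hypothetical per-surface state. */
struct surface_state {
	bool mapped;
	struct output_state *main_output; /* NULL if on no output */
};

enum preroll_reply {
	PREROLL_PRESENTED, /* send sync_output(s), then presented */
	PREROLL_DISCARDED  /* send discarded */
};

/* Decide how to answer preroll_feedback immediately. An unmapped
 * surface has no main output; unknown or too old refresh data would
 * not help the client. All of these are answered with discarded. */
static enum preroll_reply preroll_decide(const struct surface_state *s,
					 uint64_t now_ns, uint64_t max_age_ns)
{
	const struct output_state *o;

	if (!s->mapped || s->main_output == NULL)
		return PREROLL_DISCARDED;

	o = s->main_output;
	if (!o->have_vblank)
		return PREROLL_DISCARDED;
	if (now_ns > o->last_vblank_ns &&
	    now_ns - o->last_vblank_ns > max_age_ns)
		return PREROLL_DISCARDED;

	return PREROLL_PRESENTED;
}
```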

Considering all this, would it be worth it to have this kind of request?

FWIW, I asked on #dri-devel, and it seems that DRM/KMS should be able
to provide the needed info without waiting for the next vblank.
However, it probably depends on the particular driver in use: whether
the driver keeps vblank IRQs running, and whether the hardware has
convenient registers to read the necessary information. To me it sounds a lot more
uncertain than just posting a real frame and seeing when it hits the
user.

OTOH, GLX_OML_sync_control has glXGetSyncValuesOML() and
glXGetMscRateOML() which provide similar information, so this should be
possible at some level.
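
For comparison, glXGetMscRateOML() reports the refresh rate as a numerator/denominator pair in Hz. Converting that to a period in nanoseconds, comparable to the 'refresh' argument of presentation_feedback.presented, is simple arithmetic; the helper below is a hypothetical sketch and does not call GLX itself.

```c
#include <stdint.h>

/* Convert an MSC rate of numerator/denominator Hz, as returned by
 * glXGetMscRateOML(), to a refresh period in nanoseconds (rounded
 * down). Returns 0 for a non-positive rate. */
static uint64_t period_ns_from_oml_rate(int32_t numerator, int32_t denominator)
{
	if (numerator <= 0 || denominator <= 0)
		return 0;
	return (uint64_t)denominator * UINT64_C(1000000000) / (uint64_t)numerator;
}
```

For an NTSC-style rate of 60000/1001 Hz this gives 16683333 ns, slightly longer than the 16666666 ns of an exact 60 Hz mode.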

Thanks,
pq