<div dir="ltr"> <div class="gmail_extra"> <div class="gmail_quote">On Mon, Feb 10, 2014 at 3:53 AM, Pekka Paalanen <<a href="mailto:ppaalanen@gmail.com" target="_blank">ppaalanen@gmail.com</a>> wrote: <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">On Sat, 8 Feb 2014 15:23:29 -0600 Jason Ekstrand <<a href="mailto:jason@jlekstrand.net">jason@jlekstrand.net</a>> wrote: > Pekka, > First off, I think you've done a great job over-all. I think it will both > cover most cases and work well I've got a few comments below. </div>Thank you for the review. :-) Replies below. <div><div class="h5"> > On Thu, Jan 30, 2014 at 9:35 AM, Pekka Paalanen <<a href="mailto:ppaalanen@gmail.com">ppaalanen@gmail.com</a>> wrote: > > > Hi, > > > > it's time for a take two on the Wayland presentation extension. > > > > > > 1. Introduction > > > > The v1 proposal is here: > > > > <a href="http://lists.freedesktop.org/archives/wayland-devel/2013-October/011496.html" target="_blank">http://lists.freedesktop.org/archives/wayland-devel/2013-October/011496.html</a> > > > > In v2 the basic idea is the same: you can queue frames with a > > target presentation time, and you can get accurate presentation > > feedback. All the details are new, though. The re-design started > > from the wish to handle resizing better, preferably without > > clearing the buffer queue. > > > > All the changed details are probably too much to describe here, > > so it is maybe better to look at this as a new proposal. It > > still does build on Frederic's work, and everyone who commented > > on it. Special thanks to Axel Davy for his counter-proposal and > > fighting with me on IRC. :-) > > > > Some highlights: > > > > - Accurate presentation feedback is possible also without > > queueing. > > > > - You can queue also EGL-based rendering, and get presentation > > feedback if you want. Also EGL can do this internally, too, as > > long as EGL and the app do not try to use queueing at the same time. > > > > - More detailed presentation feedback to better allow predicting > > future display refreshes. > > > > - If wl_viewport is used, neither video resolution changes nor > > surface (window) size changes alone require clearing the queue. > > Video can continue playing even during resizes. </div></div>... <div><div class="h5">> > <interface name="presentation" version="1"> > > <description summary="timed presentation related wl_surface requests"> > > The main features of this interface are accurate presentation > > timing feedback, and queued wl_surface content updates to ensure > > smooth video playback while maintaining audio/video > > synchronization. Some features use the concept of a presentation > > clock, which is defined in presentation.clock_id event. > > > > Requests 'feedback' and 'queue' can be regarded as additional > > wl_surface methods. They are part of the double-buffered > > surface state update mechanism, where other requests first set > > up the state and then wl_surface.commit atomically applies the > > state into use. In other words, wl_surface.commit submits a > > content update. > > > > Interface wl_surface has requests to set surface related state > > and buffer related state, because there is no separate interface > > for buffer state alone. Queueing requires separating the surface > > from buffer state, and buffer state can be queued while surface > > state cannot. > > > > Buffer state includes the wl_buffer from wl_surface.attach, the > > state assigned by wl_surface requests frame, > > set_buffer_transform and set_buffer_scale, and any > > buffer-related state from extensions, for instance > > wl_viewport.set_source. This state is inherent to the buffer > > and the content update, rather than the surface. > > > > Surface state includes all other state associated with > > wl_surfaces, like the x,y arguments of wl_surface.attach, input > > and opaque regions, damage, and extension state like > > wl_viewport.destination. In general, anything expressed in > > surface local coordinates is better as surface state. > > > > The standard way of posting new content to a surface using the > > wl_surface requests damage, attach, and commit is called > > immediate content submission. This happens when a > > presentation.queue request has not been sent since the last > > wl_surface.commit. > > > > The new way of posting a content update is a queued content > > update submission. This happens on a wl_surface.commit when a > > presentation.queue request has been sent since the last > > wl_surface.commit. > > > > Queued content updates do not get applied immediately in the > > compositor but are pushed to a queue on receiving the > > wl_surface.commit. The queue is ordered by the submission target > > timestamp. Each item in the queue contains the wl_buffer, the > > target timestamp, and all the buffer state as defined above. All > > the queued state is taken from the pending wl_surface state at > > the time of the commit, exactly like an immediate commit would > > have taken it. > > > > For instance on a queueing commit, the pending buffer is queued > > and no buffer is pending afterwards. The stored values of the > > x,y parameters of wl_surface.attach are reset to zero, but they > > also are not queued; queued content updates do not carry the > > attach offsets. All other surface state (that is not queued), > > e.g. damage, is not applied nor reset. > > > > Issuing a queueing commit without a wl_surface.attach is > > undefined. However, queueing a commit with explicitly attached > > NULL wl_buffer works; when and if the content update is > > executed, the surface content is removed as defined for > > wl_surface.attach. > > > > If a queued content update has been submitted, and the wl_buffer > > used in the update is destroyed before the wl_buffer.release > > event, the results are undefined. The compositor may or may not > > have executed the update, therefore the surface contents become > > undefined as explained in wl_surface.attach. Whether any > > presentation feedback or frame callbacks occur is undefined. > > > > For each surface, the compositor maintains an association to a > > single output that is considered as the main output for the > > surface. Queued content updates are synchronized to the > > surface's main output, to provide a consistent and meaningful > > definition of the moment the update is displayed to the user. > > When a compositor updates an output, it processes only the > > queues of the surfaces whose main output is the one being > > updated. The queues of other surfaces, even if they are part of > > the redrawing, are not processed at that time. > > > > When a compositor chooses to update an output, it must predict > > the presentation clock value when the display update will occur. > > For the definition of the moment of display update, see > > presentation_feedback.presented. Therefore if the prediction is > > absolutely perfect, presentation_feedback.presented will carry > > the same clock value. > > > > For each surface with queued content updates and matching main > > output, the compositor picks the update with the highest > > timestamp no later than a half frame period after the predicted > > presentation time. The intent is to pick the content update > > whose target timestamp as rounded to the output refresh period > > granularity matches the same display update as the compositor is > > targeting, while not displaying any content update more than a > > > > I'm not really following 100% here. It's not your fault, this is just a > terribly awkward sort of thing to try and put into English. It sounds to > me like the following: If P0 is the time of the next present and P1 is the > time of the one after that, you look for the largest thing less than the > average of P1 and P2. Is this correct? Why go for the average? The > client is going to have to adjust anyway. > > > > half frame period too early. If all the updates in the queue are > > already late, the highest timestamp update is taken regardless > > of how late it is. Once an update in a queue has been chosen, > > all remaining updates with an earlier timestamp in the queue are > > discarded. > > > > Ok, I think what you are saying works. Again, it's difficult to parse but > these things always are. > </div></div>Yes, it is hard to write a generic algorithm in English. Axel did a nice job clarifying it. I hope I can improve on the language after I have actually implemented this and any possible changes we need to this. Also, the inline documentation in the XML file is getting a bit out of hand, lacking in expressional power. I would have liked to use sub-headings, the algorithm could use pseudo-code, etc, but they just don't really exist here. Yet, I want these things to be part of the protocol spec, so the semantics of the protocol get properly defined. <div class=""> > > 4.5. The frame callback and swap interval > > > > The frame callback needs to be with the buffer state, so it gets > > queued. If a client makes e.g. EGL's commits queued, EGL may > > still rely on frame callbacks for blocking apps properly, and > > that is related to presenting the buffer, not just the very next > > output refresh. EGL may also internally use queueing and > > feedback to implement swap interval > 1. > > > > Doesn't this mean that you need eglSwapInterval(0) if you're queueing? > This is probably the case anyway, but it might be worth noting explicitly. > I think what you're doing with frame callbacks is sane, but I'm not sure. </div>Yeah, swapinterval zero is needed indeed. Personally I would be more worried about whether an EGL implementation agrees to allocate new buffers if the app is queueing in advance. I suspect queueing many frames in advance won't work with EGL in practice. But you can still queue a frame at a time, that might be enough for e.g. GL-based video players under good conditions. That might not need swapinterval zero, either. <div class=""> > My one latent concern is that I still don't think we're entirely handling > the case that QtQuick wants. What they want is to do their rendering a few > frames in advance in case of CPU/GPU jitter. Technically, this extension > handles this by the client simply doing a good job of guessing presentation > times on a one-per-frame baseis. However, it doesn't allow for any damage > tracking. In the case of QtQuick they want a linear queue of buffers where > no buffer ever gets skipped. In this case, you could do damage tracking by > allowing it to accumulate from one frame to another and you get all of the > damage-tracking advantages that you had before. I'm not sure how much this > matters, but it might be worth thinking about it. </div>Does it really want to display *every* frame regardless of time? It doesn't matter that if a deadline is missed, the animation slows down rather than jumps to keep up with intended velocity? </blockquote><div> </div><div>That is my understanding of how it works now. I *think* they figure the compositor isn't the bottle-kneck and that it will git its 60 FPS. That said, I don't actually work on QtQuick. I'm just trying to make sure they don't get completely left out in the cold. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Axel has a good point, cannot this be just done client side and immediate updates based on frame callbacks? </blockquote><div> </div><div>Probably not. They're using GLES and EGL so they can't draw early and just stash the buffer. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> If there is a problem in using frame callbacks for that, that is more likely a problem in the compositor's frame scheduling than the protocol. The problem with damage tracking, why I did not take damage as queued state, is that it is given in surface coordinates. This will become a problem during resizes, where the surface size changes, and wl_viewport is used to decouple the content from the surface space. </blockquote><div> </div><div>The separation makes sense. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> If we queue damage, we basically need to queue also surface resizes. Without wl_viewport this is what happens automatically, as surface size is taken from the buffer size. However, in the proposed design, the purpose of wl_viewport is to decouple the surface size from buffer size, so that they can change independently. The use case is live video: if you resize the window, you don't want to redo the video frames, because that would likely cause a glitch. Also if the video resolution changes on the fly, by e.g. stream quality control, you don't need to do anything extra to keep the window at the old size. Damage is a property of the content update, yes, but we have it defined in surface coordinates, so when surface and buffer sizes change asynchronously, the damage region would be incorrect. The downside is indeed that we lose damage information for queued buffers. This is a deliberate design choice, since the extension was designed primarily for video where usually the whole surface gets damaged. </blockquote><div> Yeah, I think you made the right call on this one. Queueing buffers in a completely serial fassion really does seem to be a special case. Trying to do damage tracking for an arbitrary queue would very quickly get insane. Plus all the other problems you mentioned. </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> But, I guess we could add another request, presentation.damage, to give the damage region in buffer coordinates. Would it be worth it? Other solutions do not come into my mind at the moment. Thanks, pq </blockquote></div> </div><div class="gmail_extra">It's looking good to me. --Jason Ekstrand </div></div>