[RFC 0/4] New wayland-egl functions for swapping

John Kåre Alsaker john.kare.alsaker at gmail.com
Tue Mar 19 12:13:20 PDT 2013


wl_egl_window_take_buffer might also facilitate sharing wl_buffers
between processes.
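
Roughly, a client using the proposed entry points could look like the
sketch below. The prototypes here are assumptions (something that returns
a wl_buffer, plus a commit helper); the exact signatures are whatever
patch 1/4 defines.

    #include <wayland-client.h>
    #include <wayland-egl.h>
    #include <GLES2/gl2.h>

    /* Assumed prototypes from the patchset (a sketch, not the real code): */
    struct wl_buffer *wl_egl_window_take_buffer(struct wl_egl_window *win);
    void wl_egl_window_commit(struct wl_egl_window *win);

    static void
    draw_and_present(struct wl_egl_window *egl_window,
                     struct wl_surface *surface,
                     int width, int height)
    {
        /* Render the frame with GL as usual. */
        glClearColor(0.2f, 0.2f, 0.2f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT);

        /* Take the finished back buffer instead of letting
         * eglSwapBuffers attach and commit it behind our back.
         * Whether this implies a flush/finish is discussed below. */
        struct wl_buffer *buffer = wl_egl_window_take_buffer(egl_window);

        /* The client now controls attach, damage and commit itself. */
        wl_surface_attach(surface, buffer, 0, 0);
        wl_surface_damage(surface, 0, 0, width, height);
        wl_surface_commit(surface);
    }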

On Mon, Mar 4, 2013 at 1:51 PM, John Kåre Alsaker
<john.kare.alsaker at gmail.com> wrote:
> On Mon, Mar 4, 2013 at 11:56 AM, Pekka Paalanen <ppaalanen at gmail.com> wrote:
>> On Mon, 4 Mar 2013 11:12:23 +0100
>> John Kåre Alsaker <john.kare.alsaker at gmail.com> wrote:
>>
>>> On Mon, Mar 4, 2013 at 9:48 AM, Pekka Paalanen <ppaalanen at gmail.com> wrote:
>>> > On Sun,  3 Mar 2013 02:26:10 +0100
>>> > John Kåre Alsaker <john.kare.alsaker at gmail.com> wrote:
>>> >
>>> >> This patchset introduces wl_egl_window_take_buffer and wl_egl_window_commit to the native wayland-egl platform.
>>> >>
>>> >> wl_egl_window_take_buffer gives a way to get a wl_buffer from a wl_egl_window.
>>> >> The application is expected to attach this to one or multiple wl_surfaces and the EGL implementation
>>> >> will listen to its release event before it reuses the buffer.
>>> >>
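
For reference, the release tracking described above is the standard
wl_buffer listener pattern. A sketch of the bookkeeping an EGL
implementation could keep (the my_backbuffer struct is made up):

    #include <wayland-client.h>

    struct my_backbuffer {          /* hypothetical bookkeeping struct */
        struct wl_buffer *buffer;
        int busy;                   /* still held by the compositor? */
    };

    static void
    buffer_release(void *data, struct wl_buffer *buffer)
    {
        struct my_backbuffer *bb = data;
        bb->busy = 0;               /* safe for the EGL stack to reuse */
    }

    static const struct wl_buffer_listener buffer_listener = {
        buffer_release
    };

    static void
    hand_out_buffer(struct my_backbuffer *bb)
    {
        /* Do not reuse this buffer until buffer_release fires. */
        wl_buffer_add_listener(bb->buffer, &buffer_listener, bb);
        bb->busy = 1;
    }
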
>>> >> This has a couple of advantages over eglSwapBuffers:
>>> >>  - It's always non-blocking; clients don't have to ensure the swap interval is 0
>>> >
>>> > Where do you synchronize the GPU?
>>> > Does wl_egl_window_take_buffer imply a glFinish(), with a real wait for
>>> > GPU to finish rendering?
>>> >
>>> > An advantage of eglSwapBuffers, if I understand right, is that it can
>>> > schedule a "swap", and return early. The actual wait for GPU to finish
>>> > rendering happens in the compositor, when it tries to use the buffer
>>> > provided by a client as a texture. Or rather, it might not wait;
>>> > instead just queue more rendering commands, and the GPU's pipeline and
>>> > hw synchronization primitives will synchronize automatically, without
>>> > stalling any process on a CPU.
>>> >
>>> > Can you preserve that with this proposal?
>>> Yes, that will work exactly the same way.
>>>
>>> >
>>> > To me it looks like your patch 2/4 explicitly breaks this by waiting
>>> > for the frame event right after sending a buffer. The client is forced
>>> > to stall until rendering of the *current* frame hits the screen, as
>>> > opposed to the rendering of the previous frame.
>>> That patch waits for the frame callback after it has committed the
>>> frame instead of before, which reduces input latency.
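
For what it's worth, the ordering difference can be sketched with the
core frame-callback API alone (simplified, no error handling; the old
code would instead block on the previous frame's callback before the
attach/commit):

    #include <stdint.h>
    #include <wayland-client.h>

    static void
    frame_done(void *data, struct wl_callback *cb, uint32_t time)
    {
        *(int *)data = 0;
        wl_callback_destroy(cb);
    }

    static const struct wl_callback_listener frame_listener = { frame_done };

    /* Patch 2/4 ordering: attach+commit this frame first, then block on
     * the frame callback requested for that same commit, so input for the
     * next frame is sampled as late as possible. */
    static void
    commit_then_wait(struct wl_display *display, struct wl_surface *surface,
                     struct wl_buffer *buffer)
    {
        int pending = 1;
        struct wl_callback *cb = wl_surface_frame(surface);
        wl_callback_add_listener(cb, &frame_listener, &pending);

        wl_surface_attach(surface, buffer, 0, 0);
        wl_surface_commit(surface);   /* frame request latched by this commit */

        while (pending)
            wl_display_dispatch(display);
    }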
>>
>> This, and your reply above are contradictory.
> The changes in 2/4 are not required for the proposal.
>
>>
>> The point is, can the client do any useful work between sending a
>> frame, and waiting for it to show. Before your changes, eglSwapBuffers
>> returned and would wait on the next call, leaving a chance to do work
>> between sending a buffer and waiting for it to be processed. Therefore,
>> part of the time that must pass between frames, can be spent on
>> computing in the client.
>>
>> If I read your patch 2/4 right, after your changes, you first send a
>> buffer, commit, and then wait for the frame callback of that commit to
>> arrive, before returning from eglSwapBuffers. Hence the client is
>> completely synchronized to and blocked by the server, guaranteeing the
>> maximal waste of time. If we had only uniprocessor systems, this
>> would probably not be too bad, since the two processes couldn't run at
>> the same time anyway.
>>
>> Also, being blocked inside EGL does not allow any Wayland events, except
>> those for the EGL itself, to be processed, so there is no escape.
> When an EGL client renders faster than the framerate, it will still
> be blocked just as much inside EGL, only with one frame of delay.
> However, when it renders slower than the framerate, we want to keep the
> current behavior, where we block on the previous frame.
> This allows the client to render a frame ahead, which in practice
> means that it will usually never block and it will stay busy drawing
> frames.
> We should probably preserve this behavior until we get something better.
>
> That something better would be to decide whether to wait for the frame event
> based on the time spent drawing the previous frame.
> Given that with DRM part of this time is spent in the compositor, this
> may require some cooperation with the compositor or the compositor's
> EGL side to be effective.
> We also need a way to provide the desired framerate to EGL, which
> sounds like a decent vendor-neutral EGL extension.
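
As a sketch only (none of these names exist anywhere), that heuristic
could boil down to comparing the last render time against a frame budget
supplied by the hypothetical "desired framerate" extension:

    #include <stdbool.h>
    #include <stdint.h>

    /* Decide whether to block on the frame callback before starting the
     * next frame. target_frame_ns would come from the desired-framerate
     * EGL extension speculated about above. */
    static bool
    should_wait_for_frame_event(uint64_t last_render_ns,
                                uint64_t target_frame_ns)
    {
        /* Fast client: it can afford to wait, keeping latency low.
         * Slow client: start drawing immediately and run a frame ahead. */
        return last_render_ns < target_frame_ns;
    }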
>
> Anyway, that only matters for games; for regular clients we want to use
> wl_egl_window_take_buffer or
> eglSwapBuffers (with swap interval at 0) so we never block in EGL and
> can stay in the event loop ready to react.
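
That is the usual frame-callback-driven loop, roughly as below (a
sketch; redraw() stands in for the application's drawing code, and swap
interval 0 is assumed to have been set once at init with
eglSwapInterval(dpy, 0)):

    #include <wayland-client.h>
    #include <EGL/egl.h>

    struct app {
        struct wl_surface *surface;
        EGLDisplay dpy;
        EGLSurface egl_surface;
    };

    static void frame_done(void *data, struct wl_callback *cb, uint32_t time);
    static const struct wl_callback_listener frame_listener = { frame_done };

    static void
    redraw(struct app *app)
    {
        /* Ask for the next frame event before committing this frame. */
        struct wl_callback *cb = wl_surface_frame(app->surface);
        wl_callback_add_listener(cb, &frame_listener, app);

        /* ... GL drawing here ... */

        /* With swap interval 0 this never blocks, so the client goes
         * straight back to its event loop. */
        eglSwapBuffers(app->dpy, app->egl_surface);
    }

    static void
    frame_done(void *data, struct wl_callback *cb, uint32_t time)
    {
        wl_callback_destroy(cb);
        redraw(data);   /* repaint only when the compositor asks for it */
    }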
>
>>
>>> > I'm not familiar enough with the Mesa code to see everything from the
>>> > patches.
>>> >
>>> > I think this will also be affected by when the compositor actually
>>> > sends frame callback events: on starting repaint, as it is now, or when
>>> > the frame has actually hit the screen. At least for me that is still an
>>> > open question.
>>> I don't think that affects this proposal versus eglSwapBuffers.
>>
>> It affects what happens: does sending the frame callbacks wait for the
>> compositor's rendering to finish and hit vblank for the flip, or not.
>> We should keep the whole stack in mind; otherwise we'll probably overlook
>> some interactions.
>>
>> If your tests currently show no bad effects, that may change, if the
>> frame callback handling in the server is changed. Therefore I would be
>> very careful before doing any changes to EGL APIs until the server
>> behaviour has been decided.
> This has more to do with when to wait for the frame event. It doesn't apply to
> wl_egl_window_take_buffer at all, and wl_egl_window_commit follows
> eglSwapBuffers' example.
>
>>
>> Assuming any changes to EGL APIs are possible at all...
> It's just new entry points in a shared library. Of course, you have to
> convince Kristian to add them first...
>
>>
>>> >>  - Clients are in control of the damage, attach, and commit requests.
>>> >>    I find that cleaner, and it wouldn't require an extension to specify damage.
>>> >
>>> > With the buffer age extension, that makes sense, especially for nested
>>> > compositors and other 2D-ish apps.
>>> >
>>> >>  - You can attach a wl_buffer to multiple wl_surfaces.
>>> >>    For example, you could use a single buffer for 4 window decoration subsurfaces and
>>> >>    perhaps even draw decorations with a single draw call.
>>> >
>>> > How do you intend the example would work? Are you assuming something
>>> > from scaling & clipping extension?
>>> Yeah, you'd need clipping to do that.
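
As a sketch of how that could look once clipping exists (the scaling &
clipping proposal, which later shipped as wp_viewporter; the function and
the border layout here are made up): each decoration subsurface attaches
the same shared wl_buffer and only crops a different source rectangle
out of it.

    #include <wayland-client.h>
    #include "viewporter-client-protocol.h"   /* generated protocol header */

    /* Show one strip of a shared decoration buffer on one subsurface. */
    static void
    show_strip(struct wp_viewporter *viewporter,
               struct wl_surface *edge_surface,
               struct wl_buffer *shared_buffer,
               int src_x, int src_y, int src_w, int src_h)
    {
        struct wp_viewport *vp =
            wp_viewporter_get_viewport(viewporter, edge_surface);

        wl_surface_attach(edge_surface, shared_buffer, 0, 0);
        wp_viewport_set_source(vp,
                               wl_fixed_from_int(src_x),
                               wl_fixed_from_int(src_y),
                               wl_fixed_from_int(src_w),
                               wl_fixed_from_int(src_h));
        wp_viewport_set_destination(vp, src_w, src_h);
        wl_surface_damage(edge_surface, 0, 0, src_w, src_h);
        wl_surface_commit(edge_surface);
    }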
>>
>> How can there be any savings compared to using just one surface for all
>> decorations? Or what was the motive for 4 sub-surfaces if not saving
>> buffer space?
> There's less (insignificant) overhead. It's easier to draw into a
> single surface, and there are probably some cache benefits as well.
>
> We may also consider how to allow clients to prerender multiple
> wl_buffers using EGL. Clients could call wl_egl_window_take_buffer a
> number of times, but I think we want to allow that to block on the
> compositor after using up 3 buffers. We could add a flag somewhere to
> wl_egl_window or an EGL extension to change that behavior.
> Alternatively clients could create a wl_egl_window per wl_buffer,
> which would probably have a higher setup cost (more EGL surfaces), but
> clients should be able to reuse them. At least these changes
> allow you to do that, which I'll add to the pro list.
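
A sketch of the prerendering idea, with the same assumed
wl_egl_window_take_buffer prototype as above (the third call is where
EGL might start blocking on compositor releases):

    #include <wayland-client.h>
    #include <wayland-egl.h>

    /* Assumed prototype from the patchset: */
    struct wl_buffer *wl_egl_window_take_buffer(struct wl_egl_window *win);

    #define BUFFER_COUNT 3

    static void
    prerender(struct wl_egl_window *egl_window,
              struct wl_buffer *buffers[BUFFER_COUNT])
    {
        for (int i = 0; i < BUFFER_COUNT; i++) {
            /* ... draw frame i with GL ... */
            buffers[i] = wl_egl_window_take_buffer(egl_window);
        }
        /* The client can now attach/damage/commit these to one or more
         * wl_surfaces whenever it likes. */
    }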

