Second Feedback request for my GSoC project to improve Present support in Xwayland

Roman Gilg subdiff at gmail.com
Tue Aug 1 18:53:02 UTC 2017


According to the feedback I received on my first mail (see:
https://lists.x.org/archives/xorg-devel/2017-July/054136.html) I reworked
huge parts of my code for enabling Present pixmap flips in Xwayland. The
current code now provides the following functionality:


--------
Summary:
--------

(1) Deactivate queued presentation support for now (until we know which
kind of Wayland protocol to use for).
(2) Per window flipping when the pixmap flipping window region equals the
toplevel window.
(3) Listen on buffer release by Wayland compositor and only then allow
pixmap reuse. This is important but kind of controversial (see below).
(4) Support window pixmap flipping via Wayland sub-surfaces for windows,
that do _not_ region equal their toplevel window, i.e. windows which are
composited into the toplevel window with the Composite extension.


----------
In detail:
----------

ad (1):

This was rather simple. We just return on queue_vblank with an error (
https://github.com/subdiff/xserver/blob/presentInXwayland/hw/xwayland/xwayland-present.c#L116).
The Present extension then automatically just pushes out the flip directly
like in the non timed case. I assume to get it working again with the new
mechanism discussed by Pekka instead of frame callbacks I only need to do
integration work in the Xwayland DDX.

ad (2):

This was not simple at all. The DIX Present code had been written from the
ground up to only support flips for one full screen windows, whereas full
screen means spanning a whole _Screen. So I wrote a secondary code path
"Rootless mode", selected by a DDX by setting a Boolean in the screen_info
struct.

Overview of differences in Rootless mode:

* Maintain a list of all windows, that are currently flipping pixmaps.
* Instead of saving the current flip_pixmap and so on in
present_screen_priv save it in present_window_priv.
* Don't restore the screen pixmap, instead save and refcount the pixmap
used by the window before Present took over and restore the window with
this pixmap.
* Introduce an additional DDX function hook flip_executed, that is called
directly before the flip is finished being executed in Present to then
fully commit the flipped window pixmap with the associated damage and
directly signal the flip as being complete.

A possible downside is that in this Rootless mode one "window hierarchy" (I
hope that's the fitting notion, it's one-to-one to a xwl_window, which gets
only created for WindowRec.redirectDraw != RedirectDrawManual.) is always
associated with one RRCrtc. The RRCrtc might be created as a dummy for that
window hierarchy. That's the way I do it in Xwayland. It is not possible to
make the RRCrtc objects independent of the windows. This is fine in the
Xwayland case. But if another DDX would emerge in the future, which wants
to use the Rootless mode with independent RRCrtcs there would be a
mechanism needed to make Present aware of a currently flipping pixmap on
the RRCrtc, identify the possibly different associated window and restore
it before changing to the other window.

ad (3):

In the Weston session I noticed extreme graphical artifacts in Neverball
(for a description of this app see below in Testing section). For sure this
was because of the reuse of pixmaps by the application after Xwayland
signaled that the pixmap is free to use again, but before the compositor
has released it. There were no artifacts in KWin, so I assume KWin just
quickly enough copies the pixel content and doesn't touch it again
afterwards. In the Weston case though the following happened:

* Present gives pixmap A to Xwayland.
* A gets immediately committed and the flip is signaled as being complete
to Present (if we would wait for the frame callback, we would have an
implicit Vblank frame rate limit).
* Present signals PresentCompleteNotify to the app and the app then
directly starts drawing the next pixmap B, which is then again sent to
Xwayland.
* B gets immediately committed again and the flip is signaled es being
complete
* With B being flipped Present assumes, that A is directly reusable and
signals PresentCompleteNotify for B _and_ PresentIdleNotify for A.
* The app repaints into A or destroys it while still being read out by
Weston in some way.

The solution is to listen for the buffer release signal by the Wayland
compositor and only on this signal send PresentIdleNotify to the app
instead of directly sending it on the next PresentCompleteNotify. To enable
that in the Rootless code path I did the following:

* In Present don't directly signal PresentIdleNotify on
PresentCompleteNotify of the next pixmap.
* Don't destroy the vblank struct after the flip. Rather put it in a field
present_window_priv_rec.flip_active and after the next flip has been
processed put it in the added static xorg_list present_idle_queue.
* To get further processed with PresentIdleNotify and deleted from this
list again the DDX just needs to send an additional third
present_event_notify with the event_id of the flip.

Reminder: For potentially other DDX, that want to use the Rootless code
path in Present, comment the code, that they must not forget to send this
third present_event_notify. Otherwise the old pixmaps get never freed.

The controversial part: Unfortunately the Present protocol specification
mentions in one sentence in between, that PresentIdleNotify is directly
send on the PresentCompleteNotify of the next pixmap:
https://cgit.freedesktop.org/xorg/proto/presentproto/tree/presentproto.txt#n412

So if an application instead of freeing only pixmaps after it received the
PresentIdleNotify event, maintains a list of acquired pixmaps and directly
reuses pixmaps after it got PresentCompleteNotify for another one, we have
a problem. But I assume it would just lead to visual artifacts again.
Another more serious problem would be, if an application only uses a fixed
number of pixmap and never uses one which hasn't yet received the
PresentIdleNotify event. In this case the app execution could halt.

On the other side I can't think of another way of removing the visual
artifacts than to hold out on PresentIdleNotify until the Wayland
compositor has signaled the buffer release. It's also the most natural
thing to do in the Wayland environment. Neither of my test applications had
problems with it. All visual artifacts directly vanished. My assumption is
that any modern application or framework using Present is able to use an
arbitrary number of pixmaps or are at least able to reuse one of their
finite pixmaps even if PresentIdleNotify wasn't yet signaled for the pixmap.

ad (4):

With the above I'm able to reliable flip pixmaps, when the flipping window
region equals the region of the toplevel window. This is according to my
experiments the case when not using any Composite clipping - see VLC in
Testing below for a good example. But tearing also happens in composite
windows, especially if a Wayland server wants to outscan a Xwayland buffer
to an overlay plane.

For that I use a sub-surface for the flipping window. The flipping window's
pixmap buffer is flipped exclusively on the sub-surface, while the rest of
the window is committed to the main surface. I had problems with the right
alignment of the sub-surface and the disabling of input on the sub-surface,
but my research showed that these problems are inside KWin. In
comparison this works on Weston. The problem I have on Weston is a botched
up graphical depiction of the sub-surface content in general while it works
on KWin (see related question in Open Questions below).


-----
Code:
-----

(1) and (2) are in my
https://github.com/subdiff/xserver/tree/presentInXwayland branch. As a diff
to master:
https://github.com/subdiff/xserver/compare/master...presentInXwayland

(3) is in a separate branch
https://github.com/subdiff/xserver/tree/presentInXwaylandBufferRelease
based on the presentInXwayland branch. Here the diff:
https://github.com/subdiff/xserver/compare/presentInXwayland...presentInXwaylandBufferRelease

I have (3) for now in a separate branch because of the "controversy"
described above. But as I said (3) is really necessary because of the
artifacts. You can check out how bad they are yourself by compiling the two
branches.

(4) is currently split up between the branches. The more recent code is in
the (3) branch.


--------
Testing:
--------

On KWin (git master) and Weston (git master) sessions. I use primarily two
applications for testing:

* Neverball: Full screen game with uncapped frame rate. Defaults to
Xwayland, but can also be set to Wayland native mode via
SDL_VIDEODRIVER=wayland.
* VLC: Video player with windowed mode (clipping parts with Composite
extension) and full screen mode (clipping with Composite extension for a
short time at the start of a video because of a small bar at the top if the
video could be resumed somewhere and afterwards showing a full screen
video). To test Present I needed to disable the "Accelerated video output"
option in the VLC Video settings. Otherwise VLC doesn't use Present at all
for some reason.

The Neverball window never shows for me on Weston and Xwayland master. With
my Xwayland branch it shows on Weston. But as mentioned it shows extreme
visual artifacts without the buffer release callback, with it no artifacts.

VLC shows tearing in Weston with Xwayland master. No tearing with my branch
as full screen.

Opening and closing as well as minimizing and reopening the test
applications works fine in my test runs. For that I had to make sure that
on unrealize of a window or the change of the presenting window the
Xwayland code rechecks if it can still flip and recreates the flip commits
or cleans up all remaining events and used objects. So if you experience
problems on window switching or closing there might still be some items
I've missed cleaning up correctly and I would be happy if you leave a
message.


---------------
Open Questions:
---------------

* Is the controversy really a problem? Is there another way than (3)?
* For (4) is the Composite extension the only explanation for the region of
the xwl_window to differ from the presenting child window? Are there other
cases I don't yet know about that I have to look out for?
* On Weston the sub-surface content is botched up completely. Is maybe the
support for desynchronized sub-surfaces in Weston not yet fully functional?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.x.org/archives/xorg-devel/attachments/20170801/d792fe5f/attachment.html>


More information about the xorg-devel mailing list