Some of my thoughts on input for wayland

Chase Douglas chase.douglas at canonical.com
Sun Jan 23 18:03:58 PST 2011


Hi all,

I haven't been involved in wayland development at all yet :), but I have
been working on XInput 2.1 multitouch additions and gesture work in
Ubuntu. I have a few thoughts on how a new input system for wayland
might work.

To go along with that, I have no idea if these ideas have been discussed
before or not, nor whether the wayland architecture would allow them.
These are just some generic thoughts I've had on an input service
architecture.

First I'd like to address what I think we can learn from X. X11 has a
core protocol and an XInput extension with two major versions. To
develop additions to the input system in X, you must meet three obligations:

1. Develop alongside all the other work going on in X
2. Be backwards compatible with the previous input systems
3. Be integrated into the same display server source code

I think we could take a different approach with Wayland: separate input
from display. What does the input system really need from the rest of X?
Not much beyond the window regions and hierarchy on the screen.

My proposal would be to create a new input system project (inland?) and
define a standard interface between wayland and the new input system.
The interface could be provided over shared memory or some other
low-latency IPC. This would allow mixing and matching of display servers
and input servers, and would keep the development practices and
timelines of the two separate, for greater flexibility.
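
To make that concrete, here's a very rough sketch of the kind of data
that might cross such a boundary. Everything below is invented for
illustration (struct names, fields, the ring buffer); the point is only
that the input side needs little more than window geometry and
hierarchy from the display side, and the display side needs little more
than routed events back.

/* Hypothetical data crossing a display server / input server boundary.
 * All names and fields are made up for illustration only. */

#include <stdint.h>

/* Published by the display server: just enough to route input. */
struct window_info {
    uint32_t id;            /* display-server window handle        */
    uint32_t parent_id;     /* hierarchy; 0 for a top-level window */
    int32_t  x, y;          /* position in global compositor space */
    uint32_t width, height;
};

/* Produced by the input server: one routed, cooked event. */
struct routed_event {
    uint32_t window_id;     /* target window, resolved by input side */
    uint32_t type;          /* pointer motion, touch, gesture, ...   */
    uint32_t device_id;
    uint64_t time_usec;
    int32_t  x, y;          /* relative to the target window         */
};

/* A single-producer/single-consumer ring in shared memory keeps
 * latency low; a socket carrying the same structs would also work. */
struct event_ring {
    uint32_t head;          /* written by the producer */
    uint32_t tail;          /* written by the consumer */
    struct routed_event events[256];
};

Whether the input server or the display server does the final window
picking is obviously one of the first questions such a split would have
to settle; the sketch above assumes the input side does it.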

Now, this may seem crazy at first. Once a new input system is defined,
wouldn't we want to standardise upon it and not change it? Hopefully
yes. But think of wayland itself. It was conceived mostly due to issues
with X graphics. Imagine if 10 years from now the graphics stack needed
another rewrite, but the input stack was perfectly fine. We could
transition to a new graphics server without modifying any application
input code if the input system was easily transferable.

There's another issue I've noticed as we've gone through XI 2.1 and
gesture work: We are building multiple serial event delivery mechanisms.
Here's an example, with a rough sketch of the chain in code after the list:

1. Touchscreen input comes into X from the Linux evdev interface
2. XI 2.1 touch events are generated
3. uTouch gesture recognizer receives events through passive grab on
root window
4. Gesture recognizer recognizes a gesture
5. No client is subscribed for the gesture
6. Gesture recognizer relinquishes touch grabs
7. Touches propagate through X server
8. No other clients are found for touches
9. One of the touches is turned into pointer emulation
10. Pointer events propagate through X server once more
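
Collapsed into pseudocode (not X server or uTouch code; every name
below is invented), the chain looks something like this:

struct raw_touch { int x, y, touch_id; };  /* stand-in for evdev data */

/* Stubs: each stage would cook the touch into its own event type,
 * search the grab/selection lists for an interested client, deliver,
 * and return nonzero on success.  Here they simply fail. */
static int try_gestures(const struct raw_touch *t) { (void)t; return 0; }
static int try_touches(const struct raw_touch *t) { (void)t; return 0; }
static int try_pointer_emul(const struct raw_touch *t) { (void)t; return 0; }

/* Fixed precedence: gestures -> touches -> pointer emulation. */
void route_touch(const struct raw_touch *t)
{
    if (try_gestures(t))        /* steps 2-5  */
        return;
    if (try_touches(t))         /* steps 6-8  */
        return;
    try_pointer_emul(t);        /* steps 9-10 */
}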

We see there are three event delivery mechanisms: gestures, touches, and
pointer emulation. In each case, we are potentially transforming the raw
input data into a different format and looking for clients who selected
for appropriate event types. There's also a defined precedence ordering:
gestures -> touches -> pointer emulation.

One could argue whether this is a proper precedence ordering or not, but
the point is that there is a precedence ordering. In fact, I wonder if a
better ordering might be:

gesture grabs -> touch grabs -> pointer grabs -> gesture selections ->
touch selections -> pointer selections.
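
If the ordering really is just policy, one way to keep it easy to argue
about (and to change) is to express it as data rather than hard-wired
control flow. A hypothetical sketch, with entirely invented names:

enum delivery_class {
    GESTURE_GRAB,
    TOUCH_GRAB,
    POINTER_GRAB,
    GESTURE_SELECTION,
    TOUCH_SELECTION,
    POINTER_SELECTION,
    DELIVERY_CLASS_COUNT
};

static const enum delivery_class precedence[] = {
    GESTURE_GRAB, TOUCH_GRAB, POINTER_GRAB,
    GESTURE_SELECTION, TOUCH_SELECTION, POINTER_SELECTION,
};

struct event;   /* whatever the input server's internal event type is */

/* Stub: a real server would walk the grabs or selections of the given
 * class and return nonzero if some client accepted the event. */
static int try_deliver(enum delivery_class cls, struct event *ev)
{
    (void)cls; (void)ev;
    return 0;
}

void dispatch(struct event *ev)
{
    for (unsigned i = 0; i < DELIVERY_CLASS_COUNT; i++)
        if (try_deliver(precedence[i], ev))
            return;     /* the first class to claim the event wins */
}

Reordering the precedence array, say to interleave grabs and selections
per device class, then becomes a one-line change.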

As input devices advance into true three-dimensional space, we may find
a need for even more intricate input service mechanisms. A more
future-proof model may involve the ability to dynamically slot in input
systems as plugins. In this way, we might also be able to deprecate
older input protocols over time.
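
A sketch of what "slotting in" might look like, again with invented
names only: each input protocol (pointer, touch, gestures, something 3D
later) registers a handler with a priority, and deprecating a protocol
is just a matter of no longer registering it.

/* Hypothetical plugin interface for a standalone input server.  Every
 * identifier here is invented to illustrate the idea, not a proposal
 * for a real API. */

#include <stdbool.h>
#include <stddef.h>

struct input_event;             /* raw or partially cooked event */

struct input_plugin {
    const char *name;           /* "gestures", "touch", "pointer", ... */
    int priority;               /* lower value = earlier in precedence */
    /* Return true if the plugin consumed the event; false lets the
     * event fall through to the next plugin in precedence order. */
    bool (*handle)(struct input_plugin *self, struct input_event *ev);
};

#define MAX_PLUGINS 16
static struct input_plugin *plugins[MAX_PLUGINS];
static size_t nplugins;

/* Insertion sort by priority keeps dispatch a simple linear walk. */
void register_plugin(struct input_plugin *p)
{
    size_t i;

    if (nplugins == MAX_PLUGINS)
        return;                 /* sketch: no real error handling */

    i = nplugins++;
    while (i > 0 && plugins[i - 1]->priority > p->priority) {
        plugins[i] = plugins[i - 1];
        i--;
    }
    plugins[i] = p;
}

void dispatch_event(struct input_event *ev)
{
    for (size_t i = 0; i < nplugins; i++)
        if (plugins[i]->handle(plugins[i], ev))
            return;             /* consumed; stop falling through */
}

Whether such plugins would be in-process shared objects or separate
processes speaking the IPC protocol sketched earlier is another open
question; the code is only meant to show the shape of the idea.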

Thanks for listening to me on my soapbox! I look forward to your thoughts.

-- Chase

