Some of my thoughts on input for wayland

Chase Douglas chase.douglas at canonical.com
Mon Jan 24 13:21:22 PST 2011


On 01/24/2011 02:30 PM, Kristian Høgsberg wrote:
> On Sun, Jan 23, 2011 at 9:03 PM, Chase Douglas
> <chase.douglas at canonical.com> wrote:
>> Hi all,
> 
> Hi Chase,
> 
>> I haven't been involved in wayland development at all yet :), but I have
>> been working on XInput 2.1 multitouch additions and gesture work in
>> Ubuntu. I have a few thoughts on how a new input system for wayland
>> might work.
>>
>> To go along with that, I have no idea if these ideas have been discussed
>> before or not, nor whether the wayland architecture would allow them.
>> These are just some generic thoughts I've had on an input service
>> architecture.
> 
> Thanks for jumping in and sharing your thoughts.
> 
>> First I'd like to address what I think we can learn from X. X11 has a
>> core protocol and an XInput extension with two major versions. To
>> develop additions to the input system in X you must meet three obligations:
>>
>> 1. Develop alongside all the other work going on in X
>> 2. Be backwards compatible with the previous input systems
>> 3. Be integrated into the same display server source code
> 
> I think this is true of any mature, successful project...

I disagree. If anything, open source desktop environments show the
opposite. Instead of developing for Windows 7 versus XP, we develop on
top of Qt and Gtk+, which are entirely separate from X, which in turn is
entirely separate from the kernel. They all release at different times
and with varying versions. It can be done.

>> I think we could take a different approach with Wayland: separate input
>> from display. What does the input system need from the rest of X?
>> Nothing really other than window regions and hierarchy on the screen.
> 
> and I don't believe splitting input out into a separate project and/or
> process changes that.  Putting input handling in a separate process
> doesn't make it any easier to swap in a new input system.  The bulk of
> the work will be in porting clients and toolkits and dealing with
> backwards compatibility to the old system(s).  Also, the mix-and-match
> idea will certainly lead to fragmentation - it's bad enough that we're
> introducing a new display server to the world, but it's certainly
> better than "a new display server with a choice of 3 different input
> systems".  Jesse already pointed out some of the problems with the IPC
> between input and output servers - you have to share the entire
> compositor scene graph with the input server, wobbly windows, picking
> algorithm and all.  Handling input in the compositor and getting input
> redirection right was one of the main ideas that prompted Wayland.
> 
> One of the things that X got right was the extension model.  Wayland
> takes it one step further by making everything an extension: the only
> thing that's fixed in the Wayland protocol is an interface for
> discovering other interfaces.  If it turns out that we need to update
> the input model, we have versioning built in for incremental updates,
> and we can add an entire new model if we need to start from scratch.
> Finally, a compositor can expose its own interfaces in addition to the
> Wayland interfaces, so it's possible to add extra functionality
> specific to a given compositor.  It's even possible for a compositor
> to define its own input system if it comes to that, but the aim of
> Wayland is to be a generic input/output multiplexor that scales from
> handhelds up to full-blown desktops.  I hope you can help make that
> happen!

I thought some more after reading your comments. In the end, I realized
it may be easier to split this up into two thoughts: versioned and
implicit input protocols, and moving input into a separate process or
thread.

First, versioned and implicit input protocols. The mechanism you
described in your second paragraph allows for versioned protocols, and
it is a clear step forward from the X extension model. That's great, but
it can still leave us in a quagmire.
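
To make sure I understand the mechanism, here is roughly how I picture
the discovery-and-versioning side looking from a client. This is only a
sketch based on my reading of the wayland client library; the registry
names and signatures below are my best guess and may not match what
actually ships:

#include <string.h>
#include <stdint.h>
#include <wayland-client.h>

static struct wl_seat *seat;

static void
handle_global(void *data, struct wl_registry *registry,
              uint32_t name, const char *interface, uint32_t version)
{
    /* The compositor advertises every global interface it supports;
     * the client binds only the ones it understands, at a version no
     * higher than what the compositor offered. */
    if (strcmp(interface, "wl_seat") == 0) {
        uint32_t want = version < 2 ? version : 2;
        seat = wl_registry_bind(registry, name, &wl_seat_interface, want);
    }
}

static void
handle_global_remove(void *data, struct wl_registry *registry, uint32_t name)
{
    /* Globals (say, a hot-unplugged seat) can also go away at runtime. */
}

static const struct wl_registry_listener registry_listener = {
    handle_global,
    handle_global_remove,
};

int
main(void)
{
    struct wl_display *display = wl_display_connect(NULL);
    struct wl_registry *registry = wl_display_get_registry(display);

    wl_registry_add_listener(registry, &registry_listener, NULL);
    wl_display_roundtrip(display);   /* wait for the initial globals */

    /* ... use 'seat' if it was advertised, degrade gracefully if not ... */
    return 0;
}

The point is that the only fixed piece is the discovery interface
itself; everything the client binds, including input, arrives with a
name and a version number.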

The X core protocol defined both the display and the input sides. In
doing so, we forced ourselves to remain completely backwards compatible
from then on. What if that weren't the case, and input systems were all
extensions?

Today we could be developing XInput 2.1 for multitouch. X core may have
been deprecated some time ago. On most distributions, only XI 1.5 is
shipped by default; if you want to run apps from the 80's, install
xserver-xorg-input-core. When we merge in XInput 2.1, we deprecate XI
1.5 and suggest keeping it installed for 3 years. After three years, XI
1.5 is dropped from most distributions' default installs as well.

Back to reality, the main toolkits already implement XI 1.5. Work is
ongoing to bring gtk+ to XI 2.0 and 2.1, and Qt is integrating the
multitouch work from XI 2.1 while relying on XI 1.5 for the rest. Most
applications are written using one of these toolkits, so forward porting
isn't a big issue. If each X input extension were separate in source
code, maintenance would also be much easier. Unfortunately, that's not
the case, and it presents a challenge to anyone wishing to extend the X
input system.

I'm not advocating a free-for-all where you pick and choose whatever
input system you want. I think we should strive for an input
system to be extended rather than rewritten from scratch as much as
possible. Maybe we'll get lucky and never have to rewrite the input
system again :). However, every decade or so it seems we need to extend
input in ways that break backwards compatibility in the protocol. So
essentially, my argument can be boiled down to: I don't think we should
explicitly specify a "Wayland" input protocol. Let the input side be
provided through extensions, and perhaps ordain a specific extension or
set of extensions as the canonical input system at any given time.
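
In code terms, "ordaining" the canonical input system could be as simple
as the compositor choosing which input globals it advertises. A sketch
of the server side (again using the library names as I understand them,
so the details may be off):

#include <wayland-server.h>

static void
bind_seat(struct wl_client *client, void *data, uint32_t version, uint32_t id)
{
    /* Create the per-client resource for the input interface the client
     * bound; hooking up request/event dispatch is omitted here. */
    struct wl_resource *resource =
        wl_resource_create(client, &wl_seat_interface, version, id);
    (void) resource;
}

void
advertise_input(struct wl_display *display)
{
    /* Today's blessed input system: wl_seat.  A future, incompatible
     * input system would just be a different global advertised alongside
     * (or instead of) this one, and retiring the old interface becomes a
     * compositor/distribution decision rather than a protocol break. */
    wl_global_create(display, &wl_seat_interface, 1, NULL, bind_seat);
}

Deprecation then looks like the XI 1.5 scenario above: stop advertising
the old global once clients have had a few years to move on.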

Second, splitting input into a separate thread or process. We are
hitting the serialization challenge with gestures today. We need to be
able to analyze multitouch input and determine gestures, but this
depends on where the touches fall within regions on the screen. There
may be two separate compositing windows that want to know about gestures
at the same time. Think of two documents open side by side.

As we recognize gestures, we must map them to the windows on screen,
and if the windows move we have to keep track of that too. We are very
limited in how we can get both of these pieces of data in X, which has
forced us into a completely serial approach, and we start to worry about
the performance impact on the window manager or the window server.
However, if we keep the window hierarchy in shared memory with
appropriate IPC mechanisms, we can minimize that serialization.
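
To be concrete about what I mean, here is a toy sketch of a shared
window map. Everything in it (the names, the layout, the sequence
counter trick) is hypothetical, and it glosses over the real problems
Jesse raised, like sharing the compositor's actual picking algorithm and
transformed/wobbly windows; it is only meant to show that the gesture
side can read window regions without a synchronous round trip to the
compositor:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_WINDOWS 64

struct win_rect {
    uint32_t id;
    int32_t  x, y, width, height;         /* compositor coordinates */
};

/* One of these lives in a region mmap()ed by both processes. */
struct win_map {
    atomic_uint     seq;                  /* odd while an update is in progress */
    uint32_t        count;
    struct win_rect windows[MAX_WINDOWS]; /* top-most first */
};

/* Compositor side: publish a new snapshot whenever windows move or restack. */
void win_map_update(struct win_map *map,
                    const struct win_rect *wins, uint32_t count)
{
    atomic_fetch_add(&map->seq, 1);       /* seq becomes odd: update in progress */
    map->count = count;
    memcpy(map->windows, wins, count * sizeof(*wins));
    atomic_fetch_add(&map->seq, 1);       /* seq becomes even: snapshot valid */
}

/* Input/gesture side: pick the top-most window under a touch point without
 * blocking the compositor; retry if an update raced with the read. */
bool win_map_pick(struct win_map *map, int32_t x, int32_t y, uint32_t *id_out)
{
    for (;;) {
        unsigned start = atomic_load(&map->seq);
        if (start & 1)
            continue;                     /* writer active, try again */

        bool hit = false;
        for (uint32_t i = 0; i < map->count && !hit; i++) {
            const struct win_rect *w = &map->windows[i];
            if (x >= w->x && x < w->x + w->width &&
                y >= w->y && y < w->y + w->height) {
                *id_out = w->id;
                hit = true;
            }
        }

        if (atomic_load(&map->seq) == start)
            return hit;                   /* we read a consistent snapshot */
    }
}

The compositor would place one struct win_map in a region both processes
map, call win_map_update() whenever windows move or restack, and the
gesture recognizer would call win_map_pick() as touches arrive, with no
blocking on either side.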

I hope that wasn't too long :(...

Thanks,

-- Chase

