Some of my thoughts on input for wayland

Mon Jan 24 14:31:40 PST 2011

2011/1/24 Chase Douglas <chase.douglas at canonical.com>:
> On 01/24/2011 02:30 PM, Kristian Høgsberg wrote:
>> On Sun, Jan 23, 2011 at 9:03 PM, Chase Douglas
>> <chase.douglas at canonical.com> wrote:
>>> Hi all,
>>
>> Hi Chase,
>>
>>> I haven't been involved in wayland development at all yet :), but I have
>>> been working on XInput 2.1 multitouch additions and gesture work in
>>> Ubuntu. I have a few thoughts on how a new input system for wayland
>>> might work.
>>>
>>> To go along with that, I have no idea if these ideas have been discussed
>>> before or not, nor whether the wayland architecture would allow them.
>>> These are just some generic thoughts I've had on an input service
>>> architecture.
>>
>> Thanks for jumping in and sharing your thoughts.
>>
>>> First I'd like to address what I think we can learn from X. X11 has a
>>> core protocol and an XInput extension with two major versions. To
>>> develop additions to the input system in X you must meet three obligations:
>>>
>>> 1. Develop alongside all the other work going on in X
>>> 2. Be backwards compatible with the previous input systems
>>> 3. Be integrated into the same display server source code
>>
>> I think this is true of any mature, successful project...
>
> I disagree. If anything, open source desktop environments show the
> opposite. Instead of developing for Windows 7 vs XP we develop on top of
> Qt and Gtk+, which are entirely separated from X, which is entirely
> separated from the kernel. They all release at different times and with
> varying versions. It can be done.
>
>>> I think we could take a different approach with Wayland: separate input
>>> from display. What does the input system need from the rest of X?
>>> Nothing really other than window regions and hierarchy on the screen.
>>
>> and I don't believe splitting input out into a separate project and/or
>> process changes that.  Putting input handling in a separate process
>> doesn't make it any easier to swap in a new input system.  The bulk of
>> the work will be in porting clients and toolkits and dealing with
>> backwards compatibility to the old system(s).  Also, the mix-and-match
>> idea will certainly lead to fragmentation - it's bad enough that we're
>> introducing a new display server to the world, but it's certainly
>> better than "a new display server with a choice of 3 different input
>> systems".  Jesse already pointed out some of the problems with the IPC
>> between input and output servers - you have to share the entire
>> compositor scene graph with the input server, wobbly windows, picking
>> algorithm and all.  Handling input in the compositor and getting input
>> redirection right was one of the main ideas that prompted Wayland.
>>
>> One of the things that X got right was the extension model.  Wayland
>> takes it one step further by making everything an extension: the only
>> thing that's fixed in the Wayland protocol is an interface for
>> discovering other interfaces.  If it turns out that we need to update
>> the input model, we have versioning built in for incremental updates,
>> and we can add an entire new model if we need to start from scratch.
>> Finally, a compositor can expose its own interfaces in addition to the
>> Wayland interfaces, so it's possible to add extra functionality
>> specific to a given compositor.  It's even possible for a compositor
>> to define its own input system if it comes to that, but the aim of
>> Wayland is to be a generic input/output multiplexor that scales from
>> handhelds up to full-blown desktops.  I hope you can help make that
>> happen!
>
> I thought some more after reading your comments. In the end, I realized
> it may be easier to split this up into two thoughts: versioned and
> implicit input protocols, and separating input into a separate
> process/thread.

Right, I think that makes sense.  It sounds like we're already on the
same page with respect to versioning.

> First, versioned and implicit input protocols. The mechanism you
> described in the second paragraph allows for versioned protocols, and is
> an evolution forward for X extensions. That's great, but it can still
> leave us in a quagmire.
>
> The X core protocol defined both the display and the input sides. In
> doing so, we have forced ourselves to be completely backwards compatible
> forever going forward. What if that weren't the case (and input systems
> were all extensions)?
>
> Today we could be developing XInput 2.1 for multitouch. X core may have
> been deprecated some time ago. On most distributions, only XI 1.5 is
> shipped by default; if you want to run apps from the 80's, install
> xserver-xorg-input-core. When we merge in XInput 2.1, we deprecate XI
> 1.5 and suggest keeping it installed for 3 years. After three years, XI
> 1.5 is dropped from most distributions' default installs as well.
>
> Back to reality, the main toolkits already implement XI 1.5. Work is
> ongoing to bring gtk+ to XI 2.0 and 2.1, and Qt is integrating the
> multitouch work from XI 2.1 while relying on XI 1.5 for the rest. Most
> applications are written using one of these toolkits, so forward porting
> isn't a big issue. If each X input extension were separate in source
> code, maintenance would also be much easier. Unfortunately, that's not
> the case, and it presents a challenge to anyone wishing to extend the X
> input system.
>
> I'm not advocating a free for all when it comes to input systems where
> you pick and choose what you want. I think we should strive for an input
> system to be extended rather than rewritten from scratch as much as
> possible. Maybe we'll get lucky and never have to rewrite the input
> system again :). However, every decade or so it seems we need to extend
> input in ways that break backwards compatibility in the protocol. So
> essentially, my argument can be boiled down to: I don't think we should
> explicitly specify a "Wayland" input protocol. Let the input side be
> provided through extensions, and perhaps ordain a specific extension or
> set of extensions as the canonical input system at any given time.

What you describe here is basically how Wayland works.  As I said
above, the only fixed interface in the Wayland protocol is how to
discover other interfaces.  When you connect to the compositor, you
will receive a series of events; each event introduces an available
object by giving you its id, interface name and version.  Right now,
one of these interfaces is "input_device"; you can see the details
here:

  http://cgit.freedesktop.org/wayland/tree/protocol/wayland.xml#n386

The "input_device" interface describes the entire input protocol as it
is now.  Obviously, there's work to do; right now it's sort of like
core input + mpx.  But the point is, we can phase this out in favour
of "input_device2", which can completely replace the "input_device"
interface.  Then we can keep "input_device" around for a few years
while we port the world to "input_device2" and then eventually dump
it.  If X had let us phase out core fonts, core rendering and core
input as extensions, I think X would have lasted even longer.  It was
one of the mistakes in X I didn't want to carry over.
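
To make the phase-out concrete, here is a rough sketch of how a client
might choose between the two interfaces during the transition.  It is
plain Python rather than real Wayland client code: the Global record,
pick_input_interface() and the object ids are made up for illustration,
and "input_device2" is only the hypothetical successor mentioned above.
The one thing taken from the protocol is that each announcement carries
an id, an interface name and a version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Global:
    """One announcement event: object id, interface name, version (hypothetical type)."""
    object_id: int
    interface: str
    version: int


def pick_input_interface(announced, preferred=("input_device2", "input_device")):
    """Return the first announced global matching the preference order, or None."""
    by_name = {}
    for g in announced:
        # Keep the first announcement seen for each interface name.
        by_name.setdefault(g.interface, g)
    for name in preferred:
        if name in by_name:
            return by_name[name]
    return None


# A compositor in the transition period announces both interfaces.
transition = [
    Global(1, "output", 1),
    Global(2, "input_device", 1),
    Global(3, "input_device2", 1),
]
# An older compositor only knows the original interface.
legacy = [Global(1, "output", 1), Global(2, "input_device", 1)]

print(pick_input_interface(transition).interface)  # input_device2
print(pick_input_interface(legacy).interface)      # input_device
```

A ported client binds the newer interface when it is announced and falls
back otherwise; once nothing asks for "input_device" any more, the
compositor can simply stop announcing it.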

That said, we need to have an input protocol for Wayland 1.0 (or
whenever it's "ready").  I don't want it to be an out-of-tree project
or external protocol, I want it to be there as one of the interfaces
in the Wayland protocol, for applications to rely on.

> Second, splitting input into a separate thread or process. We are
> hitting the serialization challenge with gestures today. We need to be
> able to analyze multitouch input and determine gestures, but this is
> dependent on where the touches fall within regions on the screen. There
> may be two separate compositing windows that want to know about gestures
> at the same time. Think of two documents open side by side.
>
> As we recognize gestures, we must map them to the windows on screen. If
> the windows move, we have to keep track of that. We are very limited on
> how we get both of these pieces of data in X. This has forced us to go
> completely serial in approach. We begin to worry about what performance
> impacts there will be on the window manager or the window server.
> However, if we keep the window hierarchy in shared memory with
> appropriate IPC mechanisms, we can minimize serialization.

I think it could be feasible to split the gesture recognition out into
a separate thread, but that's really an implementation decision for a
given compositor.
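
As a very rough illustration of what that could look like inside a
compositor, the sketch below runs a worker thread that takes touch
points from a queue and groups them by the window they land in.  All of
it is assumed for the example: the window tuples, hit_test() and
recognizer() are hypothetical, the window list is a fixed snapshot
(a real compositor would have to keep it current as windows move), and
no actual gesture recognition is done.

```python
import queue
import threading
from collections import defaultdict


def hit_test(windows, x, y):
    """Return the name of the topmost window containing (x, y), or None.

    `windows` is ordered bottom to top; each entry is (name, x, y, width, height).
    """
    for name, wx, wy, w, h in reversed(windows):
        if wx <= x < wx + w and wy <= y < wy + h:
            return name
    return None


def recognizer(touches, windows, results):
    """Worker: drain touch points and group them by the window they fall in."""
    per_window = defaultdict(list)
    while True:
        item = touches.get()
        if item is None:  # Sentinel: no more input.
            break
        x, y = item
        target = hit_test(windows, x, y)
        if target is not None:
            per_window[target].append((x, y))
    results.update(per_window)


# Two documents open side by side, as in the example above.
windows = (("doc_left", 0, 0, 400, 600), ("doc_right", 400, 0, 400, 600))
touches = queue.Queue()
results = {}

worker = threading.Thread(target=recognizer, args=(touches, windows, results))
worker.start()
for point in [(100, 100), (120, 110), (500, 300), (900, 50)]:
    touches.put(point)
touches.put(None)
worker.join()

print(results)
```

The main thread only enqueues touch points and never waits on the
worker until shutdown, which is the property that matters for keeping
recognition off the compositor's critical path.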

Kristian