[RFC] Multitouch support, step one

Tue Mar 16 06:42:15 PDT 2010

Peter Hutterer wrote:
> On Mon, Mar 15, 2010 at 03:41:24PM +0100, Henrik Rydberg wrote:
>>> Preamble:
>>> Multi-touch as defined in this proposal is limited to single input-point
>>> multi-touch. This is suitable for indirect touch devices (e.g. touchpads)
>>> and partially suited for direct touch devices provided a touch is equivalent
>>> to a single-gesture single-application input.
>> User-space applications need tools to *use* MT devices, not route raw data from
>> the devices to the application. The latter is not much more complicated than
>> opening a file, and everyone can do that already. Thus, unless there is a model
>> for how MT devices work and interact with other MT devices, I see little point
>> in having an X protocol at all.
> 
> The main reason is that applications, for better or worse, use X as their
> input source. Our job is to get the data to the right client, without too
> much processing going on. For clients to go around the server by opening the
> kernel device files directly will cause issues in the long run, especially
> when you have multiple applications running.

Thank you for addressing my concerns. The details you describe below form a
logical and complete proposal, which is agreeable in its own right.

However, I must insist on continuing this discussion, because my gut
tells me we are moving slightly in the wrong direction. Here are the main reasons:

1. User space wants details, but also consistent behavior for all devices
supporting multitouch.

2. The kernel interface is bandwidth-consuming by necessity, but there is no
need for the X protocol to be.

3. Support for multitouch in X does exist already, so there is no need to start
from zero when discussing it (http://bitmath.org/code/multitouch/).

4. The hard limit of 256 guarantees something new will have to be done for
multi-user multitouch, in essence pushing the problem forward.

5. Squeezing MT into the valuator concept is generally crippling, since it does
not map very well to the underlying contact concept.

What follows is a longer version of these five points, and below that is a
proposal for how I believe it should be done.

---

1. Consistent behavior for all devices

The hardware stack supporting multitouch is diverse, and several different
mechanisms and abstraction levels exist. The tracking ID is a good example. It
may or may not be present in the driver output, and it may work poorly even if
it exists. Thus, in order to support hardware consistently, there must be a
middle layer outside of the kernel, parsing the driver data and patching it up
to produce the same level of detail for all devices. This task can be quite
complicated and uses some cpu, so having it in one place is imperative. Luckily,
there exists such a solution in the multitouch X driver (see link above). This
code can either be broken out as a standalone module or be placed in the X core.
If there is a license issue, it can be resolved for the benefit of the X
community. In the text below, this piece of code will be referred to as the
contact driver.
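
To make the tracking ID case concrete, here is a minimal sketch of how a
contact driver could patch up a device that reports bare positions without
IDs. It is not the code from the multitouch X driver linked above. The
function name assign_ids, the max_dist threshold and the greedy
nearest-neighbor matching are all assumptions made for illustration only.

```python
import itertools
import math


def assign_ids(previous, points, next_id, max_dist=100.0):
    """Give every point of the current frame a tracking ID.

    previous: dict mapping id -> (x, y) from the last frame
    points:   list of (x, y) reported in the current frame
    next_id:  iterator yielding fresh IDs for new contacts
    max_dist: hypothetical cutoff; farther points count as new contacts
    """
    # Every candidate pairing of an old contact with a new point,
    # sorted so that the closest pairs are considered first.
    pairs = sorted(
        (math.hypot(px - x, py - y), old_id, i)
        for old_id, (px, py) in previous.items()
        for i, (x, y) in enumerate(points)
    )
    current = {}
    used = set()
    for dist, old_id, i in pairs:
        if dist > max_dist:
            break  # all remaining pairs are even farther apart
        if old_id in current or i in used:
            continue  # this contact or this point is already matched
        current[old_id] = points[i]
        used.add(i)
    # Points that matched no old contact become new contacts.
    for i, point in enumerate(points):
        if i not in used:
            current[next(next_id)] = point
    return current


ids = itertools.count(1)
frame1 = assign_ids({}, [(0, 0), (500, 0)], ids)
frame2 = assign_ids(frame1, [(510, 5), (3, 4)], ids)
```

The second call keeps both IDs stable even though the device reported the
two points in the opposite order. A real implementation would likely need
something better than greedy matching, which is part of why this should live
in one place.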

2. Bandwidth reduction should be made as early as possible

The MT events from the kernel are non-filtered, bypassing the normal input
filtering by necessity. Duplicating this behavior further down the food chain
would be a mistake. After parsing the MT stream in the contact driver, the event
stream can be filtered substantially, thereby restoring bandwidth usage to
something more similar to non-MT devices.
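
As a rough illustration of the kind of filtering meant here, the sketch
below drops move events that stay within a small distance of the last
position forwarded for the same contact. The tuple layout (kind, id, x, y),
the name filter_moves and the min_delta threshold are assumptions for the
example, not part of any existing interface.

```python
def filter_moves(events, min_delta=2):
    """Forward events, dropping moves smaller than min_delta on both axes.

    events: iterable of (kind, contact_id, x, y) tuples, where kind is
            "init", "move" or "destroy" (hypothetical layout).
    """
    last = {}  # contact id -> (x, y) of the last forwarded position
    for kind, cid, x, y in events:
        prev = last.get(cid)
        if (
            kind == "move"
            and prev is not None
            and abs(x - prev[0]) < min_delta
            and abs(y - prev[1]) < min_delta
        ):
            continue  # too close to the last forwarded position
        if kind == "destroy":
            last.pop(cid, None)  # forget contacts that have ended
        else:
            last[cid] = (x, y)
        yield (kind, cid, x, y)


raw = [
    ("init", 1, 10, 10),
    ("move", 1, 10, 11),
    ("move", 1, 11, 11),
    ("move", 1, 13, 11),
    ("destroy", 1, 0, 0),
]
filtered = list(filter_moves(raw))
```

Because the comparison is made against the last forwarded position rather
than the last reported one, slow drift still gets through once it adds up
to min_delta.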

3. The contact driver produces the more digested contact events

The contact driver takes the flora of driver MT events and produces a consistent
stream of contact events. The contact event stream is less bandwidth-consuming,
and follows the init-move-destroy concept we discussed last summer, if you
recall. We are still talking about a low-level stream; there are no gestures or
other high-level derivatives, just a consistent stream of data.

4. ABI, memory and cpu burden for nothing

Although the current hard limit on valuators most likely can be programmed
away, it just feels wrong to burden all other applications with the additional
memory and cpu usage implied by raising a comfortable limit to something much
higher, only to satisfy the request of a completely different interface, which
strains the existing concept to the limit of breakage.

5. Use appropriate data structures to solve the problem

By defining a handful of contact api functions, operating on simple structures,
the whole problem of forward compatibility with multi-user multitouch can be
solved in one go, without changing a single bit in the existing interfaces. Yes,
it means a new interface, but the functionality is new, so this is the way it
should be.

---

X Multitouch Support Proposal
-----------------------------

Introduction
------------

Back in summer 2009, when this was discussed informally between some in the
present party, the general structure that emerged was a split into a low level
protocol and a gesture library, here citing two of the formulations:

> Henrik Rydberg:
> X multipointer via init-move-destroy to X gesture driver
> X gesture driver via enhanced X events to X application
>
> Peter Hutterer:
> X server -> protocol -> application
>                           |->  x gesture library

The change I am proposing today simply means inserting the contact driver
discussed above before or in conjunction with the X server in Peter's chain:

kernel -> [contact driver | X server] -> protocol -> application
                                                      |->  x gesture library

The details of the protocol in this chain depend on the output of the contact
driver, which is only slightly higher level than the kernel events.

The Contact Driver
------------------

The general structure of the MT events is that of contacts appearing, changing
and disappearing. Because of the diversity of capabilities of the drivers, this
structure is quite relaxed in the kernel stream, to the point that it requires
work to fully impose this structure further down the stream. That is the job of
the Contact Driver. It translates the relaxed kernel MT events into a steady
stream of contact events, containing the same level of information for all
drivers. The contact events follow the same logic as the MT events, but because
all data is present, the init-move-destroy mechanism can be employed fully. Here
is an example of what a two-finger scroll would look like:

init id = 588, x = -234, y = 42
init id = 123, x = 933, y = 3
sync
move id = 588, x = -211, y = 529
move id = 123, x = 863, y = 732
sync
destroy id = 588
destroy id = 123
sync
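
To show how little a client needs in order to consume such a stream, here is
a sketch that replays the example above and rebuilds the set of active
contacts at every sync. The ContactEvent structure and the snapshots
function are hypothetical names chosen for illustration; only the event
kinds and the numbers are taken from the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContactEvent:
    kind: str                 # "init", "move", "destroy" or "sync"
    id: Optional[int] = None  # contact id, unused for "sync"
    x: Optional[int] = None   # position, unused for "destroy" and "sync"
    y: Optional[int] = None


def snapshots(events):
    """Yield a copy of the active contacts (id -> (x, y)) at every sync."""
    active = {}
    for ev in events:
        if ev.kind == "sync":
            yield dict(active)
        elif ev.kind == "destroy":
            del active[ev.id]
        elif ev.kind in ("init", "move"):
            active[ev.id] = (ev.x, ev.y)
        else:
            raise ValueError("unknown event kind: " + ev.kind)


# The two-finger scroll from the example above.
stream = [
    ContactEvent("init", 588, -234, 42),
    ContactEvent("init", 123, 933, 3),
    ContactEvent("sync"),
    ContactEvent("move", 588, -211, 529),
    ContactEvent("move", 123, 863, 732),
    ContactEvent("sync"),
    ContactEvent("destroy", 588),
    ContactEvent("destroy", 123),
    ContactEvent("sync"),
]
frames = list(snapshots(stream))
```

Each entry in frames is the complete contact state at one sync, so the
client never has to guess which contacts are still alive.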

The X Protocol
--------------

The details here are well beyond my expertise, but I am suggesting a contact
interface implementation based on the contact driver event structure. I cannot
imagine this is harder than using the XI event structure, and it should be a lot
less of a headache for everyone.

Cheers,
Henrik