[RFC] Multitouch support, step one

Mon Mar 15 22:32:19 PDT 2010

On Mon, Mar 15, 2010 at 03:41:24PM +0100, Henrik Rydberg wrote:
> > Preamble:
> > Multi-touch as defined in this proposal is limited to single input-point
> > multi-touch. This is suitable for indirect touch devices (e.g. touchpads)
> > and partially suited for direct touch devices provided a touch is equivalent
> > to a single-gesture single-application input.
> 
> User-space applications need tools to *use* MT devices, not route raw data from
> the devices to the application. The latter is not much more complicated than
> opening a file, and everyone can do that already. Thus, unless there is a model
> for how MT devices work and interact with other MT devices, I see little point
> in having an X protocol at all.

The main reason is that applications, for better or worse, use X as their
input source. Our job is to get the data to the right client, without too
much processing going on. For clients to go around the server by opening the
kernel device files directly will cause issues in the long run, especially
when you have multiple applications running.

And you're right, what we would be doing here is essentially opening a file
and forwarding the data as-is to a client where it can then be processed.
Interpretation of the data needs to happen in the client; that goes for
gestures as well as simple things like double-clicks.

I think I wasn't clear enough in the original email so I'll try to spell
it out in more detail here:
Assume a device for which the kernel provides the following info:
    BTN_LEFT
    BTN_RIGHT
    BTN_MIDDLE
    ABS_RX
    ABS_TILT_X
    ABS_TILT_Y
    ABS_MT_POSITION_X
    ABS_MT_POSITION_Y
    ABS_MT_TRACKING_ID
    ABS_MT_ORIENTATION

The evdev driver would then set up a device based on this info. Assuming
that we go with a default MT max range of 3, the device in X would look like
this (using abbreviated xinput --list format).

Multitouch device            id=6    [slave  pointer (2)]
        Reporting 14 classes:
                Class originated from: 6
                Buttons supported: 3
                Button labels: Button Left Button Middle Button Right 
                Button state:
                Class originated from: 6
                Detail for Valuator 0:
                  Label: Abs X
                  Range: 0 - 1000
                  Resolution: 1 units/m
                  Mode: absolute
                Class originated from: 6
                Detail for Valuator 1:
                  Label: Abs Y
                  Range: 0 - 1000
                  Resolution: 1 units/m
                  Mode: absolute
                Class originated from: 6
                Detail for Valuator 2:
                  Label: Abs Rx
                  ...
                Detail for Valuator 3:
                  Label: Abs Tilt X
                  ...
                Detail for Valuator 4:
                  Label: Abs Tilt Y
                  ...
                Detail for Valuator 5:
                  Label: Abs MT Position X
                  ...
                Detail for Valuator 6:
                  Label: Abs MT Position Y
                  ...
                Detail for Valuator 7:
                  Label: Abs MT Orientation
                  ...
                Detail for Valuator 8:
                  Label: Abs MT Position X
                  ...
                Detail for Valuator 9:
                  Label: Abs MT Position Y
                  ...
                Detail for Valuator 10:
                  Label: Abs MT Orientation
                  ...
                Detail for Valuator 11:
                  Label: Abs MT Position X
                  ...
                Detail for Valuator 12:
                  Label: Abs MT Position Y
                  ...
                Detail for Valuator 13:
                  Label: Abs MT Orientation
                  ...

I used 3 as MT default to keep the email short :)
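
To make the layout above concrete, here is a minimal sketch of how a driver
could compute the valuator number for a given touchpoint. The constants and
the names total_valuators() and mt_valuator_index() are made up for
illustration and are not taken from the evdev driver. The sketch only assumes
the ordering shown in the listing: non-MT axes first, then one block of
X/Y/orientation per touchpoint.

```c
#include <assert.h>

/* Hypothetical layout constants matching the example listing above. */
enum {
    NUM_NON_MT_AXES = 5,  /* Abs X, Abs Y, Abs Rx, Abs Tilt X, Abs Tilt Y */
    AXES_PER_TOUCH  = 3,  /* MT Position X, MT Position Y, MT Orientation */
    MAX_TOUCHES     = 3   /* default MT max range used in this email */
};

/* Offsets of the per-touch axes inside one touchpoint's block. */
enum mt_axis { MT_AXIS_X = 0, MT_AXIS_Y = 1, MT_AXIS_ORIENTATION = 2 };

/* Total number of valuators the device exposes. */
static int total_valuators(void)
{
    return NUM_NON_MT_AXES + MAX_TOUCHES * AXES_PER_TOUCH;
}

/* Valuator number for a given touch slot and per-touch axis. */
static int mt_valuator_index(int slot, enum mt_axis axis)
{
    assert(slot >= 0 && slot < MAX_TOUCHES);
    return NUM_NON_MT_AXES + slot * AXES_PER_TOUCH + (int)axis;
}
```

With these assumed constants the first touchpoint starts at valuator 5 and
the device ends up with 14 valuators, as in the listing.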

First thing to notice here is that the tracking ID is missing. AIUI, it
identifies a touchpoint on the device for as long as that touchpoint
persists. Since we have distinct valuators per touchpoint, the driver can
use the tracking ID to decide which valuators to push the data into, and we
do not need to send it to the client.
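
As a rough sketch of that driver-side bookkeeping: the driver could keep a
small table that maps each live tracking ID to a fixed slot, and the slot then
selects the block of valuators. Everything here (struct slot_table, the
function names, the lowest-free-slot policy, and the assumption that tracking
IDs are non-negative) is hypothetical and only meant to illustrate the idea.

```c
#include <assert.h>

#define MAX_TOUCHES 3     /* assumed default MT max range */
#define SLOT_FREE   (-1)  /* marker for an unused slot */

/* Maps each slot to the tracking ID currently occupying it. */
struct slot_table {
    int tracking_id[MAX_TOUCHES];
};

static void slot_table_init(struct slot_table *t)
{
    for (int i = 0; i < MAX_TOUCHES; i++)
        t->tracking_id[i] = SLOT_FREE;
}

/*
 * Return the slot holding tracking_id. A new ID is assigned the lowest
 * free slot. Returns -1 if the ID is new and all slots are in use.
 */
static int slot_for_tracking_id(struct slot_table *t, int tracking_id)
{
    int free_slot = -1;

    assert(tracking_id >= 0);
    for (int i = 0; i < MAX_TOUCHES; i++) {
        if (t->tracking_id[i] == tracking_id)
            return i;
        if (t->tracking_id[i] == SLOT_FREE && free_slot < 0)
            free_slot = i;
    }
    if (free_slot >= 0)
        t->tracking_id[free_slot] = tracking_id;
    return free_slot;
}

/* Mark a slot as unused once its touchpoint has disappeared. */
static void slot_release(struct slot_table *t, int slot)
{
    assert(slot >= 0 && slot < MAX_TOUCHES);
    t->tracking_id[slot] = SLOT_FREE;
}
```

A touchpoint keeps its slot, and therefore its valuators, until it is
released. Only then can the slot be reused for a new tracking ID.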

Hence, an MT event from the kernel along the lines of (fake evtest output)

Event: time 1268716779.969270, -------------- Report Sync ------------
Event: time 1268716780.214151, type 3 (Abs), code 53 (Abs MT Position X), value 1
Event: time 1268716780.214151, type 3 (Abs), code 54 (Abs MT Position Y), value 2
Event: time 1268716780.214151, type 3 (Abs), code 52 (Abs MT Orientation), value 3
Event: time 1268716780.214151, ---------------Report MT Sync ---------
Event: time 1268716780.214151, type 3 (Abs), code 53 (Abs MT Position X), value 14
Event: time 1268716780.214151, type 3 (Abs), code 54 (Abs MT Position Y), value 15
Event: time 1268716780.214151, type 3 (Abs), code 52 (Abs MT Orientation), value 16
Event: time 1268716780.214151, ---------------Report MT Sync ---------
Event: time 1268716780.214158, -------------- Report Sync ------------

would translate into the following XI2 event (sort-of xinput format):
EVENT type 6
    device: 6 (6)
    detail: 0
    flags: 
    root: 1.00/1.00
    event: 0.00/0.00
    buttons:
    modifiers: locked 0x10 latched 0 base 0x8 effective: 0x18
    group: locked 0 latched 0 base 0 effective: 0
    windows: root 0x10e event 0x4c00001 child 0x0
    valuators-present: 0, 1, 5, 6, 7, 8, 9, 10
    valuators:  1, 2, 1, 2, 3, 14, 15, 16

So a client receiving this event knows that the values provided are two
distinct touchpoints at positions 1/2 and 14/15. If you now release
touchpoint 2, the next event would have
    valuators-present: 0, 1, 5, 6, 7
    valuators:  1, 2, 1, 2, 3
Hence, a client knows that this touchpoint has now disappeared. The tracking ID
can be inferred from the valuator number, since it can be assumed to remain
fixed until a touchpoint disappears.
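
On the client side, the decoding could look roughly like the sketch below. It
is not written against the real XI2 client API: a plain 32-bit mask stands in
for the event's valuator bitmask, the layout constants are the ones assumed
for the example device, and struct touch and decode_touches() are made-up
names. A touchpoint is treated as present if its Position X valuator is set
in the mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, matching the example device in this email. */
enum {
    NUM_NON_MT_AXES = 5,
    AXES_PER_TOUCH  = 3,
    MAX_TOUCHES     = 3,
    MAX_VALUATORS   = 32
};

struct touch {
    int present;              /* non-zero if the touchpoint is in the event */
    double x, y, orientation;
};

/*
 * mask:   bit v is set if valuator v is present in the event.
 * values: one entry per set bit, in ascending valuator order.
 * Fills out[] and returns the number of touchpoints present.
 */
static int decode_touches(uint32_t mask, const double *values,
                          struct touch out[MAX_TOUCHES])
{
    double by_valuator[MAX_VALUATORS] = { 0 };
    int next = 0, count = 0;

    assert(values != NULL);

    /* Unpack the dense value list into a per-valuator array. */
    for (int v = 0; v < MAX_VALUATORS; v++) {
        if (mask & (UINT32_C(1) << v))
            by_valuator[v] = values[next++];
    }

    /* Each touchpoint owns a fixed block of valuators. */
    for (int s = 0; s < MAX_TOUCHES; s++) {
        int base = NUM_NON_MT_AXES + s * AXES_PER_TOUCH;

        out[s].present = (int)((mask >> base) & 1u);
        out[s].x = by_valuator[base];
        out[s].y = by_valuator[base + 1];
        out[s].orientation = by_valuator[base + 2];
        count += out[s].present;
    }
    return count;
}
```

Because the slot is derived from the valuator number alone, the client never
needs to see the tracking ID to tell touchpoints apart.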

This approach tries to resemble the kernel's API as closely as possible
without injecting extra data (other than the window information, etc.). It is
_not_ trying to process the data beyond what's necessary either.

Looking at the information we get from the kernel, a complex interface that
divides contacts by user is currently not possible - we simply lack the
information needed. Detailed interfaces like this should imo be on the
client side, where information can be stored more easily and adjusted to a
specific user's preferences much more simply.

That summarises the points I have for your replies below, so I'll cut off
here :)

Cheers,
  Peter

> > Details:
> > The data we get from the (Linux) kernel includes essentially all the ABS_MT
> > events, x, y, w, h, etc. We can pack this data into valuators on the device.
> > In the simplest case, a device with two touchpoints would thus send 4
> > valuators - the first two being the coordinate pair for the first touch
> > point, the latter two the coordinates for the second touch point.
> > 
> > XI2 provides us with axis labels, so we can label the axes accordingly.
> > Clients that don't read axis labels are left guessing what the fancy values
> > mean, which is exactly what they're doing already anyway.
> 
> The idea of a wide set of dimensions to describe a set of fingers for instance,
> was considered and dropped for the kernel MT interface. There is a definite
> difference between having "three things" and having "two more of the same kind".
> The number of dimensions also increases dramatically, as pointed out by Mr.
> Poole. It makes much more sense to define contacts as multiple instances of the
> same thing, than to define each new contact as potentially something completely
> different.
> 
> 
> > XI2 DeviceEvents provide a bitmask for the valuators present in a device.
> > Hence, a driver can dynamically add and remove valuators from events, thus
> > providing information about the presence of these valuators.
> > e.g. DeviceEvent with valuators [1-4] means two touchpoints down, if the
> > next event only includes valuators [3-4], the first touchpoint has
> > disappeared.
> 
> The idea of adding and removing contacts dynamically I believe is a good idea. A
> contact has a set of attributes (x, y, etc). Why not provide a clean interface
> for the contacts as a concept, rather than mapping the not-so-independent x and
> y values into separate dynamic entities? As an example of the smallest
> meaningful dynamic entity:
> 
> struct Contact {
> 	int tracking_id;
> 	float x, y;
> 	etc etc...
> };
> 
> > Core requires us to always send x/y, hence for core emulation we should
> > always include _some_ coordinates that are easily translated. While the
> > server does caching of absolute values, I think it would be worthwhile to
> > always have an x/y coordinate _independent of the touchpoints_ in the event.
> > The driver can decide which x/y coordinates are chosen if the first
> > touchpoint becomes invalid.
> 
> Seconded, but the single-touch x/y coordinates are properties of a contact
> group, not of a single contact. Example:
> 
> struct ContactGroup {
> 	int group_id;
> 	float x, y;
> 	ContactList list;
> 	etc etc...
> };
> 
> > Hence, the example with 4 valuators above becomes a device with 6 valuators
> > instead. x/y and the two coordinate pairs as mentioned above. If extra data
> > is provided by the kernel driver, these pairs are simply extended into
> > tuples of values, appropriately labeled.
> > 
> > Core clients will ignore the touchpoints and always process the first two
> > coordinates.
> > XI1 clients will have to guess what the valuators mean or manually set it up
> > in the client.
> > XI2 clients will automagically work since the axes are labeled. Note that
> > any client that receives such an event always has access to _all_
> > touchpoints on the device. This works fine for say 4-finger swipes on a
> > touchpad but isn't overly useful for the multiple client case, see
> > above.
> 
> This is at the heart of the problem, I believe. In addition to being able to
> work with a set of ContactGroups, like ContactGroupList, one needs the
> possibility to dynamically regroup them, based on geometric information and what
> not. Partitioning is the word. A toolset consisting of at least these functions:
> 
> ContactGroupList partition_contacts_geometrically(ContactList all_contacts);
> ContactGroupList partition_contacts_by_user(ContactList all_contacts);
> ContactGroupList find_contact_groups_in_window(ContactGroupList all_groups);
> etc etc
> 
> ought to be the minimum requirement on the interface, such that applications can
> do something meaningful with the information at hand.
> 
> 
> > Since additional touchpoints are valuators only, grabs work as if the
> > touches belong to a single device. If any client grabs this device, the
> > others will miss out on the touchpoints.
> >
> > XI2 allows devices to change at runtime. Hence a device may add or remove
> > valuators on-the-fly as touchpoints appear and disappear. There is a chance
> > of a race condition here. If a driver decides to add/remove valuators
> > together with the touchpoints, a client that skips events may miss out.
> > e.g. if a DeviceChanged event that removes an axis is followed by one that
> > adds an axis, a client may only take the second one as current, thus
> > thinking the axis was never removed. There is nothing in the XI2 specs that
> > prohibits this. Anyways, adding/removing axes together with touchpoints
> > seems superfluous if we use the presence of an axis as indicator for touch.
> > Rather, I think a device should be set up with a fixed number of valuators
> > describing the default maximum number of touchpoints. Additional ones can be
> > added at runtime if necessary.
> 
> Some events are, as always, more important than others. If the stream bandwidth
> is a concern, there is always the possibility to tag events as "important" and
> "less important", in the same manner as focus events normally are more important
> than mouse movement events.
> 
> > 
> > Work needed:
> > - drivers: updated to parse ABS_MT_FOO and forward it on.
> > - X server: the input API still uses the principle of first + num_valuators
> >   instead of the bitmask that the XI2 protocol uses. These calls need to be
> >   added and then used by the drivers.
> > - Protocol: no protocol changes are necessary, though care must be taken in
> >   regards to XI1 clients. 
> >   Although the XI2 protocol does allow device changes, this is not specified
> >   in the XI1 protocol, suggesting that once a device changes, potential XI1
> >   clients should be either ignored or limited to the set of axes present
> >   when they issued the ListInputDevices request. Alternatively, the option
> >   is to just encourage XI1 clients to go the way of the dodo.
> > 
> > Corner cases:
> > We currently have a MAX_VALUATORS define of 32. This may or may not be
> > arbitrary and interesting things may or may not happen if we increase that.
> > 
> > A device exposing several axes _and_ multitouch axes will need to be
> > appropriately managed by the driver. In this case, the "right" thing to do
> > is likely to expose non-MT axes first and tack the MT axes onto the back.
> > Some mapping may need to be added.
> > 
> > The future addition of real multitouch will likely require protocol changes.
> > These changes will need to include a way of differentiating a device that
> > does true multitouch from one that does single-point multi-touch.
> > 
> > That's it, pretty much (well, not much actually). Feel free to poke holes
> > into this proposal.
> 
> Ok, in conclusion, my two cents are: Do not add MT functionality as valuators
> in X, but implement a proper Contact interface from the start.
> 
> Cheers,
> Henrik
>