multitouch

Tue Feb 9 22:37:00 PST 2010

On Tue, Feb 09, 2010 at 10:50:40AM +0100, Bradley T. Hughes wrote:
> On 02/09/2010 06:34 AM, ext Peter Hutterer wrote:
> >The really hard problem IMO is that given two touch points at coordinates
> >x1/y1 and x2/y2 and two different clients at these coordinates do we
> >- send the second touchpoint to the client that received the first
> >   touchpoint?
> 
> Another option is to discard the second touch point if x2/y2 isn't
> over the window that got x1/y1 (this is how Windows 7 behaves on the
> hardware I have). I don't like this behavior though.

Me neither, it essentially makes multi-user interaction on a large shared
display impossible.

> >- send the second touchpoint to the second client?
> 
> This is the behavior I would like to see. It's the one that promotes
> the multiple-user and multiple-client use case the best.
> 
> >This is the main roadblock at the moment, and anytime I try to come up with
> >a working solution I hit a wall at even quite basic use-cases.
> 
> Would you mind elaborating a bit on this?

Sure, I'm probably repeating myself with some of the points below, but it's
easier for me to get a train of thought going.

The basic assumption for multitouch screens is that they will give us
multiple touchpoints simultaneously from a single physical device. At this
point, we have virtually no user-specific (or body-part-specific) information
attached to these points. Furthermore, in X we do not know any
context-specific information. Any touchpoint other than the first may
- belong to a different user
- belong to the same user but a different body part that qualifies as "new
  input device" (think left hand + right hand working independently)
- belong to the same user and the same body part and be auxiliary (think
  thumb+index during pinching)

In addition to that, any point may be part of a gesture, but without the
context (i.e. without being the gesture recognizer) it's hard to tell if a
point is part of a gesture at all. Worse, depending on the context, the same
coordinates may be part of different gestures.

Given two touchpoints that start close to each other and move in diagonally
opposite directions, this gesture may be a user trying to zoom, a user
trying to pick two items apart or two users fighting for one object. Without
knowing what's underneath, it's hard to say.

So we need a way to handle touchpoints that doesn't restrict any of these,
all while staying compatible with the core, XI1 and XI2 protocols.
X's main task here is to be an input multiplexer and send the events to the
right clients, since at any point in time we may have more than one
multitouch application running. Picking the right client is the important
thing.

A few already discarded ideas:
- send all additional touchpoints to the app that got the first touch event.
  disables simultaneous multitouch.
- have a separate API that essentially exists next to XI2 and sends touch
  events to all clients interested. doesn't really scale and is a nightmare
  to sync since two apps may get the same event.
- introduce region-based grabs, i.e. let a client grab a touchpoint
  including a surrounding area so that future touchpoints that happen in
  this area also go to this client. a race-condition hellhole since those
  regions would need to be updated during grabs.
- approach similar to XTEST devices, where touchpoints are routed through a
  device attached to each master and each master device has such a device.
  If touchpoints are close to each other, the driver routes it through the
  matching device. you need a "fuzz" setting in the driver to decide when
  touchpoints belong together (which needs to be run-time configurable), you
  need a lot of sync between driver and server (including the driver knowing
  what master devices are present which I don't really like) and touchpoints
  can be close to each other without being related to each other (multiple
  user case).
- the Metisse approach: send all events to an external process that knows
  about multitouch and let it decide how they belong together. this process
  would have to have the same gesture engine as the client, with the same
  settings, and it would need to buffer events to decide if one is part of
  a gesture, just to decide later that it wasn't, at which point the event
  has to be replayed from the server. the client would then get this event
  delayed and would likely still try to make it part of a gesture, thus
  adding further delay to it.
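
To make the "fuzz" problem from the XTEST-like idea above concrete, here is
a rough sketch of distance-based grouping. This is not driver code; the
function name group_touchpoints and the pixel values are made up for
illustration. It only shows that a distance threshold cannot tell a
thumb+index pinch from two users touching near each other.

```python
import math

def group_touchpoints(points, fuzz):
    """Group (x, y) touchpoints: any two points within `fuzz` of each
    other end up in the same group (hypothetical helper, not an X API)."""
    groups = []
    for p in points:
        near, far = [], []
        for g in groups:
            # A group is "near" if any of its members is within fuzz of p.
            is_near = any(math.dist(p, q) <= fuzz for q in g)
            (near if is_near else far).append(g)
        # Merge p with every group it is close to.
        merged = [p] + [q for g in near for q in g]
        groups = far + [merged]
    return groups

# Two points 30 px apart and one far away, with an assumed fuzz of 50 px.
# Whether the two close points are one hand pinching or two different
# users is not something the coordinates can tell us.
touches = [(0, 0), (30, 0), (500, 500)]
for group in group_touchpoints(touches, fuzz=50):
    print(sorted(group))
```

Any fixed fuzz value lumps the two close points together regardless of who
they belong to, which is why the setting would at least have to be run-time
configurable and still gets the multi-user case wrong.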

The current idea, not yet completely discarded, is to send touchpoints to the
client underneath the pointer, with the first touchpoint doing mouse
emulation. A touchpoint that started in a client is automatically grabbed
and sent to that client until the release, even if the touch moves outside
the client. Thus a gesture moving out of the client doesn't actually go out
of the client (behaviour similar to implicit passive grabs). While such a
grab is active, any more touchpoints in this client go through the same
channel, while touchpoints outside that client go to the respective client
underneath.
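
A minimal sketch of that routing rule, just to pin down the semantics. The
Window and TouchRouter names are invented for this example and have nothing
to do with the server's actual data structures; windows are plain rectangles
listed topmost first, and mouse emulation for the first touchpoint is left
out entirely.

```python
from dataclasses import dataclass, field

@dataclass
class Window:
    client: str
    x: int
    y: int
    w: int
    h: int

    def contains(self, px, py):
        return (self.x <= px < self.x + self.w
                and self.y <= py < self.y + self.h)

@dataclass
class TouchRouter:
    windows: list                               # topmost window first
    grabs: dict = field(default_factory=dict)   # touch id -> client

    def client_at(self, px, py):
        for win in self.windows:
            if win.contains(px, py):
                return win.client
        return None

    def begin(self, touch_id, px, py):
        # A new touch goes to the client underneath it and is implicitly
        # grabbed by that client.
        client = self.client_at(px, py)
        if client is not None:
            self.grabs[touch_id] = client
        return client

    def update(self, touch_id, px, py):
        # Motion stays with the grabbing client, wherever the touch is now.
        return self.grabs.get(touch_id)

    def end(self, touch_id):
        # The release is delivered to the grabbing client and ends the grab.
        return self.grabs.pop(touch_id, None)

router = TouchRouter([Window("A", 0, 0, 100, 100),
                      Window("B", 100, 0, 100, 100)])
print(router.begin(1, 10, 10))     # A
print(router.begin(2, 150, 10))    # B
print(router.update(1, 150, 50))   # A, although the touch is now over B
```

Note that this only models one grab per touchpoint; it says nothing about
the case where several clients select for events on the same window.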

problem 1: you can't really do multi-mouse emulation since you need a master
device for that. so you'd have to create master devices on-the-fly for the
touchpoints in other clients and destroy them again. possible, but costly.

problem 2: gestures starting outside the client may go to the wrong one. not
sure how much of a problem that is, I think that's more a side effect of a UI
not designed for touch.

problem 3: this requires the same device to be grabbed multiple times by
different clients, but possibly not for mouse emulation. And a client
doesn't necessarily own a window, and events may be sent to multiple clients
at the same time, all of which would then need such a grab. I think this is
where this approach breaks down: you'd get multiple clients getting the same
event and I'm not sure how that'd work out.

Oh, and did I mention that we have to be compatible with the core protocol
grab semantics for mouse emulation?

Cheers,
  Peter