multitouch

Peter Hutterer peter.hutterer at who-t.net
Mon Feb 8 21:34:03 PST 2010


On Mon, Feb 08, 2010 at 06:23:53PM +0900, Carsten Haitzler wrote:
> > > > There's also the fact that the current approach that Benjamin suggested 
> > > > requires an extra client to manage the slave devices.
> > > 
> > > OTOH, if you're getting serious, there needs to be an instance
> > > translating events into gestures/metaphors anyway. So I don't see the
> > > point of avoiding an instance you're likely to need further on.
> > 
> > A gesture recogniser instance will be mandatory. However, a client that
> > modifies the list of input devices on demand and quite frequently hopefully
> > won't. Benjamin's approach puts quite a load on the server and on all
> > clients (presence events are sent to every client), IMO unnecessarily.
> 
> why should one be at the xi2 event level? i'm dubious of this. i've thought it
> through a lot - you want gesture recognition happening higher up in the toolkit
> or app. you need context - does that gesture make sense. if one gesture was
> started but ended in a way that the gesture changed, you need to cancel the
> previous action etc. imho multitouch etc. should stick to delivering as much
> info as the HW provides, as cleanly and simply as possible, via xi2 with
> minimal interruption of existing app functionality.

I think my wording was ambiguous - I do not want a gesture recognizer on the
X side of the protocol. I want the X server to forward the events as
unmodified as possible to the right client. That's all.
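
As an aside, "forward as unmodified as possible" just means the client selects
the raw XI2 events itself and builds gestures on top of them. A minimal sketch
in C (assuming an open Display and an existing window, error handling omitted):

#include <string.h>
#include <X11/Xlib.h>
#include <X11/extensions/XInput2.h>

/* Select plain XI2 pointer events on a window; gesture recognition then
   happens in the toolkit/app, where the application context is known. */
static void select_xi2_events(Display *dpy, Window win)
{
    unsigned char bits[XIMaskLen(XI_LASTEVENT)];
    XIEventMask mask;

    memset(bits, 0, sizeof(bits));
    mask.deviceid = XIAllMasterDevices;  /* events routed through any MD */
    mask.mask_len = sizeof(bits);
    mask.mask     = bits;

    XISetMask(bits, XI_ButtonPress);
    XISetMask(bits, XI_ButtonRelease);
    XISetMask(bits, XI_Motion);

    XISelectEvents(dpy, win, &mask, 1);
    XFlush(dpy);
}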

> > The basic principle for the master/slave division is that even in the
> > presence of multiple physical devices, what really counts in the GUI is the
> > virtual input points. This used to be a cursor, now it can be multiple
> > cursors and with multitouch it will be similar. Most multitouch gestures
> > still have a single input point with auxiliary information attached.
> > Prime example is the pinch gesture with thumb and index - it's not actually
> > two separate points, it's one interaction. Having two master devices for
> > this type of gesture is overkill. As a rule of thumb, each hand from each
> > user usually constitutes an input point and thus should be represented as a
> > master device.
> 
> well that depends - if i take both my hands with 2 fingers and now i draw things
> with both left and right hand.. i am using my hands as 2 independent core
> devices. 

Really? Your fingers are more flexible than mine then, because while I can
draw four lines like this, in reality they will be two sets of two lines
instead. Hence again, two input points, each with one auxiliary point.
Now, if you started using your nose in addition to your hands, that's
what I'd accept as a third input point (not sure I'd want to be the next
person to use the touchscreen then, though ;)

But you are right, the 2 master devices are a rule of thumb only and do not
apply to all (most?) cases.

> the problem is - the screen can't tell the difference - neither can
> the app. i like 2 core devices - it means you can emulate multitouch screens
> with mice... you just need N mice for N fingers. :) this is a good way to
> encourage support in apps and toolkits as it can be more widely used.


> > An example device tree for two hands would thus look like this:
> > 
> > MD1- MD XTEST device
> >    - physical mouse
> >    - right hand touch device - thumb subdevice
> >                              - index subdevice
> > MD2- MD XTEST device
> >    - physical trackball
> >    - left hand touch device  - thumb subdevice
> >                              - index subdevice
> >                              - middle finger subdevice
> > 
> > Where the subdevices are present on demand and may disappear. They may not
> > even be actual devices but just represented as flags in the events.
> > The X server doesn't necessarily need to do anything with the subdevices.
> > What the X server does need however is the division between the input points
> > so it can route the events accordingly. This makes it possible to pinch in
> > one app while doing something else in another app (note that I am always
> > thinking of the multiple apps use-case, never the single app case).
> 
> well this assumes you can tell the difference between 2 hands... :)

You can't, yet. There's no reason to believe it can't be done in the future,
though.
 
> > When I look at the Qt API, it is device-bound so naturally the division
> > between the devices falls back onto X (as it should, anyway).
> > The tricky bit about it is - at least with current hardware - how to decide
> > how many slave devices and which touchpoints go into which slave device.
> > Ideally, the hardware could just tell us but...
> 
> well 2nd, 3rd, 4th etc. fingers for 1 hand would go in as slaves, no?

Touchscreens don't have a state. The physical device does, but the actual
touchpoints do not. So while you could create one slave device for each
possible touch, statically or on demand, this can equally well be served by a
simple flag in a new type of event. The cost is lower and it maps better to
the actual hardware, too.
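
To illustrate what "a simple flag in a new type of event" could look like,
purely hypothetically - none of these names exist in the protocol:

/* Hypothetical touch event, loosely modelled on XIDeviceEvent. A touchpoint
   is identified by a per-device sequence number and a lifecycle flag instead
   of its own slave device. */
typedef struct {
    int      deviceid;       /* touch-capable slave device */
    int      sourceid;       /* physical device the point came from */
    unsigned touchid;        /* sequence number of this touchpoint */
    unsigned flags;          /* TOUCH_BEGIN, TOUCH_UPDATE or TOUCH_END */
    double   root_x, root_y; /* position in root coordinates */
} HypotheticalTouchEvent;

enum {
    TOUCH_BEGIN  = (1 << 0), /* touchpoint appeared */
    TOUCH_UPDATE = (1 << 1), /* touchpoint moved */
    TOUCH_END    = (1 << 2)  /* touchpoint disappeared */
};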

Note that the MD/SD hierarchy is _not_ designed for multitouch, and trying
to squeeze multitouch hardware into it may be a mistake (as your comment 
at the bottom of this email rightly says). The key is finding a new approach
that works for the use cases that matter.

> > this approach works well for mouse emulation too, since the first subdevice
> > on each touch device can be set to emulate mouse events. what it does lead
> > to is some duplication in multi-pointer _and_ multi-touch aware applications
> > though, since they have to be able to differentiate between the two.
> > 
> > until the HW is ready to at least tell the driver what finger is touching,
> > etc., the above requires a new event to label the number of subdevices and
> > what information is provided. This would be quite similar to Qt's
> > QTouchEvent::TouchPoint class and I believe close enough to Windows'
> > approach?
> 
> well the hw here knows 1st and 2nd etc. finger, if i release my 1st finger and
> keep 2nd down - 2nd reports events, 1st doesn't. so it knows the order of
> touches - and keeps that synced to the points. but knowing if it's the thumb or
> index or pinky or the other hand etc. - can't find out.

Optical hardware has the potential to identify finger types. We need to
cater for those too.

> > I'm still somewhat opposed to sending the extra data as valuators. While
> > it's a short-term fix, it's a kludge, as it lacks some information such as
> > when touchpoints appear/disappear. This again can be hacked around, but...
> 
> yeah. i agree. it has good points, but downsides too. separate devices (slaves)
> for the extra touches seems good to me - first touch is a core device and maps
> well to existing single-touch screens and mice.

FWIW, there are no core devices anymore. Historically, core devices were
core-only devices, with the later addition of the SendCoreEvents option that
made some devices send XI and (through another device) core events too,
though these events were not associated at all.
Now we have slave devices (XI and XI2 only) and master devices (XI, XI2 and
core), where events from SDs are routed through MDs, possibly (but not
always) generating core events on a given window for a specific client.
The term "core device" is thus fuzzy and, quite frankly, if you say "first
touch is a core device" I don't actually know what you are referring to.

> > as I said above, the issue isn't quite as simple and it should scale up to
> > the use-case of 2 users with 3 hands on the table, interacting with two
> > different applications. So while #2 is the most logical, the number of
> > master devices needs to equal the number of virtual input points the users
> > want, and that's likely to be one per hand.
> 
> but we have a problem now... we only have master and slave. we need N levels. i
> need on a collaborative table:
> 
> person - hand - finger
>               - finger
>               - finger
>        - hand - finger
>               - finger
>               - finger
> 
> in the end... n levels is likely going to be needed. we can flatten this, sure,
> but in the end you will not be able to anymore. :(

We can add fake extra levels with labels or flags. In the example above, why
do you need a level for "hand"? Couldn't you just label the fingers that
belong together as a hand? I don't think an actual extra level is needed on
the X side.
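
If a single flag turns out to be too coarse, the label doesn't have to be a
flag either; again purely hypothetical:

/* Hypothetical per-touchpoint labels: instead of a "hand" level in the
   hierarchy, fingers that belong together share a group id. */
typedef struct {
    unsigned touchid;  /* which touchpoint */
    unsigned group;    /* same group id = same hand */
    unsigned user;     /* optional, if the hardware can ever tell users apart */
} HypotheticalTouchLabel;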

The really hard problem, IMO, is: given two touch points at coordinates
x1/y1 and x2/y2 and two different clients at these coordinates, do we
- send the second touchpoint to the client that received the first
  touchpoint, or
- send the second touchpoint to the second client?
This is the main roadblock at the moment, and any time I try to come up with
a working solution I hit a wall at even quite basic use-cases.
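
To make the two options concrete, here is a toy sketch (all names made up,
the real decision would sit in the server's event delivery code):

typedef struct Client Client;   /* stand-in for the real client structure */

typedef enum {
    DELIVER_TO_FIRST_OWNER, /* second touch follows the first touch's client,
                               good for a pinch inside one application */
    DELIVER_BY_POSITION     /* second touch goes to whatever client is under
                               it, good for two users on two applications */
} TouchDeliveryPolicy;

static Client *pick_client(TouchDeliveryPolicy policy,
                           Client *owner_of_first_touch,
                           Client *client_under_second_touch)
{
    if (policy == DELIVER_TO_FIRST_OWNER)
        return owner_of_first_touch;
    return client_under_second_touch;
}

Neither choice covers all the basic use-cases above, which is exactly the
wall I keep hitting.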

Cheers,
  Peter

