Multi{pointer,touch} & Userspace

Tue Nov 4 02:10:08 PST 2008

Hello Peter,

First of all, thanks for your opinions.

> One of the problems you will run into here is that it is hard to convert raw
> data into specific packets without many pre-configured assumptions or just
> plain guesswork.
> For example, if you can detect two distinct blobs from two fingers, this may
> be two fingers from the same hand or two fingers from two different hands. In
> both cases, the decision on how these packets are divided or linked together
> require. It gets even worse if you account for hovering items in the detection
> (e.g. you can see the finger touchpoint _and_ parts of the hand hovering)
In fact, I think that this is even a point in favor of the layered
approach. You are of course correct (and Jim Gettys too, he also
mentioned this). However, I believe that the lowest layer is in the best
position to know what its hardware is capable of and to use these
assumptions to generate an abstraction of the input data.

> The transformation into screen coordinates is of little issue. Ideally, you'd
> want applications using multi-touch stuff to be aware of the events anyway,
> in which case you'd just use the device coordinate space.
What if, for example, you have a camera-based input device with a
fisheye lens? I can't imagine that every frontend should do the radial
undistortion itself.
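
To illustrate the kind of work I would rather do once in the lower layer:
a minimal sketch of radial undistortion, assuming a simple one-parameter
division model. The function name, the normalized coordinates and the
constant k are made up for this example; a real camera would need a
properly calibrated model.

```python
def undistort(point, center, k):
    """Map a distorted point to its undistorted position.

    Assumes a one-parameter division model: the offset from the
    distortion center is scaled by 1 / (1 + k * r^2), where r is the
    distance of the distorted point from the center. k is a
    hypothetical calibration constant, not part of any protocol.
    """
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    r2 = dx * dx + dy * dy
    scale = 1.0 / (1.0 + k * r2)
    return (center[0] + dx * scale, center[1] + dy * scale)


# The distortion center is not moved, and k = 0 leaves every point as-is.
center = (0.5, 0.5)
same = undistort((0.9, 0.1), center, 0.0)
```

If the lower layer applies this before emitting positions, no frontend
ever has to know that a fisheye lens was involved.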

> Much more important is the transformation of the blobs into a polar coordinate
> system (e.g. is it a diagonal blob or a thumb, rotated by 30 degrees?). Where
> are you planning on doing this?
The position protocol delivers the major and minor axis vectors of an
equivalent ellipse; is this what you are thinking about?
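
For what it's worth, a minimal sketch of how a frontend could get lengths
and a rotation angle out of the two axis vectors. I am assuming here that
they arrive as plain 2D (x, y) vectors in device coordinates; the function
name is made up for this example.

```python
import math


def ellipse_polar(major, minor):
    """Return (major length, minor length, angle in degrees).

    major and minor are assumed to be 2D axis vectors (x, y).
    An ellipse axis has no direction, so the angle is folded
    into the range [0, 180).
    """
    mx, my = major
    nx, ny = minor
    angle = math.degrees(math.atan2(my, mx)) % 180.0
    return math.hypot(mx, my), math.hypot(nx, ny), angle


# A blob twice as long as it is wide, rotated by 30 degrees.
a = math.radians(30)
length, width, angle = ellipse_polar(
    (2 * math.cos(a), 2 * math.sin(a)),
    (-math.sin(a), math.cos(a)),
)
```

Here length and width come out at roughly 2 and 1, and the angle at
roughly 30 degrees.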

> You cannot easily separate the interpretation layer from the previous two
> layers as only the interpretation layer can know whether something is a
> "pinch" gesture or just two users happen to move into the same direction with
> their fingers. You need something as close to raw data in this layer as
> possible.
Well, the position protocol does allow for this distinction, as long as
the hardware is actually capable of sensing the difference. Every
position object (*) has a "parent id" field, so in your example, a pinch
gesture would only be triggered on fingers with the same parent id. If
the hardware can't distinguish the two cases, then the parent id should,
e.g., always be 0xDEADBEEF (or whatever), and everything works as
expected, too.
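
To make this concrete, a minimal sketch of the idea. The class and field
names are made up for illustration and are not the actual layout of a
position object; only the "parent id" concept is taken from the protocol.

```python
from dataclasses import dataclass
from itertools import combinations

# Placeholder parent id for hardware that cannot tell hands apart
# (the value itself is arbitrary).
UNKNOWN_PARENT = 0xDEADBEEF


@dataclass
class Position:
    # Hypothetical fields, for illustration only.
    obj_id: int
    parent_id: int
    x: float
    y: float


def pinch_candidates(positions):
    """Return all pairs of positions that share a parent id."""
    return [
        (a, b)
        for a, b in combinations(positions, 2)
        if a.parent_id == b.parent_id
    ]


# Hardware that can distinguish hands: only fingers 1 and 2 pair up.
fingers = [
    Position(1, 10, 0.10, 0.10),
    Position(2, 10, 0.20, 0.20),
    Position(3, 11, 0.80, 0.80),
]
pairs = pinch_candidates(fingers)

# Hardware that cannot: every finger gets the same parent id,
# so every pair is a candidate.
blind = [Position(i, UNKNOWN_PARENT, 0.1 * i, 0.1 * i) for i in (1, 2, 3)]
all_pairs = pinch_candidates(blind)
```

The gesture code is the same in both cases; only the quality of the
parent id changes with the hardware.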

However, maybe you don't actually want to prevent two people from
scaling something together - just a thought.

> Dividing detection and interpretation into distinct layers looks good on paper
> but it becomes hard to do anything useful. OTOH, merging them into one layer
> looks good on paper but it becomes hard to do anything useful. :)
*sigh* Nicely put :-)

Yours, Florian

(*) Note that I don't say "position packet". I'd like to counter the
assumption that the "plaintext-over-UDP" method is the only way of
delivering events. I'm happy with XML, too :-)
-- 
0666 - Filemode of the Beast