Multi{pointer,touch} & Userspace

Peter Hutterer peter.hutterer at
Mon Nov 3 22:23:37 PST 2008

On Wed, Oct 29, 2008 at 08:51:24AM +0100, Florian Echtler wrote:
> a while ago, after the MPX merge, there was a brief discussion about
> multipointer support for userspace. I'd like to revive this discussion,
> and I have some bits to restart it:
> - At , you can find the first beta release of our
>   multitouch development framework. While this may not be of immediate
>   interest to you, there's two important points which I'd like to
>   mention:

I can see a number of issues that arise from the abstraction into 4 layers
that seemingly work independently.
From the website:
    * hardware abstraction layer: takes raw data from the input hardware and
      generates data packets containing the positions of hands, fingers and
      objects (if available)

One of the problems you will run into here is that it is hard to convert raw
data into specific packets without many pre-configured assumptions or just
plain guesswork.

For example, if you can detect two distinct blobs from two fingers, this may
be two fingers from the same hand or two fingers from two different hands. In
both cases, the decision on how these packets are divided or linked together
requires assumptions or guesswork. It gets even worse if you account for
hovering items in the detection (e.g. you can see the finger touchpoint _and_
parts of the hand hovering).
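
A minimal sketch of that guesswork, not taken from the framework: the
function name, the millimetre units and the 200 mm hand-span threshold are
all hypothetical. It groups blob centers into "hands" purely by distance,
which is exactly the kind of pre-configured assumption described above.

```python
import math

# Hypothetical threshold: blobs closer than this are assumed to share a hand.
MAX_HAND_SPAN_MM = 200.0


def group_blobs_into_hands(blobs, max_span=MAX_HAND_SPAN_MM):
    """Greedily group (x, y) blob centers (in mm) into assumed hands.

    A blob joins the first existing group that already contains a blob
    within max_span of it; otherwise it starts a new group.
    """
    hands = []
    for blob in blobs:
        for hand in hands:
            if any(math.dist(blob, other) <= max_span for other in hand):
                hand.append(blob)
                break
        else:
            hands.append([blob])
    return hands
```

Two index fingers from two different users, placed 100 mm apart, end up in
the same "hand" here. The positions alone carry no information that could
correct this.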

    * transformation layer: converts the data into screen coordinates and
      outputs transformed data packets

The transformation into screen coordinates is of little issue. Ideally, you'd
want applications using multi-touch stuff to be aware of the events anyway,
in which case you'd just use the device coordinate space.
Much more important is the transformation of the blobs into a polar coordinate
system (e.g. is it a diagonal blob or a thumb, rotated by 30 degrees?). Where
are you planning on doing this?
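
For illustration, one common way to get a position plus an angle for a blob
is via second-order image moments. This is an assumption about how such a
step could look, not a description of the framework; the function name and
the pixel-list input format are hypothetical.

```python
import math


def blob_orientation(pixels):
    """Return (cx, cy, angle_deg) for a blob given as a list of (x, y) pixels.

    The angle is the orientation of the blob's major axis, computed from
    the second-order central moments of the pixel coordinates.
    """
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    mu20 = sum((x - cx) ** 2 for x, _ in pixels) / n
    mu02 = sum((y - cy) ** 2 for _, y in pixels) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels) / n
    angle = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return cx, cy, math.degrees(angle)
```

Note that this needs the blob's pixels, not just a touchpoint, so it has to
run where the raw data is still available. It also only yields an axis
angle; whether that axis belongs to a rotated thumb or to a diagonal smear
is still a matter of interpretation.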

    * interpretation layer: reads screen-aligned input positions and
      converts them to gesture events

You cannot easily separate the interpretation layer from the previous two
layers, as only the interpretation layer can know whether something is a
"pinch" gesture or just two users who happen to move in the same direction
with their fingers. You need something as close to raw data in this layer as
possible.
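
A rough sketch of what an interpretation layer sees if it is only handed
screen-aligned positions. The function name, the labels and both thresholds
are hypothetical and chosen purely for illustration.

```python
import math


def classify_two_touch(track_a, track_b, shrink=0.8, max_angle_deg=20.0):
    """Classify two touch tracks (lists of (x, y)) by start and end points.

    Returns "pinch" if the touches end up closer than shrink * their start
    distance, "parallel" if both moved in roughly the same direction,
    and "unknown" otherwise.
    """
    (ax0, ay0), (ax1, ay1) = track_a[0], track_a[-1]
    (bx0, by0), (bx1, by1) = track_b[0], track_b[-1]

    start = math.dist((ax0, ay0), (bx0, by0))
    end = math.dist((ax1, ay1), (bx1, by1))
    if start > 0 and end < shrink * start:
        return "pinch"

    da = (ax1 - ax0, ay1 - ay0)
    db = (bx1 - bx0, by1 - by0)
    la, lb = math.hypot(*da), math.hypot(*db)
    if la > 0 and lb > 0:
        cos = (da[0] * db[0] + da[1] * db[1]) / (la * lb)
        cos = max(-1.0, min(1.0, cos))  # guard against rounding error
        if math.degrees(math.acos(cos)) <= max_angle_deg:
            return "parallel"
    return "unknown"
```

Nothing in the input says whether the two tracks belong to one user or to
two, so any label this returns is a guess about intent. Resolving that
needs data from the lower layers (hand outlines, hover information) that
has already been thrown away by the time plain coordinates arrive.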

Dividing detection and interpretation into distinct layers looks good on paper
but it becomes hard to do anything useful. OTOH, merging them into one layer
looks good on paper but it becomes hard to do anything useful. :)

