multitouch

Bradley T. Hughes bradley.hughes at nokia.com
Mon Mar 1 07:05:01 PST 2010


On 03/01/2010 03:34 PM, ext Daniel Stone wrote:
> Hi,
>
> On Mon, Mar 01, 2010 at 02:56:57PM +0100, Bradley T. Hughes wrote:
>> On 03/01/2010 01:55 PM, ext Daniel Stone wrote:
>>> I don't really see the conceptual difference between multiple devices
>>> and multiple axes on a single device beyond the ability to potentially
>>> deliver events to multiple windows.  If you need the flexibility that
>>> multiple devices offer you, then just use multiple devices and make your
>>> internal representation look like a single device with multiple axes.
>>
>> This is where the context confusion comes in. How do we know what the
>> user(s) is/are trying to do solely based on a set of x/y/z/w/h
>> coordinates? In some cases, a single device with multiple axes is enough,
>> but in other cases it is not.
>
> Sure.  But in this case you don't get any extra information from having
> multiple separate devices vs. a single device.  The only difference --
> aside from being able to direct events to multiple windows -- is the
> representation.

Correct. However, I think that being able to direct events to multiple 
windows is the main reason we're having this particular discussion. How do 
we do it, given the current state of the art?

>> On a side note, I have a feeling this is why things like the iPhone/iPad
>> are full-screen only, and Windows 7 is single-window multi-touch only.
>
> Rather.
>
> I'm tempted to punt responsibility here by saying:
>    * implement the multi-level device tree as described before
>    * have every touchpoint as a separate device
>    * if you want to grab fingers separately, then shift those devices to
>      a new MD
>
> But that's assuming hardware exists which can do reliable finger
> detection.  Failing that, it's all a bit pointless, so ...

Indeed.
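
For what it's worth, the "shift those devices to a new MD" step is already 
expressible with the XI2 hierarchy requests. A rough, untested sketch (the 
master name and the slave id are made up, and error handling is omitted):

#include <cstring>
#include <X11/Xlib.h>
#include <X11/extensions/XInput2.h>

/* Create a new master device and reattach one slave (one touchpoint)
 * to it. */
static void moveTouchpointToNewMaster(Display *dpy, int slaveId)
{
    /* 1. Ask the server for a new master pointer/keyboard pair. */
    XIAnyHierarchyChangeInfo add;
    add.add.type = XIAddMaster;
    add.add.name = const_cast<char *>("touch-user2");
    add.add.send_core = True;
    add.add.enable = True;
    XIChangeHierarchy(dpy, &add, 1);

    /* 2. Look up the id of the master pointer we just created
     * (the server appends " pointer" to the requested name). */
    int newMaster = -1, ndevices = 0;
    XIDeviceInfo *info = XIQueryDevice(dpy, XIAllMasterDevices, &ndevices);
    for (int i = 0; i < ndevices; ++i) {
        if (info[i].use == XIMasterPointer
            && std::strncmp(info[i].name, "touch-user2", 11) == 0)
            newMaster = info[i].deviceid;
    }
    XIFreeDeviceInfo(info);
    if (newMaster == -1)
        return;

    /* 3. Attach the slave device (the finger) to the new master. */
    XIAnyHierarchyChangeInfo attach;
    attach.attach.type = XIAttachSlave;
    attach.attach.deviceid = slaveId;
    attach.attach.new_master = newMaster;
    XIChangeHierarchy(dpy, &attach, 1);
    XFlush(dpy);
}

The hard part is still deciding *when* to do this, which is exactly the 
finger-detection problem you mention.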

>>> Given that no-one's been able to articulate in much detail what any
>>> other proposed solution should look like or how it will actually work
>>> in the real world, I'm fairly terrified of it.
>>>
>>> Can you guys (Bradley, Peter, Matthew) think of any specific problems
>>> with the multi-layered model? Usecases as above would be great, bonus
>>> points for diagrams. :)
>>
>> I'm concerned about the event routing and implicit grabbing behaviour,
>> specifically. I don't know enough about the internals to really put my
>> concerns into words or link to code in the server.
>>
>> Use-cases? Collaboration is the main use-case. Class rooms, meeting
>> rooms, conferences are ones that I often think about. Think about the
>> GIMP having multi-user and multi-touch support so that art students could
>> work together on a multi-touch table top. I think the MS Surface
>> marketing videos are a good indication of what could be done as well.
>>
>> One thing that we definitely want is for normal button and motion events
>> to be generated for one of the active touch-points over a client window.
>> As Peter pointed out, we shouldn't have to rewrite the desktop to support
>> multi-touch. In addition to specialized applications like the ones I
>> described above, we definitely want "normal" applications to remain
>> usable in such an environment (I can definitely see someone bringing up a
>> terminal and/or code editor just for themselves to try out an idea that
>> they get while in a meeting).
>>
>> (Sorry for the lack of diagrams, my ascii-art kung-fu is non-existent.
>> How about a video? http://vimeo.com/4990545)
>
> If the hardware is intelligent enough to be able to pick out different
> fingers, then cool, we can split it all out into separate focii and it's
> quite easy.

I don't think hardware is that intelligent... yet. I forget the name of the 
program (not CCV as far as I know), but there does exist a program that 
implements the TUIO protocol WITH support for object IDs. It can do object 
recognition under special circumstances by looking for and identifying 
infrared reflectors placed on the table's surface (these reflectors are 
often attached to an object). Programs could then map these object IDs to 
something meaningful (object ID 5, mapped to "Brad's phone", could sync my 
email, for example). I don't know of anything that tries to identify 
individual fingers, though.
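
To be concrete, the mapping I have in mind is nothing more than a lookup 
table keyed on the object ID; a trivial sketch (the IDs, names and handler 
are all made up):

#include <map>
#include <string>

// Trivial sketch: map TUIO object (fiducial) IDs to something meaningful
// to the application.
struct ObjectBinding {
    std::string label;
    void (*handler)();
};

static void syncBradsEmail() { /* kick off a mail sync, for example */ }

static std::map<int, ObjectBinding> objectBindings;

static void registerObjectBindings()
{
    ObjectBinding phone = { "Brad's phone", syncBradsEmail };
    objectBindings[5] = phone;   // object ID 5 == the reflector on my phone
}

// Called when the TUIO source reports that an object was recognized.
static void objectRecognized(int objectId)
{
    std::map<int, ObjectBinding>::iterator it = objectBindings.find(objectId);
    if (it != objectBindings.end())
        it->second.handler();
}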

> Failing that, how are we supposed to do it? Say two people have a
> logical button press active (mouse button, finger down, pen down,
> whatever) at once.  Now a third button press comes along ... what do we
> do? Is it a gesture related to one of the two down? If so, which one
> (and which order do we ask them in, etc).  A couple of years ago we
> still could've guessed, but as Qt and GTK are now doing client-side
> windows, it's really hard to even make a _guess_ in the server.

Right, and this was Peter's point... the X server can't know which of the 
existing presses the new point belongs with, and shouldn't try to guess. 
What I did in Qt was to deliver the third touch point together with its 
closest neighbor (unless the third touch point was over a widget 
explicitly asking for touch events, that is).
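
Roughly, the routing looks like the sketch below. The widget types and 
hooks are placeholders (nothing here is Qt API), but the logic is the 
closest-neighbor grouping I just described:

#include <cstddef>
#include <vector>

struct Widget {
    bool wantsTouchEvents;   // did this widget explicitly ask for touches?
};

// Placeholder hit test; a real implementation would walk the widget tree.
static Widget *widgetAt(double /*x*/, double /*y*/) { return 0; }

struct TouchPoint {
    int id;
    double x, y;
    Widget *target;          // the widget this touch point is delivered to
};

// Decide where a newly appeared touch point should be delivered, given
// the touch points that are already active.
static Widget *routeNewTouchPoint(TouchPoint &tp,
                                  const std::vector<TouchPoint> &active)
{
    Widget *w = widgetAt(tp.x, tp.y);
    if (w && w->wantsTouchEvents) {
        tp.target = w;       // the widget under it asked for touch events
        return tp.target;
    }

    // Otherwise group it with the closest active touch point and deliver
    // it to that touch point's widget.
    const TouchPoint *closest = 0;
    double best = 0.0;
    for (std::size_t i = 0; i < active.size(); ++i) {
        double dx = active[i].x - tp.x;
        double dy = active[i].y - tp.y;
        double d = dx * dx + dy * dy;    // squared distance is enough here
        if (!closest || d < best) {
            closest = &active[i];
            best = d;
        }
    }
    tp.target = closest ? closest->target : w;
    return tp.target;
}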

> None of this even really relates to legacy clients or not, it's just
> deciding how to deliver the events to anyone at all in the first place.
> How do you do it -- especially without causing unacceptable latency?
>
> I'm fine with the concepts and whathaveyou, but have you any thoughts on
> how the event delivery flow should look like?

Some: try the ideas that I implemented in Qt and see if and how they fit 
into the X server.

-- 
Bradley T. Hughes (Nokia-D-Qt/Oslo), bradley.hughes at nokia.com
Sandakervn. 116, P.O. Box 4332 Nydalen, 0402 Oslo, Norway

