Multitouch followup: gesture recognition?

Peter Hutterer peter.hutterer at
Fri Mar 26 02:58:46 PDT 2010

On Fri, Mar 26, 2010 at 10:02:55AM +0100, Florian Echtler wrote:
> Hello everyone,
> > > >>> Just for my understanding: when talking about a special client, you think of 
> > > >>> something like a (compositing) window manager?
> > > >> Yes, 'special' since it registers itself for rights (and duties) only
> > > >> one client shall possess.
> > > > 
> > > > why? can't the same library be used by multiple clients (thus unifying the
> > > > gesture types) but the clients divide up the available region. This works
> > > > well for button presses, couldn't it work with gestures as well?
> > > Inside one app, there indeed isn't much to it.
> > > 
> > > It depends on what you want long-term. A special client based extension
> > > allows for gestures to be first-level stuff like key presses,
> > > semantically. This allows for e.g. a gesture to switch apps (or other DE
> > > stuff), which you typically don't want to worry about at application level.
> > > 
> > > Also, this way you _know_ gestures are unified, which makes them easier
> > > to learn. Development-wise, I think the approach would allow for best
> > > practices to evolve, which could then be retrofitted into the extension.
> >  
> > I claim that gestures _cannot_ be perfectly unified. 
> > Unify them, but unify them in gtk-gesture, qt-gesture or libgesture. But not
> > in an X extension. A gesture is a lossy abstraction of input, and what is a
> > zoom gesture in one application is a gesture to move two points
> > apart in another application. Reducing the latter to a ZoomBy10% gesture
> > means losing that information. So you need to provide both raw data and
> > gesture data anyway.
> On one hand, I agree. But I believe that this problem is exactly what my
> formalism solves. By a) allowing applications to customize gestures and
> b) restricting them to certain screen regions (*), this isn't a
> contradiction anymore. E.g. what's a zoom gesture to the first app on
> one region will split into two separate move gestures for two regions
> for the other one. Of course, the ability to get raw data should always
> be preserved.
> (*) whether these should be XWindows or something different, I don't
> know yet. Is there an extension that allows arbitrary shapes for XWindow
> objects, particularly with respect to input capture?

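The Shape extension does that. Since version 1.1 it also has a separate
input shape, so the region that receives input can differ from the visible
one.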

> > > Long term, apps could just assume gestures to be around and code against
> > > them.
> > The same is true if the gestures are handled in toolkits like GTK. Instead
> > of connecting to button-press you can connect to zoom-in. Where the gesture
> > is handled in the end doesn't matter to the application.
> > Qt has QGestureEvents already but still provides QTouchEvents too.
> Correct, but wouldn't it be an advantage to handle it consistently
> across toolkits?

Advantage or disadvantage, it's always difficult to say :)
Having a consistent feel across any desktop is nice, but at the same time
I wouldn't be surprised if KDE and GNOME went for different feels, not to
mention other toolkits and DEs.

> > > Yeah, that way one could knock up a prototype pretty fast.
> One important question here, which I obviously haven't understood
> fully: what's the difference/dividing line between libXi as a library
> and XInput as an extension? AFAICT, libXi is the implementation for
> XInput - correct?

XInput is a protocol specification. The server speaks a network protocol
with the clients, and that protocol can be considered the "API". How a client
implements its side doesn't matter to the server, but most clients use Xlib
or, more recently, XCB. libXi is the Xlib part that turns XInput protocol
requests and events into a real API. So realistically, whenever you add
protocol requests to XInput, you need to add both the server implementation
and the libXi implementation to use them.
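
To make the protocol/library split concrete, here is a rough Python sketch
(not actual libXi or server code) of what a client-side library does with one
event on the wire. The field layout follows the generic event header from the
X Generic Event Extension; GenericEventHeader and parse_generic_event are
made-up names for illustration, and little-endian byte order is assumed.

```python
import struct
from typing import NamedTuple

GENERIC_EVENT = 35  # core protocol event code for GenericEvent

# Fixed 32-byte wire layout: type, extension, sequence number, length
# (count of extra 4-byte units after the first 32 bytes), evtype, padding.
# Little-endian is an assumption made for this sketch.
_WIRE = struct.Struct("<BBHIH22x")


class GenericEventHeader(NamedTuple):
    """Hypothetical client-side view of one generic event."""
    extension: int   # major opcode of the extension that sent the event
    sequence: int
    evtype: int      # extension-specific event type
    payload: bytes   # extension-defined data after the 32-byte header


def parse_generic_event(raw: bytes) -> GenericEventHeader:
    """Turn raw wire bytes into a structured value a client can use."""
    if len(raw) < _WIRE.size:
        raise ValueError("short event")
    type_, extension, sequence, length, evtype = _WIRE.unpack_from(raw)
    # The top bit of the type byte marks events sent by another client.
    if type_ & 0x7F != GENERIC_EVENT:
        raise ValueError("not a GenericEvent")
    end = _WIRE.size + 4 * length
    if len(raw) < end:
        raise ValueError("truncated payload")
    return GenericEventHeader(extension, sequence, evtype, raw[_WIRE.size:end])
```

The server needs the mirror image of this for requests, which is why a new
protocol request touches both ends.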

> Let me try to summarize the possible approaches:
> 1. a pure userspace library which just converts gestures locally for one
> specific client without knowledge of others (this is more or less what
> Qt or libTISCH do right now, though in different ways)
> 2. a special client like a WM which intercepts input events and passes
> additional gesture events to other clients (possible, but with some
> caveats I haven't yet understood fully)
> 3. a separate server extension in its own right (possible, also with
> some potential traps)
> 4. a patch to libXi using the X Generic Event Extension (same as 3, but
> fastest to hack together and doesn't require any changes to the server.)

4 is more like 2a, where the communication between that client and the
other clients is done through hacks in libXi. You still need the server
code, even if that is just to take the event and forward it on to the next
client without parsing it much.
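
As a toy model of that minimal server role (hypothetical names throughout,
nothing here is real server code): the router reads only the one header byte
it needs to pick recipients and passes the rest along untouched.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Deliver = Callable[[bytes], None]  # hypothetical per-client delivery callback


class OpaqueRouter:
    """Forwards events by extension opcode without decoding the payload."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[int, List[Deliver]] = defaultdict(list)

    def subscribe(self, extension: int, deliver: Deliver) -> None:
        """Register a client callback for events of one extension."""
        self._subscribers[extension].append(deliver)

    def forward(self, raw: bytes) -> int:
        """Deliver raw to every subscriber; return how many received it."""
        if len(raw) < 2:
            raise ValueError("short event")
        extension = raw[1]  # second byte of a generic event header
        targets = self._subscribers.get(extension, [])
        for deliver in targets:
            deliver(raw)  # bytes are passed through unmodified
        return len(targets)
```

Everything past the header stays opaque to the router, so the gesture
semantics live entirely in the clients on either side.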

3 could be either an extension fully handled by the server (gesture
recognition in the server) or as the communication method for the approach


More information about the xorg-devel mailing list