[RFC] Interface for injection of input events

Wed Mar 22 02:23:46 UTC 2017

Hi all,

This is an RFC for a new interface to generate input events from arbitrary
clients. Note that this is only a few days old, so **do not** assume this is
anything more than a thought experiment right now. This email is supposed to start a
discussion and collection of the various points that need to be addressed.

First: why? There are some commandline tools like xdotool that allow for some
scripting of desktop interactions. xdotool supports two categories: input
device emulation and window management.

This RFC primarily addresses the input device emulation bits but there is
room for adding window management capabilities. I have a basic repo here:
http://github.com/whot/woodotool but it doesn't contain much beyond what's
in this email.

This will be a discussion of the interface only because the implementations
are so specific that there is no real code-sharing beyond the interface
itself. I've talked privately to some of you already; the general mood is
somewhere around reluctant acceptance.

So here's a list of talking points:

== DBus ==
What we need is basic IPC that is not directly Wayland-related. DBus provides a
bunch of extras over the Wayland protocol: introspection, ease of
extensibility, bindings, etc. Also, Mir.

== Functionality-based interfaces ==
We need a mix of capabilities and features, not all of which will/should be
available to all clients. Right now, I have two for devices:
 org.freedesktop.WoodoTool.Keyboard (Press, Release)
 org.freedesktop.WoodoTool.Mouse (Press, Release, MoveRelative, MoveAbsolute)
Compositors can implement either, both, or neither. For future extensions,
having a Touch interface, Joystick, or whatever is deemed useful is
technically trivial.

There's a manager interface too but that's a technical detail, see the repo
for more details.
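
As a rough illustration of the split, here is a minimal in-process Python
model of the two device interfaces. Only the interface and method names come
from the list above; the argument lists (keycode, button, dx/dy, x/y) and the
supported_interfaces() helper are assumptions made for this sketch and are
not taken from the repo. No actual DBus is involved.

```python
from dataclasses import dataclass, field


@dataclass
class Keyboard:
    """Stand-in for org.freedesktop.WoodoTool.Keyboard (Press, Release)."""
    pressed: set = field(default_factory=set)

    def Press(self, keycode: int) -> None:      # argument is an assumption
        self.pressed.add(keycode)

    def Release(self, keycode: int) -> None:    # argument is an assumption
        self.pressed.discard(keycode)


@dataclass
class Mouse:
    """Stand-in for org.freedesktop.WoodoTool.Mouse."""
    x: float = 0.0
    y: float = 0.0
    buttons: set = field(default_factory=set)

    def Press(self, button: int) -> None:
        self.buttons.add(button)

    def Release(self, button: int) -> None:
        self.buttons.discard(button)

    def MoveRelative(self, dx: float, dy: float) -> None:
        self.x += dx
        self.y += dy

    def MoveAbsolute(self, x: float, y: float) -> None:
        self.x, self.y = x, y


def supported_interfaces(objects):
    """Hypothetical helper: list the interface names a compositor exposes."""
    names = {
        Keyboard: "org.freedesktop.WoodoTool.Keyboard",
        Mouse: "org.freedesktop.WoodoTool.Mouse",
    }
    return sorted(names[type(obj)] for obj in objects)
```

A compositor that only wants pointer emulation would expose just the Mouse
object; a client can then discover what is available instead of assuming it.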

== Control of the event stream ==
The events are coming in through a custom interface, so it's relatively
trivial to ignore events based on context, e.g. ignore fake key events while
the screen is locked. Any uinput-based solution would lack this context.
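
A minimal sketch of what such context-based filtering could look like inside
a compositor. All names here (CompositorState, FakeKeyEvent, accept_event,
deliver) are hypothetical and exist only for this example; a real compositor
would hook this into its own event pipeline.

```python
from dataclasses import dataclass


@dataclass
class CompositorState:
    screen_locked: bool = False


@dataclass
class FakeKeyEvent:
    keycode: int
    pressed: bool


def accept_event(state: CompositorState, event: FakeKeyEvent) -> bool:
    """Policy hook: drop injected key events while the screen is locked."""
    return not state.screen_locked


def deliver(state: CompositorState, events):
    """Return only the injected events that pass the policy hook."""
    return [e for e in events if accept_event(state, e)]
```

The point is only that the decision is made where the context lives; a
uinput device looks like real hardware, so the compositor has no hook to
apply a policy like this.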

== Authentication/Identification ==
The goal is to filter clients based on some white/blacklist, so that e.g.
xdotool can access this interface but others cannot.

This is a big ¯\_(ツ)_/¯ for now; I don't know how to do this reliably.
It's trivial to do per user, but per-process is difficult. DBus filters
are largely limited to per-users. It's possible to get the process ID of a
sender but going beyond that is unreliable (kernel doesn't guarantee comm
being accurate).

Requiring applications to bind to a bus name merely restricts them to being
a singleton; there is no guarantee that the application that binds
org.freedesktop.WoodoTool.auth.xdotool is actually xdotool.

The option that comes closest so far is some pre-shared key between
compositor and application. That would need to be worked into the API, but
it also relies on all participants to keep the key encrypted in memory and
the various configuration files.
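
For discussion purposes, a pre-shared key check on the compositor side might
look roughly like the following. The Compositor class, register() and
authorize() are made up for this sketch, and nothing like this exists in the
API yet. It also does nothing about the hard part mentioned above: how the
key is stored and protected on both ends.

```python
import hmac
import secrets


def new_shared_key() -> bytes:
    """Generate a random 32-byte key to hand to one trusted client."""
    return secrets.token_bytes(32)


class Compositor:
    def __init__(self):
        self._keys = {}  # client name -> pre-shared key (hypothetical store)

    def register(self, client_name: str, key: bytes) -> None:
        self._keys[client_name] = key

    def authorize(self, client_name: str, presented_key: bytes) -> bool:
        expected = self._keys.get(client_name)
        if expected is None:
            return False
        # Constant-time comparison so timing does not leak key contents.
        return hmac.compare_digest(expected, presented_key)
```

Note that this only proves the caller knows the key, not that the caller is
the binary we think it is.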

So it's not clear whether we can do anything beyond a basic on/off toggle on
whether to allow events from fake input devices. Debatable if such a crude
mechanism is useful.

Either way, this is a problem that *must* be solved but not necessarily one
that affects the API itself (beyond what is required to make it
technically feasible, e.g. passing cookies around).

== Isolation of devices ==
Compositors should create separate virtual input devices for each client so
one client can't mess with the state of another one or even detect if
there's another one active. Plus we get to re-use all the existing code that
merges state from different (physical) devices. This just makes the actual
device handling implementation trivial.
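
A toy model of the idea, assuming the compositor keeps one virtual device per
client and derives the visible button state by merging across devices, the
same way it would for two physical mice. DeviceRegistry and its methods are
hypothetical names for this sketch.

```python
class VirtualDevice:
    def __init__(self):
        self.pressed = set()  # buttons this one client currently holds down


class DeviceRegistry:
    def __init__(self):
        self._devices = {}  # client id -> VirtualDevice

    def device_for(self, client_id: str) -> VirtualDevice:
        """Return the client's private device, creating it on first use."""
        if client_id not in self._devices:
            self._devices[client_id] = VirtualDevice()
        return self._devices[client_id]

    def press(self, client_id: str, button: int) -> None:
        self.device_for(client_id).pressed.add(button)

    def release(self, client_id: str, button: int) -> None:
        self.device_for(client_id).pressed.discard(button)

    def button_down(self, button: int) -> bool:
        """Merged state: down if any device still holds the button."""
        return any(button in d.pressed for d in self._devices.values())
```

One client releasing a button cannot release it on behalf of another client,
and neither client can observe the other's device.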

It gets a bit more difficult if we want to have this per-seat though. Seat
is a wayland concept so we could fold that in, but then you're leaking the
seat configuration. So I'm not 100% sure we even want this, or whether we
should just make sure it can be retrofitted in the future.

== Use for testing ==
This came up several times. I'm not yet convinced this is a good compositor testing
interface beyond basic input. A good testing interface likely requires
something more compositor-specific where more of the window state is made
available. But it could work as an interim solution.

== Input coordinate handling ==
Keys and mouse buttons are trivial unless we want custom focus (we don't).
Relative coordinates are ok too, absolute ones are trickier because they
rely on screen layout not available to the client.

So the specification needs to include a behaviour we can stick to forever,
something like "in pixels as measured from the logical top-left of the
top-left-most screen" etc. Not difficult per se, but this stuff is usually
prone to corner-case unhappiness.
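
To make the corner cases concrete, here is one possible reading of such a
rule, sketched in Python: the origin is the top-left corner of the bounding
box of all outputs in the logical layout, and the compositor (not the client)
maps the point to an output. The Output type and resolve_absolute() are
assumptions for illustration, not part of any proposal.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Output:
    name: str
    x: int       # position in the compositor's logical layout
    y: int
    width: int   # logical size in pixels
    height: int


def resolve_absolute(
    outputs: Sequence[Output], px: int, py: int
) -> Optional[Tuple[str, int, int]]:
    """Map a client point to (output name, output-local x, output-local y)."""
    if not outputs:
        return None
    # Client coordinates are relative to the layout's top-left corner.
    gx = px + min(o.x for o in outputs)
    gy = py + min(o.y for o in outputs)
    for o in outputs:
        if o.x <= gx < o.x + o.width and o.y <= gy < o.y + o.height:
            return (o.name, gx - o.x, gy - o.y)
    return None  # point falls into a gap or outside the layout
```

Even this small version has to pick answers for gaps between outputs,
overlapping (mirrored) outputs and points outside the layout, which is
exactly the kind of detail the specification would need to pin down.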

== Input coordinate filtering ==
One use-case that was mentioned to me was effectively "intercept button X
and send key events 'abc' instead". This is not possible with the current
proposal and I'm not sure it can be without going overboard with
specifications. It *may* be possible to provide some global hotkey hooks, but I
have not come up with a good solution to that.

== Window management ==
This is beyond the current suggestion but also where it gets both
interesting and difficult.

I have a sample org.freedesktop.WoodoTool.Desktop interface that would send
Edge signals to clients. The compositor decides when these are triggered,
the client can react to these things with custom commands.

But this is where we get into the proper scripting territory and that's also
where the opinions will diverge quite quickly.

For example, xdotool supports things like "search for window with name 'foo'
and activate it". Implementing this is ... tricky. Now we need some
definition of what counts as a window and how windows are sorted within
a process - compare multiple GIMP windows with multiple gnome-terminal
windows for example. Not specifying exactly what order we return leaves us
open to behavioural dependencies which may break in the future.

In other words, an interface to search for windows of a given application
is technically feasible but extremely hard to get right.
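
To illustrate the ordering problem, here is a deliberately naive sketch. It
assumes every window has a stable numeric id and a title, and it defines the
result order as "ascending id". Both the Window type and that ordering rule
are invented for this example; whether any such rule is meaningful across
compositors is precisely the open question.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Window:
    window_id: int  # assumed stable for the lifetime of the window
    app_id: str
    title: str


def search_windows(windows: Sequence[Window], name: str) -> List[Window]:
    """Return windows whose title contains name, in a specified order."""
    matches = [w for w in windows if name in w.title]
    # Sort explicitly so callers cannot come to rely on incidental ordering.
    return sorted(matches, key=lambda w: w.window_id)
```

Without the explicit sort, clients would end up depending on whatever order
the compositor happens to store its windows in.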

Anyway, with the separation of interfaces this is not something we need in
the first iterations. But one legitimate question is whether just an
implementation for virtual input devices is sufficient or whether it's
largely pointless without any additional window-management capabilities.

== Implementation ==
Because of what it does, there is no real code-sharing between compositors -
they would just use their own infrastructure to hook to the dbus interface
and create virtual devices. On the client-side it's much the same thing,
binding to dbus is trivial.

So 99% of the work here would be to define the interface and get everyone to
agree on it.

Any comments or suggestions?

Cheers,
  Peter