[RFC] Interface for injection of input events

Sat Mar 25 05:49:17 UTC 2017

Reply below, focusing on use cases for now

On Wed, Mar 22, 2017 at 4:59 AM, Pekka Paalanen <ppaalanen at gmail.com> wrote:
>
> On Wed, 22 Mar 2017 12:23:46 +1000
> Peter Hutterer <peter.hutterer at who-t.net> wrote:
>
> > Hi all,
> >
> > This is an RFC for a new interface to generate input events from
> > arbitrary clients. Note that this is only a few days old, so **do not**
> > assume this is anything more than a thought experiment right now. This
> > email is supposed to start a discussion and collection of the various
> > points that need to be addressed.
> >
> > First: why? There are some commandline tools like xdotool that allow
> > for some scripting of desktop interactions. xdotool supports two
> > categories: input device emulation and window management.
> >
> > This RFC primarily addresses the input device emulation bits but there
> > is room for adding window management capabilities. I have a basic repo
> > here: http://github.com/whot/woodotool but it doesn't contain much
> > beyond what's in this email.
> >
> > This will be a discussion of the interface only because the
> > implementations are so specific that there is no real code-sharing
> > beyond the interface itself. I've talked privately to some of you
> > already, the general mood is somewhere around reluctant acceptance.
> >
> > So here's a list of talking points:
> >
> > == DBus ==
> > What we need is basic IPC and not directly Wayland related, DBus
> > provides a bunch of extras over the wayland protocol: introspection,
> > ease of extensibility, bindings, etc. Also, Mir.
> >
> > == Functionality-based interfaces ==
> > We need a mix of capabilities and features, not all of which
> > will/should be available to all clients. Right now, I have two for
> > devices:
> >  org.freedesktop.WoodoTool.Keyboard (Press, Release)
> >  org.freedesktop.WoodoTool.Mouse (Press, Release, MoveRelative,
> >    MoveAbsolute)
> > Compositors can implement one, both, either, etc. For future extensions,
> > having a Touch interface, Joystick, or whatever is deemed useful is
> > technically trivial.
> >
> > There's a manager interface too but that's a technical detail, see the
> > repo for more details.
> >
> > == Control of the event stream ==
> > The events are coming in through a custom interface, so it's relatively
> > trivial to ignore events based on context, e.g. ignore fake key events
> > while the screen is locked. Any uinput-based solution would lack this
> > context.
> >
> > == Authentication/Identification ==
> > The goal is to filter clients based on some white/blacklist, so that
> > e.g. xdotool can access this interface but others cannot.
>
> Hi,
>
> if one allows a generic tool that essentially exposes everything at
> will, there isn't much point in authenticating that program, because
> any other program can simply call it.
>
> > This is a big ¯\_(ツ)_/¯ for now, I don't know how to do this reliably.
> > It's trivial to do per user, but per-process is difficult. DBus filters
> > are largely limited to per-users. It's possible to get the process ID
> > of a sender but going beyond that is unreliable (kernel doesn't
> > guarantee comm being accurate).
> >
> > Requiring applications to bind to a bus name merely restricts them to
> > being a singleton, there is no guarantee the application that binds
> > org.freedesktop.org.WoodoTool.auth.xdotool is actually xdotool.
> >
> > The option that comes closest so far is some pre-shared key between
> > compositor and application. That would need to be worked into the API,
> > but it also relies on all participants to keep the key encrypted in
> > memory and the various configuration files.
> >
> > So it's not clear whether we can do anything beyond a basic on/off
> > toggle on whether to allow events from fake input devices. Debatable if
> > such a crude mechanism is useful.
> >
> >
> > Either way, this is a problem that *must* be solved but not necessarily
> > one that affects the API itself (beyond what is required to make it
> > technically feasible, e.g. passing cookies around)
>
> It's essentially the same problem we have with all the privileged
> Wayland interfaces, too.
>
> Containers or sandboxing have been mentioned as a possible way to let
> the OS reliably identify the running program.
>
>
> > == Isolation of devices ==
> > Compositors should create separate virtual input devices for each
> > client so one client can't mess with the state of another one or even
> > detect if there's another one active. Plus we get to re-use all the
> > existing code that merges state from different (physical) devices. This
> > just makes the actual device handling implementation trivial.
> >
> > It gets a bit more difficult if we want to have this per-seat though.
> > Seat is a wayland concept so we could fold that in, but then you're
> > leaking the seat configuration. So I'm not 100% sure we even want this,
> > or whether we should just make sure it can be retrofitted in the future.
>
> One possibility is to always create at least one new virtual seat
> (wl_seat) for the emulated input devices. Woodotool clients would see
> the virtual seat and possibly be allowed to create and destroy more
> virtual seats at will. This would not leak or interfere with the real
> seats.
>
> At the same time it would also exclude use cases where one wants to
> hook into the real seat, e.g. a program that provides keyboard events
> but relies on the real seat for focus handling.
>
> There can be multiple real seats. How would one pick which real seat to
> integrate with?
>
> Hmm, wait, what do you mean by "leaking the seat configuration"? Don't we
> expose exactly that straight through Wayland? What's the issue you see?
>
> > == Use for testing ==
> > Came up several times, I'm not yet convinced this is a good compositor
> > testing interface beyond basic input. A good testing interface likely
> > requires something more compositor-specific where more of the window
> > state is made available. But it could work as an interim solution.
> >
> > == Input coordinate handling ==
> > Keys and mouse buttons are trivial unless we want custom focus (we
> > don't). Relative coordinates are ok too, absolute ones are trickier
> > because they rely on screen layout not available to the client.
> >
> > So the specification needs to include a behaviour we can stick to
> > forever, something like "in pixels as measured from the logical
> > top-left of the top-left-most screen" etc. Not difficult per se, but
> > this stuff is usually prone to corner-case unhappiness.
>
> Logical or physical pixels?
>
> Anyway, I think it's ok and even necessary to have it working at all,
> even though it will exclude some possible exotic Wayland compositor
> designs.
>
> > == Input coordinate filtering ==
> > One use-case that was mentioned to me was effectively "intercept button
> > X and send key events 'abc' instead". This is not possible with the
> > current proposal and I'm not sure it can be without going overboard
> > with specifications. It *may* be possible to provide some global hotkey
> > hooks, but I have not come up with a good solution to that.
>
> A global hotkey system has been discussed in the past. IIRC one of the
> better ideas was something along the lines of (Wayland) clients
> registering "actions" with default "input event" bindings but keeping
> the control in the compositor. The binding would not work by default for
> most input events, except maybe for some whitelisted things like
> Play/Pause keys, and would require the user to accept the binding. The
> user would also have the opportunity to bind the actions to any other
> input events he wishes.
>
> Again such a "grant permissions" scheme must rely on some form of
> persistent application identification, which quickly moves us towards
> containers again if we want to be resistant towards clients
> deliberately lying about what they are.
>
> However, IMHO such button/key etc. remapping is something that should
> be done completely in the compositor (if not possible in keymaps) to
> make it reliable and fast, unless perhaps when the original input
> device is not one that the compositor is already using, like a joystick
> or a gamepad. Or a smart phone acting as a touchpad/keyboard for remote
> control of a computer.
>
>
> > == Window management ==
> > This is beyond the current suggestion but also where it gets both
> > interesting and difficult.
> >
> > I have a sample org.freedesktop.WoodoTool.Desktop interface that would
> > send Edge signals to clients. The compositor decides when these are
> > triggered, the client can react to these things with custom commands.
> >
> > But this is where we get into the proper scripting territory and that's
> > also where the opinions will diverge quite quickly.
> >
> > For example, xdotool supports things like "search for window with name
> > 'foo' and activate it". Implementing this is ... tricky. Now we need
> > some definition of what a window classifies as and how windows are
> > sorted within a process - compare multiple GIMP windows with multiple
> > gnome-terminal windows for example. Not specifying exactly what order
> > we return leaves us open to behavioural dependencies which may break in
> > the future.
> >
> > In other words, an interface to search for windows of a given
> > application is technically feasible but extremely hard to get right.
> >
> > Anyway, with the separation of interfaces this is not something we need
> > in the first iterations. But one legitimate question is whether just an
> > implementation for virtual input devices is sufficient or whether it's
> > largely pointless without any additional window-management capabilities.
> >
> > == Implementation ==
> > Because of what it does, there is no real code-sharing between
> > compositors - they would just use their own infrastructure to hook to
> > the dbus interface and create virtual devices. On the client-side it's
> > much the same thing, binding to dbus is trivial.
> >
> >
> > So 99% of the work here would be to define the interface and get
> > everyone to agree on it.
>
> My main concern still is that we don't seem to have any intended use
> cases written down, so it's hard to say if this is enough or not.
>

Context: I am the maintainer of xdotool.

I can try to provide a more concrete list of use cases, if that helps? In
no particular order and not necessarily involving xdotool:

* Typing text (KeePass invokes xdotool for this purpose)
* Keyboard input from an on-screen keyboard.
* GUI automation (mouse/typing/input/windowing).
* Key macros - having one hotkey send a specific key sequence (It sounds
weird, but this is a common use for xdotool)
* Controlling the cursor with the keyboard (keynav does this, I am also its
author)
* Scripted window pager activity (switch desktop, move window to a new
desktop, list windows)
* Window manager actions (close window, resize window, maximize, etc).

If you'd like feedback on certain areas or want me to give interface ideas,
I'm open to that, but I wanted to focus this particular email on just use
cases.

For example, I'll talk about typing (XTEST, etc). I would love an interface
that worked at a higher level than raw key codes, or at least for that
possibility to be explored. The reason for this request is that xdotool
dedicates a bunch of code to hopping between key symbols, key codes, and
key character/glyph representation, and sometimes gets this wrong. For
example, to type the letter "t" xdotool has to query the whole keyboard
mapping to find the keycode mapped to this symbol, and if none is found, it
makes one (and then removes it after). We also cannot, under X11 and to my
knowledge, type emoji or other complex Unicode entities using the XTEST
API. xdotool also often needs users to provide the --clearmodifiers flag
to reset any active keyboard state before typing, and it'd be nice not to
need that juggling.
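
As a rough illustration of that lookup juggling, here is a toy Python model.
It is not xdotool's actual code: the keymap layout (keycode -> [unshifted,
shifted] symbols), the spare keycode 255 and every function name are made up
for this sketch, and real keymaps have more levels and groups than this.

```python
# Toy model of the symbol -> keycode lookup described above.
# All names and the keymap layout are hypothetical, for illustration only.

def find_keycode(keymap, symbol):
    """Return (keycode, level) for the first key producing symbol, or None."""
    for keycode, levels in sorted(keymap.items()):
        for level, sym in enumerate(levels):
            if sym == symbol:
                return keycode, level
    return None


def type_symbol(keymap, symbol, spare_keycode=255):
    """Return the list of fake events needed to type one symbol."""
    events = []
    hit = find_keycode(keymap, symbol)
    temporary = hit is None
    if temporary:
        # No key produces this symbol: borrow an unused keycode for it.
        keymap[spare_keycode] = [symbol]
        hit = (spare_keycode, 0)
    keycode, level = hit
    if level == 1:
        # Level 1 is the shifted symbol in this toy layout.
        events.append(("press", "Shift"))
    events.append(("press", keycode))
    events.append(("release", keycode))
    if level == 1:
        events.append(("release", "Shift"))
    if temporary:
        # Remove the temporary mapping again once the key has been sent.
        del keymap[spare_keycode]
    return events


if __name__ == "__main__":
    keymap = {28: ["t", "T"], 38: ["a", "A"]}
    for symbol in ("t", "T", "é"):
        print(symbol, type_symbol(keymap, symbol))
```

An interface that accepted "type this text" directly would let the
compositor do this with its own keymap knowledge, instead of every client
re-implementing the scan and the temporary remapping.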

>
> Testing was kind of discarded as needing something more powerful.
> Converting one input event to another was mostly scoped out, apart from
> perhaps converting gamepad buttons to keyboard/pointer events for
> instance.
>
> What *are* the intended use cases for this interface?
>
> A very big problem that I see is how to direct input:
>
> - If you send keyboard events, how does the program or the user
>   control where they go?

This is a good question. With X11 (and apologies if this is already known
information), we have XTEST to simulate general input events and XSendEvent
(which sometimes is blocked by apps) to send events directly to a specific
window. I would love something somehow better (and certainly less of an
attack surface!) than XSendEvent/XTEST :)

>
>
> - If you send pointer events, how do you determine which coordinates to
>   send?

This is a good question. With the UI automation use case, I see folks
warping the pointer to specific pixel locations. It's unclear how this
would be achieved with a virtual absolute-coordinate touchpad without
knowing the pixel dimensions of each screen.
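
To make the dependency on layout concrete, a small Python sketch. It assumes
a hypothetical virtual absolute device that takes coordinates normalized to
0..1 across the whole desktop, and a list of (x, y, width, height) screen
rectangles that the client would somehow have to obtain. None of these names
come from the proposal.

```python
# Hypothetical mapping from a desktop pixel position to a normalized
# absolute-device coordinate. Screens are (x, y, width, height) tuples.

def layout_bounds(screens):
    """Return (left, top, width, height) of the box enclosing all screens."""
    left = min(x for x, y, w, h in screens)
    top = min(y for x, y, w, h in screens)
    right = max(x + w for x, y, w, h in screens)
    bottom = max(y + h for x, y, w, h in screens)
    return left, top, right - left, bottom - top


def pixel_to_normalized(screens, px, py):
    """Map a pixel position to (0..1, 0..1) over the enclosing box."""
    left, top, width, height = layout_bounds(screens)
    return (px - left) / width, (py - top) / height


if __name__ == "__main__":
    one_screen = [(0, 0, 1920, 1080)]
    two_screens = [(0, 0, 1920, 1080), (1920, 0, 1280, 1024)]
    # The same pixel needs a different device coordinate in each layout.
    print(pixel_to_normalized(one_screen, 960, 540))
    print(pixel_to_normalized(two_screens, 960, 540))
```

The same pixel target maps to a different device coordinate as soon as a
second screen is attached, so the client either needs the layout or the
interface needs to accept pixel positions directly.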

>
> If you use a gamepad stick to control a pointer, you can just send
> relative motion events and the user will be happy because he is
> interactively controlling the pointer. But is this the only use case
> the interface can support? Is it enough?

For gamepad prior art, there are some interesting things that you can do
with Valve's Steam Controller in terms of making the gamepad inputs do a
pretty wide variety of things with respect to input. I don't know how
common this use case is (gamepad/joystick for pointer and keyboard input),
but I'm pretty sure X11 has support for gamepad-as-a-pointer, for example.

Regards,
-Jordan

>
> Would remote desktop control be in scope for this interface? E.g.
> letting IT support temporarily take control of your desktop through
> their own support software you have installed?
>
>
> Thanks,
> pq