RFC: libei - emulated input in Wayland compositors

Jonas Ådahl jadahl at gmail.com
Fri Jul 31 22:47:53 UTC 2020

On Fri, Jul 31, 2020 at 08:49:41PM +0200, Roman Gilg wrote:
> On Fri, Jul 31, 2020 at 7:13 AM Peter Hutterer <peter.hutterer at who-t.net> wrote:
> >
> > I've been working on a new approach for allowing emulated input devices in
> > Wayland. Or in short - how can we make xdotool and synergy work? And
> > eventually replace them.
> >
> > The proposal I have is a library for Emulated Input, in short libei.
> >   https://gitlab.freedesktop.org/whot/libei/
> We talked about it already yesterday but thanks again for this great
> project. I decided to directly write some experimental integration
> code based on your Weston branch for the server library in KWinFT [1]
> in order to try this out as a solution for my Steam Controller issue
> [2] that - I assume - motivated the creation of this library to some
> extent.
> And yes, it works. :) I can move the cursor with the Steam controller
> as in "Steam client -> XTEST -> patched Xwayland -> libei -> libeis ->
> KWinFT" just fine.
> Am I right in assuming that the button-press event is not yet done in
> libei or in the patched Xwayland version you linked? When it's
> available let me know and I'll add the necessary logic for that too.
> > libei has two parts, the client side (libei) for applications and
> > a server side (libeis) for the compositor. The two libraries communicate
> > with each other (how? doesn't matter, it's an implementation detail) to
> > negotiate input devices.
> >
> > The process is roughly:
> > - the libei client connects and says "I am org.freedesktop.SomeApplication
> >   and I want a pointer and a keyboard device"
> > - the libeis server says "ok, you can have a pointer device and a keyboard
> >   device"
> > - the libei client says 'move the pointer by 1/1', etc. and the server does
> >   just that. or not, depending on context.
> >
> > There are more details, see the README in the repo and the libei.h and
> > libeis.h header files that describe the API.
> >
> > The sticking point here is: emulated input comes via a separate channel.
> > The server a) knows it's emulated input, b) knows who it is coming from and
> > c) has complete control over the input.
> >
> > a) is interesting because you can differ between the events internally. The
> > API right now is very similar to libinput's events so integrating it into a
> > compositor should be trivial.
> >
> > b) is somewhat handwavy if an application runs outside a sandbox - any
> > information will be unreliable. Flatpak gives you an app-id though and
> > with that we can (eventually) do things like storing the allow/deny
> > decisions of the user in the portal implementation.
> >
> > c) allows you to e.g. suspend the client when convenient or just ignore
> > certain sequences altogether. The two made-up examples are: suspend EI
> > during a password prompt, or allow EI from the software yubikey *only*
> > during a password prompt.
> >
> > Now, the next question is: how do they *start* talking to each other?
> > libei provides multiple backends for the initial connection negotiation. My
> > goal is to have this work with flatpak portals so an application running
> > within the sandbox can be restricted accordingly. Alternatives to this could
> > be public DBus interfaces, simple fd passing or (as is implemented right
> > now) a named unix socket.
> Wiring this somehow through portals would be important for sure.
> Xwayland as a client could either be accepted by default or if
> Olivier's Xwayland xdg-portal patches [3] land (with the additional
> portal for libei) only be accepted after the user confirmed it just
> like every other sandboxed client.
> That being said the envisioned permission model is still somewhat
> difficult for me to grasp. To reiterate: the access of sandboxed
> clients can be accepted or rejected by the user. But to my
> understanding that's a function of the xdg-portal itself. You said the
> compositor can filter requests too. Can it only allow libei
> connections through xdg-portals and Xwayland? What about other
> clients, how can they be distinguished from xdg-portals and Xwayland
> securely? Or is this only possible for flatpaked clients? Or is such a
> client blocked from trying to do that anyway (in other words is it
> allowed or not to connect to arbitrary sockets like the libei one)?
> As it is probably clear now the overall concept of xdg-portals in
> detail is still not very well understood by me. From conversations I
> had lately with other windowing system developers I believe I'm not
> the only one.
> Since xdg-portals become more and more important for securing our
> graphical sessions it would be great if someone with more knowledge
> about it could create some kind of article or documentation about it
> that looks at it from the perspective of windowing systems . How do
> apps in/out of Flatpaks that display their pixels through X11,
> Xwayland or Wayland directly work in respect to the sandboxed
> environment provided by xdg-portals? What does this mean for a Wayland
> compositor, what does it need to do or refrain from to be on the safe
> side?
> For example some simple but lucid diagrams like the one in libei's
> README describing the flow around client <-> xdg-portals <-> windowing
> system would probably already help many of us. If somebody feels
> motivated to do that I would be happy to help, ping me on IRC.

I'll make an attempt to try to clarify how things are hooked together
with portals, sandboxed and unsandboxed applications.

The portal implements a few important functions:

1) It exports a set of APIs under org.freedesktop.portal.* that all
sandboxed applications can access.

In contrast to explicitly allowed APIs (i.e. a build-time configured
list of APIs exposed directly inside an application sandbox by default),
a portal API allows applications to dynamically request access to
privileged functionality, for example access to arbitrary file system
locations, cameras, geolocation, or screen casting.

2) It provides, using backends, methods for implementing user
interactive permission management. For arbitrary file system access,
this may involve e.g. opening a file using a file selection dialog, or
for screen casting this may mean actively choosing what part of your
screen should be shared.

3) It manages remembered access, using a common permission store[0]. For
example if an application was permanently denied camera access, the
portal will know about this and not query the portal backend.

4) It authenticates sandboxed applications. It does this in sandbox
implementation-specific ways, e.g. /proc/<pid>/root/.flatpak-info or
AppArmor security labels. Applications themselves are not involved with
this, as they by definition cannot be trusted.

5) It provides an abstraction above desktop-dependent implementation
details. For example, xdg-desktop-portal-wlroots implements
screenshots and screen casting using wlr_screencopy_unstable_v1,
while xdg-desktop-portal-gtk implements the same portal API using
org.gnome.Mutter.ScreenCast and org.gnome.Shell.Screenshot.

6) It acts as a "firewall" between sandboxed applications and the
system. A portal backend, the piece that implements the interactive
and desktop environment specific functionality, sits completely outside
of the sandbox, and receives already verified requests. The only APIs
that face sandboxed applications are org.freedesktop.portal.*, while
portal backends implement "hidden" org.freedesktop.impl.portal.* APIs.
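
To make point 4) a bit more concrete, here is a minimal Python sketch of
the Flatpak flavour of authentication. It assumes the key-file layout
Flatpak uses for .flatpak-info (an [Application] group with a name key);
the sample contents and both function names are made up for
illustration, and a real portal also has to get the pid from a source it
trusts, never from the application itself.

```python
import configparser

# Made-up sample of sandbox metadata. Real files carry more groups and
# keys; only [Application]/name is used in this sketch.
SAMPLE_FLATPAK_INFO = """\
[Application]
name=org.example.SomeApplication
runtime=runtime/org.freedesktop.Platform/x86_64/20.08
"""


def app_id_from_flatpak_info(text):
    """Return the application ID from key-file text, or None if absent."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names case-sensitive
    try:
        parser.read_string(text)
    except configparser.Error:
        return None  # malformed metadata: treat the caller as unidentified
    return parser.get("Application", "name", fallback=None)


def app_id_for_pid(pid):
    """Read the metadata through the peer's root; None means unsandboxed."""
    try:
        with open(f"/proc/{pid}/root/.flatpak-info", encoding="utf-8") as f:
            return app_id_from_flatpak_info(f.read())
    except OSError:
        return None


print(app_id_from_flatpak_info(SAMPLE_FLATPAK_INFO))
```

A None result is what the later paragraphs call an unsandboxed
application: there is simply no metadata the portal can trust.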

A typical flow could look like this:

 * A sandboxed application wants a screenshot, and requests one by
   calling the appropriate method on the
   org.freedesktop.portal.Screenshot interface.
 * The portal sees this request, checks which application sandbox the
   request came from, verifies that the request is not bogus, and then
   forwards it by calling a method on the portal backend.
 * The portal backend handles the method call: it takes a screenshot
   and, for example, shows a visual preview of it with a button that
   says "Share this with 'Application'". Pressing the button makes the
   method call return with the screenshot.
 * The application then receives the screenshot.
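
The flow above can be modelled in a few lines of Python. This is only a
sketch under loud assumptions: the class names, the ALLOWED_OPTIONS set
and the sender table are hypothetical, and plain method calls stand in
for the D-Bus calls a real portal would use.

```python
ALLOWED_OPTIONS = {"interactive"}  # hypothetical set of valid options


class Backend:
    """Stands in for an org.freedesktop.impl.portal.* implementation."""

    def __init__(self, user_approves):
        self.user_approves = user_approves  # callback modelling the dialog

    def screenshot(self, app_id):
        image = b"fake-image-data"  # a real backend would ask the compositor
        # Show the preview; the user decides whether to share it.
        return image if self.user_approves(app_id, image) else None


class Portal:
    """Stands in for the org.freedesktop.portal.* side."""

    def __init__(self, backend, identify_sender):
        self.backend = backend
        self.identify_sender = identify_sender

    def screenshot(self, sender, options):
        # The app ID is derived by the portal, never read from the request.
        app_id = self.identify_sender(sender)
        if not set(options) <= ALLOWED_OPTIONS:
            raise ValueError("bogus request")
        return self.backend.screenshot(app_id)


senders = {":1.42": "org.example.SomeApplication"}
portal = Portal(Backend(lambda app_id, image: True), senders.get)
print(portal.screenshot(":1.42", {"interactive": True}))
```

The application only ever talks to Portal; Backend never sees anything
that has not already been identified and validated.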

Sometimes the privileged access is not a one-time transfer of an object,
like a screenshot or a file, but rather access granted to something over
a longer period of time. Two examples of this are screen casting and
camera support.

These can be implemented in different ways, depending on what piece of
the system the shared resource originates from. Let's take the two
examples just mentioned, as they behave slightly differently here.

Screen casting works as follows: the portal backend provides, in some
backend-specific way, the means to discover, preview, select and
initiate screen casts. The backend tells the portal which streams are
the shared screen casts. The portal then takes care of opening and
preparing the PipeWire remote with access to the screen cast streams,
before handing it over to the application.

In the camera portal example, however, the portal backend's only role is
the user-facing interactive granting or denying of permission. The
PipeWire remote is then opened by the portal itself, and prepared in a
way that makes sure the PipeWire daemon has a sandbox-aware session.

In both of these cases, however, PipeWire is a side channel that is not
established until all parts (portal, portal backend and resource source
(e.g. PipeWire or the compositor)) have agreed upon it.

In the libei/libeis case, it could work very similarly.

Here is a rough diagram of how it could be structured:

                                Sandbox barrier
         System                       ||                 Sandbox
        -----------------             ||
        | Permission    |             ||
        | Store         |             ||
       >|               |<-           ||
      / |               |  \          ||
     /  |               |   \ (p)     ||
    |   -----------------    \        ||
    |                         \       ||
    |                          v      ||
    |                         -----------------      ---------------
    |   -----------------     | Portal proc.  |      |             |
    |   |    Portal     | (2) | deny/grant 8-<| (1)  |             |
    |   |    Backend    |<----|- - -[auth] - -|<-----| Application |
    |   |           - - |-----|- - - - - - - -|----->|             |
    |   | deny/ 8-</    | (4) |       ..      | (5)  |             |
    |   | grant   v     |     |       ..      |      |             |
    |   -----------------     -----------------      |             |
    |           ^                     ||             |             |
    | (p')      | (3)                 ||             |             |
     \          V                     ||             |             |
      \ -----------------     (C)   ______           |. . . .      |
       >|Compositor .lib\__________/ ____ \__________/      .      |
        |           .eis ___________/ || \___________ libei .      |
        |           . . /             ||             \      .      |
        |               |             ||             |. . . .      |
        |               |             ||             |             |
        -----------------             ||             _______________

In the diagram you can see 5 "interactions" between different components
that take place, (1), (2), (3), (4) and (5), resulting in the side
channel (C).

(1) is the application making a request to be able to inject input.

(2) is the portal, having authenticated the application the request came
from and verified the metadata, making a method call on behalf of the
application to the portal backend implementation, which checks with the
user and then maybe opens an Ei session.

(3) is the portal backend, possibly having queried the user about
permissions, opening a new Ei session.

(4) is the backend returning from the method call done in (2), possibly
with an open file descriptor to an Eis session.

(5) is the portal responding to the request made by the application in
step (1). This response is then used to establish (C). See below.

(C) is the newly established Eis channel, where the application can
inject input using libei and the compositor can process the injected
input in a similar way to how it processes libinput events.

(p) and (p') correspond to permission store interaction (where p' is
optional but recommended). The portal may consult the store up front,
and a compositor may also be permission store aware, so that it can
terminate a session whose permission changes after access was
originally granted. This is for example how PipeWire handles camera
access being revoked.
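
Steps (1) to (5), the side channel (C) and the (p') revocation can be
sketched in Python like this. Everything here is an assumption made for
illustration: the class and method names are hypothetical, a
socketpair stands in for whatever transport libei/libeis ends up using,
and the "pointer-motion 1 1" text is a made-up wire format, not the
libei protocol.

```python
import socket


class Compositor:
    """Owns the server (Eis) end of every emulated-input session."""

    def __init__(self):
        self.sessions = {}  # app_id -> server-side socket

    def open_session(self, app_id):
        # (3): create the side channel and keep one end of it.
        server_end, client_end = socket.socketpair()
        self.sessions[app_id] = server_end
        return client_end

    def read_event(self, app_id):
        return self.sessions[app_id].recv(64).decode()

    def revoke(self, app_id):
        # (p'): drop the session when the permission changes.
        self.sessions.pop(app_id).close()


class Backend:
    def __init__(self, compositor, ask_user):
        self.compositor = compositor
        self.ask_user = ask_user

    def create_session(self, app_id):
        # (2) arrives here; (4) is the return value.
        if not self.ask_user(app_id):
            return None
        return self.compositor.open_session(app_id)


class Portal:
    def __init__(self, backend, identify_sender):
        self.backend = backend
        self.identify_sender = identify_sender

    def request_emulated_input(self, sender):
        # (1) arrives here; (5) is the return value.
        app_id = self.identify_sender(sender)
        return self.backend.create_session(app_id)


APP = "org.example.SomeApplication"
senders = {":1.42": APP}
compositor = Compositor()
portal = Portal(Backend(compositor, lambda app_id: True), senders.get)

channel = portal.request_emulated_input(":1.42")  # steps (1) to (5)
channel.sendall(b"pointer-motion 1 1")            # (C)
print(compositor.read_event(APP))
```

Note that the portal and the backend are only involved in setting up the
channel. Once (C) exists the events flow directly between application
and compositor, and the compositor can close its end at any time.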

Today, unsandboxed applications are treated on a case-by-case basis. In
some cases (e.g. screen casts, screenshots), the only effect is that the
name of the application is not presented as part of a dialog, while in
other cases it simply defaults to some policy, e.g. deny or grant.

The permission store itself doesn't require entries to have a sandboxed
application ID, but the portal itself currently doesn't have a way to
identify an application that isn't running inside a sandbox. Thus, if
the default policy for an Ei session is to ask, then for a portal to be
able to remember the permission for an unsandboxed application we would
have to add a way for the portal to blindly trust some ID or key of
some sort that the application provides.
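
A small Python sketch of that decision logic, to show where the
unsandboxed case falls through. The function name, the policy constants
and the dict used as a permission store are all hypothetical, and
remembering every answer the user gives is a simplification.

```python
ASK, GRANT, DENY = "ask", "grant", "deny"


def decide(store, app_id, default_policy, ask_user):
    """Return GRANT or DENY for one emulated-input request."""
    if app_id is not None and app_id in store:
        return store[app_id]  # remembered decision, no dialog needed
    decision = default_policy
    if decision == ASK:
        decision = GRANT if ask_user(app_id) else DENY
        if app_id is not None:
            store[app_id] = decision  # only trusted IDs can be remembered
    return decision


store = {}
print(decide(store, "org.example.SomeApplication", ASK, lambda app_id: True))
print(decide(store, None, ASK, lambda app_id: False))
print(store)
```

With app_id set to None (unsandboxed) nothing is ever written to the
store, so the user would be asked again on every request.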

Xwayland would be considered an unsandboxed application, even if the
application an XTEST request came from was sandboxed. Exactly how to
deal with this is an open question. Whether to treat all "X11"
applications as one giant blob or to distinguish between them depends
on in what ways we would be able to let the portal trust metadata
coming from unsandboxed applications.


[0] https://github.com/flatpak/xdg-desktop-portal/wiki/The-Permission-Store
> [1] https://gitlab.com/romangg/kwinft/-/commits/libei
> [2] https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/431
> [3] https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/465
> > The aim is that a client can simply iterate through all of the options until
> > finds a connection. Once that's found, the actual code for emulating input is
> > always the same so it's trivial to implement a client that works on any
> > compositor that supports some backend of libeis.
> > The server part only needs to care about the negotiation mechanisms it
> > allows, i.e. GNOME will only have dbus/portal, sway will only have... dunno,
> > fd exchange maybe?
> >
> > Next: because we have a separate channel for emulated input we can hook up
> > XTEST to use libei to talk to a compositor. I have a PoC implementation for
> > weston and Xwayland:
> >   https://gitlab.freedesktop.org/whot/weston/-/commits/wip/eis
> >   https://gitlab.freedesktop.org/whot/xserver/-/commits/wip/xwayland-eis
> > With that xdotool can move the pointer. Note this is truly the most minimal
> > code just to illustrate the point but you can fill in the blanks and do
> > things like the compositor preventing XTEST or not, etc.
> >
> > This is all in very early stages with very little error checking so things
> > will probably crash or disconnect unexpectedly. I've tried to document the
> > API to make the intentions clear but there are still some very handwavy
> > bits.
> >
> > Do let me know if you have any questions or suggestions please though.
> >
> > Cheers,
> >   Peter
> > _______________________________________________
> > wayland-devel mailing list
> > wayland-devel at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/wayland-devel
