RFC: libei - emulated input in Wayland compositors

Sat Aug 1 11:42:16 UTC 2020

On Sat, Aug 1, 2020 at 12:47 AM Jonas Ådahl <jadahl at gmail.com> wrote:
>
> On Fri, Jul 31, 2020 at 08:49:41PM +0200, Roman Gilg wrote:
> > On Fri, Jul 31, 2020 at 7:13 AM Peter Hutterer <peter.hutterer at who-t.net> wrote:
> > >
> > > I've been working on a new approach for allowing emulated input devices in
> > > Wayland. Or in short - how can we make xdotool and synergy work? And
> > > eventually replace them.
> > >
> > > The proposal I have is a library for Emulated Input, in short libei.
> > >   https://gitlab.freedesktop.org/whot/libei/
> >
> > We talked about it already yesterday but thanks again for this great
> > project. I decided to directly write some experimental integration
> > code based on your Weston branch for the server library in KWinFT [1]
> > in order to try this out as a solution for my Steam Controller issue
> > [2] that - I assume - motivated the creation of this library to some
> > extent.
> >
> > And yes, it works. :) I can move the cursor with the Steam controller
> > as in "Steam client -> XTEST -> patched Xwayland -> libei -> libeis ->
> > KWinFT" just fine.
> >
> > Am I right in assuming that the button-press event is not yet done in
> > libei or in the patched Xwayland version you linked? When it's
> > available let me know and I'll add the necessary logic for that too.
> >
> > > libei has two parts, the client side (libei) for applications and
> > > a server side (libeis) for the compositor. The two libraries communicate
> > > with each other (how? doesn't matter, it's an implementation detail) to
> > > negotiate input devices.
> > >
> > > The process is roughly:
> > > - the libei client connects and says "I am org.freedesktop.SomeApplication
> > >   and I want a pointer and a keyboard device"
> > > - the libeis server says "ok, you can have a pointer device and a keyboard
> > >   device"
> > > - the libei client says 'move the pointer by 1/1', etc. and the server does
> > >   just that. or not, depending on context.
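
To check that I read the process above correctly, here is a toy Python
model of it. This is only a sketch: ToyEisServer and its methods are made
up for illustration and are not the libei/libeis API (see libei.h and
libeis.h for the real thing).

```python
class ToyEisServer:
    """Toy stand-in for the compositor side. Hypothetical, not the libeis API."""

    def __init__(self, allowed, suspended=False):
        self.allowed = set(allowed)   # capabilities the compositor hands out
        self.suspended = suspended    # e.g. True during a password prompt
        self.pointer = [0, 0]

    def connect(self, client_name, wanted):
        # Grant only the intersection of requested and allowed devices.
        return set(wanted) & self.allowed

    def pointer_motion(self, granted, dx, dy):
        # The server has the final say: it may silently drop the event.
        if "pointer" not in granted or self.suspended:
            return False
        self.pointer[0] += dx
        self.pointer[1] += dy
        return True


server = ToyEisServer(allowed={"pointer", "keyboard"})
granted = server.connect("org.freedesktop.SomeApplication", {"pointer", "keyboard"})
moved = server.pointer_motion(granted, 1, 1)        # applied
server.suspended = True
dropped = not server.pointer_motion(granted, 1, 1)  # ignored while suspended
```

The point of the "or not, depending on context" part is the last line: the
client keeps sending, the server decides.
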
> > >
> > > There are more details, see the README in the repo and the libei.h and
> > > libeis.h header files that describe the API.
> > >
> > > The sticking point here is: emulated input comes via a separate channel.
> > > The server a) knows it's emulated input, b) knows who it is coming from and
> > > c) has complete control over the input.
> > >
> > > a) is interesting because you can differ between the events internally. The
> > > API right now is very similar to libinput's events so integrating it into a
> > > compositor should be trivial.
> > >
> > > b) is somewhat handwavy if an application runs outside a sandbox - any
> > > information will be unreliable. Flatpak gives you an app-id though and
> > > with that we can (eventually) do things like storing the allow/deny
> > > decisions of the user in the portal implementation.
> > >
> > > c) allows you to e.g. suspend the client when convenient or just ignore
> > > certain sequences altogether. The two made-up examples are: suspend EI
> > > during a password prompt, or allow EI from the software yubikey *only*
> > > during a password prompt.
> > >
> > > Now, the next question is: how do they *start* talking to each other?
> > > libei provides multiple backends for the initial connection negotiation. My
> > > goal is to have this work with flatpak portals so an application running
> > > within the sandbox can be restricted accordingly. Alternatives to this could
> > > be public DBus interfaces, simple fd passing or (as is implemented right
> > > now) a named unix socket.
> >
> > Wiring this somehow through portals would be important for sure.
> > Xwayland as a client could either be accepted by default or if
> > Olivier's Xwayland xdg-portal patches [3] land (with the additional
> > portal for libei) only be accepted after the user confirmed it just
> > like every other sandboxed client.
> >
> > That being said the envisioned permission model is still somewhat
> > difficult for me to grasp. To reiterate: the access of sandboxed
> > clients can be accepted or rejected by the user. But to my
> > understanding that's a function of the xdg-portal itself. You said the
> > compositor can filter requests too. Can it only allow libei
> > connections through xdg-portals and Xwayland? What about other
> > clients, how can they be distinguished from xdg-portals and Xwayland
> > securely? Or is this only possible for flatpaked clients? Or is such a
> > client blocked from trying to do that anyway (in other words is it
> > allowed or not to connect to arbitrary sockets like the libei one)?
> >
> > As it is probably clear now the overall concept of xdg-portals in
> > detail is still not very well understood by me. From conversations I
> > had lately with other windowing system developers I believe I'm not
> > the only one.
> >
> > Since xdg-portals become more and more important for securing our
> > graphical sessions it would be great if someone with more knowledge
> > about it could create some kind of article or documentation about it
> > that looks at it from the perspective of windowing systems. How do
> > apps in/out of Flatpaks that display their pixels through X11,
> > Xwayland or Wayland directly work in respect to the sandboxed
> > environment provided by xdg-portals? What does this mean for a Wayland
> > compositor, what does it need to do or refrain from to be on the safe
> > side?
> >
> > For example some simple but lucid diagrams like the one in libei's
> > README describing the flow around client <-> xdg-portals <-> windowing
> > system would probably already help many of us. If somebody feels
> > motivated to do that I would be happy to help, ping me on IRC.
>
> I'll make an attempt to try to clarify how things are hooked together
> with portals, sandboxed and unsandboxed applications.

Hi Jonas,

I just finished reading your mail and that's an incredibly helpful,
comprehensive and well-written introduction to the topic. Thanks so
much for this! Also obviously loving the diagram-art. :)

> The portal implements a few important functions:
>
> 1) It exports a set of APIs under org.freedesktop.portal.* that all
> sandboxed applications can access.
>
> In contrast to explicitly allowed APIs (i.e. build time configured list
> of API to be exposed directly to inside an application sandbox by
> default), portal APIs allow applications to dynamically request
> access to privileged functionality, for example access to arbitrary
> file system locations, cameras, geo location, or screen casting.

That's quite a long sentence. Let me dissect it a bit. What kind of
API are we talking about? The xdg-portal-desktop one? Who defines the
"build time configured list of API"? Are we talking about the "build"
of the Flatpak? Is it the Manifest's "finish-args" as described this
way in the Flatpak documentation? [1]

> 2) It provides, using backends, methods for implementing user
> interactive permission management. For arbitrary file system access,
> this may involve e.g. opening a file using a file selection dialog, or
> for screen casting this may mean actively choosing what part of your
> screen should be shared.

Right, what the basic idea of the portals is, is pretty clear to me.
Basically it pipes requests of sandboxed apps through an
authentication and permission system in case they want to interact
with outside their sandbox. It's more about the cases where they
don't. A Wayland client in a sandbox interacts via Wayland protocol
with the outside of their sandbox, but that's not going through
xdg-portals, or is there a permission to block it from that?

> 3) It manages remembered access, using a common permission store[0]. For
> example if an application was permanently denied camera access, the
> portal will know about this and not query the portal backend.

Yea, I quickly did an online search for a GUI to it and found [2]
which looks awesome. Also answers my question above: there exist
permissions to block a sandboxed app from Wayland or X11. But I assume
a user won't be asked for every new GUI app he installs if it is
allowed to show some pixels and instead the permission is by default
set to allowed. Looking back at it the GUI app should set this as a
permission in their Manifest file, which is like a default on install.

> 4) It authenticates sandboxed applications. It does this in sandbox
> implementation specific ways, e.g. /proc/<pid>/root/.flatpak-info or
> AppArmor security labels. Applications themselves are not involved with
> this, as they by definition cannot be trusted.
>
> 5) It provides an abstraction above desktop dependent implementation
> details. For example, xdg-desktop-portal-wlroots implements
> screenshooting and screen casting using wlr_screencopy_unstable_v1,
> while xdg-desktop-portal-gtk implements the same portal API using
> org.gnome.Mutter.ScreenCast and org.gnome.Shell.Screenshot.

As a side note you meant xdg-desktop-portal-wlr.

In regards to screen casting you use dbus for the communication
between portal backend and Mutter. A sandboxed client normally won't
have access to these dbus interfaces, but it has access to
org.freedesktop.desktop, right?

On the other hand, wlroots and KDE nowadays have Wayland protocols for
screen casting. Do these, in your opinion, a priori lead to holes in
the sandbox if a sandboxed client has permission to use the Wayland
protocol?

> 6) It acts as a "firewall" between sandboxed application and the
> system. A portal backend, the piece that implements the interactive
> and desktop environment specific functionality, sits completely outside
> of the sandbox, and receives already verified requests. The only APIs
> that faces sandboxed applications are org.freedesktop.portal.*, while
> portal backends implement "hidden" org.freedesktop.impl.portal.* APIs.
>
> A typical flow could look like this:
>
>  * A sandboxed application wants a screenshot, and attempts this by
>    calling the appropriate method on the org.freedesktop.portal.Screenshot
>    API.
>  * The portal sees this request, checks what application sandbox this
>    request came from, verifies that the request is not bogus, and then
>    forwards the request by calling a method on the portal backend.
>  * The portal backend responds to the method call, takes a screenshot,
>    and e.g. provides a visual preview of the screenshot, with a button
>    that says "Share this to 'Application'" that causes it to return the
>    method call with a screenshot.
>  * The application then receives the screenshot.
>
> Sometimes, the privileged access is not a one time transferred object,
> like a screenshot, or a file, but rather access granted to something over
> a longer period of time. Two examples of these are screen casting and
> camera support.
>
> These can be implemented in different ways, depending on what piece of
> the system the shared resource originates from. Let's take the two
> mentioned examples, as they behave slightly differently here.
>
> Screen casting works by the portal backend implementation providing, in
> some backend-specific way, the means to discover, preview, select and
> initiate screen casts. The backend provides the portal with information
> about what streams are shared screen casts. The portal then takes care
> of opening and preparing the PipeWire remote with access to the screen
> cast streams, before handing it over to the application.
>
> In the camera portal example, however, the only role the portal backend
> implements is the user facing interactive permission granting/denying.

I thought the permission system is done by the frontend. Or do you
mean with this just displaying some UI when no permission was set yet,
while permission checking/storing is still done by the frontend?
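
To make the question concrete, this is the split I have in mind, as a
minimal Python sketch. All names are hypothetical and nothing here is real
xdg-desktop-portal API; it only assumes that the frontend looks up and
stores decisions while the backend does nothing but show the dialog.

```python
from enum import Enum


class Decision(Enum):
    GRANT = "grant"
    DENY = "deny"


class Frontend:
    """Hypothetical portal frontend: owns permission lookup and storage."""

    def __init__(self, ask_backend):
        self.store = {}                 # (app_id, resource) -> Decision
        self.ask_backend = ask_backend  # callable that shows the UI

    def request(self, app_id, resource):
        key = (app_id, resource)
        if key in self.store:
            # Remembered decision: the backend is not queried at all.
            return self.store[key]
        decision = self.ask_backend(app_id, resource)
        self.store[key] = decision
        return decision


prompts = []


def backend_dialog(app_id, resource):
    # Stand-in for the interactive dialog; here the "user" always denies.
    prompts.append((app_id, resource))
    return Decision.DENY


frontend = Frontend(backend_dialog)
first = frontend.request("org.example.App", "camera")
second = frontend.request("org.example.App", "camera")
```

With this split the second request never reaches the backend, which is how
I read the "not query the portal backend" part of point 3.
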

> The PipeWire remote is then opened by the portal itself, and prepared in
> a way that makes sure the PipeWire daemon has a sandbox aware session
> manager.

It doesn't need more from the backend because PipeWire itself talks to
the camera?

> In both of these two, however, PipeWire is a side channel, that was not
> established until it was clear that all parts (portal, portal backend
> and resource source (e.g. pipewire or compositor)) had agreed upon it.
>
> In the libei/libeis case, it could work very similarly.
>
> Here is a rough diagram of how it could be structured:
>
>                                 Sandbox barrier
>          System                       ||                 Sandbox
>                                       ||
>         -----------------             ||
>         | Permission    |             ||
>         | Store         |             ||
>        >|               |<-           ||
>       / |               |  \          ||
>      /  |               |   \ (p)     ||
>     |   -----------------    \        ||
>     |                         \       ||
>     |                          v      ||
>     |                         -----------------      ---------------
>     |   -----------------     | Portal proc.  |      |             |
>     |   |    Portal     | (2) | deny/grant 8-<| (1)  |             |
>     |   |    Backend    |<----|- - -[auth] - -|<-----| Application |
>     |   |           - - |-----|- - - - - - - -|----->|             |
>     |   | deny/ 8-</    | (4) |       ..      | (5)  |             |
>     |   | grant   v     |     |       ..      |      |             |
>     |   -----------------     -----------------      |             |
>     |           ^                     ||             |             |
>     | (p')      | (3)                 ||             |             |
>      \          V                     ||             |             |
>       \ -----------------     (C)   ______           |. . . .      |
>        >|Compositor .lib\__________/ ____ \__________/      .      |
>         |           .eis ___________/ || \___________ libei .      |
>         |           . . /             ||             \      .      |
>         |               |             ||             |. . . .      |
>         |               |             ||             |             |
>         -----------------             ||             _______________
>                                       ||
>                                       ||
>

Love the diagram.

>
> In the diagram you can see 5 "interactions" between different components
> that take place, (1), (2), (3), (4) and (5), resulting in the side
> channel (C).
>
> (1) is the application making a request to be able to inject input
> events.
>
> (2) is the portal, having authenticated the application the request came
> from and verified the metadata, doing a method call on behalf of the
> application to the portal backend implementation to check with the user
> and then maybe open an Ei session.
>
> (3) is the portal backend, possibly having queried the user about
> permissions, opening a new Ei session.
>
> (4) is the backend returning from the method call done in (2), possibly
> with an open file descriptor to an Eis session.

The portal backend creates the open file descriptor, something the
sandboxed app can't do per se. In general a sandboxed app can not
create arbitrary sockets, correct?
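
For reference, this is how I picture the fd handover in steps (3) to (5),
as a small Python sketch using plain unix sockets. It assumes Linux and
Python 3.9 or newer (for socket.send_fds/recv_fds); the function names are
made up, and the second socket pair is only a stand-in for the D-Bus
connection between portal and application.

```python
import socket


def open_session():
    # Hypothetical backend step: create a connected pair, one end per side.
    compositor_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    return compositor_end, client_end


def hand_over(channel, client_end):
    # Pass the client end across an existing connection (SCM_RIGHTS).
    socket.send_fds(channel, [b"ei"], [client_end.fileno()])
    client_end.close()  # the receiver now holds its own reference


def receive(channel):
    _msg, fds, _flags, _addr = socket.recv_fds(channel, 16, 1)
    return socket.socket(fileno=fds[0])


# Stand-in for the portal <-> application connection.
portal_side, app_side = socket.socketpair()

compositor_end, client_end = open_session()
hand_over(portal_side, client_end)
app_sock = receive(app_side)

app_sock.sendall(b"pointer motion 1/1")
data = compositor_end.recv(64)
```

The application never opens a socket itself here; it only receives an
already connected one.
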

> (5) the portal responds to the requests made from the application in
> step (1). This response is then used to establish (C). See below.
>
> (C) is the newly established Eis channel where the application can
> inject input using libei, and the compositor can process the
> injected input in a similar way to how it processes libinput events.
>
> (p) and (p') correspond to permission store interaction (where p' is
> optional but recommended). Either up front by the portal, or e.g. a
> compositor may be permission store aware, so that it can terminate a
> session that changes permission after access was originally granted.
> This is for example how PipeWire handles camera access being revoked.

What do you mean by "terminate a session that changes permission"? Who
changed the permission in the first place? The user? For example in
the Flatseal UI by switching one app off again?
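
If the answer is yes, I imagine (p') roughly like the following Python
sketch. Again this is only my assumption of how it could work; the classes
and methods are hypothetical and not the permission store's D-Bus API.

```python
class PermissionStore:
    """Hypothetical store that notifies watchers when an entry changes."""

    def __init__(self):
        self._entries = {}
        self._watchers = []

    def watch(self, callback):
        self._watchers.append(callback)

    def set(self, app_id, allowed):
        self._entries[app_id] = allowed
        for callback in self._watchers:
            callback(app_id, allowed)


class StoreAwareCompositor:
    """Hypothetical compositor that drops sessions on revoked permission."""

    def __init__(self, store):
        self.active = set()  # app ids with an open emulated-input session
        store.watch(self._on_permission_changed)

    def open_session(self, app_id):
        self.active.add(app_id)

    def _on_permission_changed(self, app_id, allowed):
        if not allowed:
            # Access was revoked after it was granted: terminate the session.
            self.active.discard(app_id)


store = PermissionStore()
compositor = StoreAwareCompositor(store)
store.set("org.example.App", True)
compositor.open_session("org.example.App")
store.set("org.example.App", False)  # e.g. the user flips a switch in a UI
```
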

> Today, unsandboxed applications are treated on a per case basis. In some
> cases (e.g. screen casts, screenshots), the only effect is that the name
> of the application is not presented as part of a dialog, while in other
> cases it simply defaults to some policy e.g. deny or grant.

Do you mean non-sandboxed applications that still try to communicate
with xdg-desktop-portal? They could also just talk directly to your
Mutter dbus interface or does it only allow connections from
xdg-desktop-portal's dbus "address"? If yes, this dbus
address/name/identifier is not fakable?

> The permission store itself doesn't require entries to have a sandboxed
> application ID, but the portal itself currently doesn't have a way to
> identify an application that isn't running inside a sandbox. Thus, if an
> Ei session policy default is to ask, for a portal to be able to remember
> the permission for an unsandboxed application, we would have to add a
> way for the portal to blindly trust some ID or key of some sort the
> application provides.

What about the portal in this case being able to get additional
information from other "trusted actors" like the Wayland server? In
case of Xwayland the compositor could hand over a secret key at the
start that Xwayland then can provide for queries to xdg-desktop-portal
and then check back for every request with the compositor if the app
that states to be "Xwayland" provided indeed this key.
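
As a rough Python sketch of that idea (all names are hypothetical, and how
the key would actually be transported is left open):

```python
import hmac
import secrets


class Compositor:
    """Hypothetical trusted actor that launched Xwayland itself."""

    def __init__(self):
        self._xwayland_key = None

    def spawn_xwayland(self):
        # Hand a fresh random key to the Xwayland process at startup.
        self._xwayland_key = secrets.token_hex(16)
        return self._xwayland_key

    def verify(self, claimed_name, key):
        if claimed_name != "Xwayland" or self._xwayland_key is None:
            return False
        # Constant-time comparison to avoid leaking the key via timing.
        return hmac.compare_digest(key.encode(), self._xwayland_key.encode())


def portal_accepts(compositor, claimed_name, key):
    # The portal does not trust the caller; it checks back with the compositor.
    return compositor.verify(claimed_name, key)


compositor = Compositor()
xwayland_key = compositor.spawn_xwayland()

genuine = portal_accepts(compositor, "Xwayland", xwayland_key)
imposter = portal_accepts(compositor, "Xwayland", "guessed-key")
```

This only proves that the caller was started by the compositor, it says
nothing yet about which X11 client is behind a given XTEST request.
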

>
> Xwayland would be considered an unsandboxed application, even if the
> application a XTEST request came from was sandboxed. Exactly how to deal
> with this, is an open question, e.g. whether to treat all "X11"
> applications as one giant blob, or whether to distinguish between them,
> depends on in what ways we would be able to let the portal trust
> metadata coming from unsandboxed applications.
>

From my understanding there are basically three important client classes:
* Unsandboxed app
* Sandboxed Wayland app
* Sandboxed X11 app

Some random unsandboxed app can do whatever it wants, but some system
interfaces might only be accessible by xdg-portals, like certain dbus
addresses, for example screencasting in Mutter (if that's not true tell
me). Reading Peter's first email, libei is also supposed to behave
this way in a GNOME session. The question in this case is whether such
apps should be able to gain access, through its permission system, to
the system interfaces that are accessible only by xdg-portals.

Sandboxed Wayland apps are properly secure as long as no holes are
created by insecure Wayland protocol extensions.

Sandboxed X11 apps are as secure as unsandboxed Xwayland is contained
from the outside system since the X11 protocol design is per se not
secure. So if Xwayland is not contained properly and has access to
certain critical/global interfaces this could then be used as an
attack vector from inside the sandbox to the outside. As an example a
rogue sandboxed Xwayland app could fake input if Xwayland was allowed
in general access to libei (that also holds for unsandboxed X11 apps,
but for sandboxed ones it's more grave). On the other side if the
access is asked for by Xwayland on a per-Xwayland-client-basis with
Xwayland checking window name, PID (?) or something similar a rogue
client could still try to gain access by posing as a different client
that was already allowed for example by changing its window title and
maybe PID (would this at all be doable for a sandboxed app?).

With Xwayland having access to libei in some way I think at least for
sandboxed X11 apps there must be no possibility for tampering.
Unsandboxed X11 clients are not so important in comparison. If a
non-sandboxed X11 client gains access to input emulation through
Xwayland then so be it. Do you agree with this sentiment?

[1] https://docs.flatpak.org/en/latest/sandbox-permissions.html#sandbox-permissions
[2] https://www.omgubuntu.co.uk/2020/02/flatseal-manage-flatpak-permissions

>
> Jonas
>
>
> [0] https://github.com/flatpak/xdg-desktop-portal/wiki/The-Permission-Store
> >
> > [1] https://gitlab.com/romangg/kwinft/-/commits/libei
> > [2] https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/431
> > [3] https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/465
> >
> > > The aim is that a client can simply iterate through all of the options until
> > > it finds a connection. Once that's found, the actual code for emulating input is
> > > always the same so it's trivial to implement a client that works on any
> > > compositor that supports some backend of libeis.
> > > The server part only needs to care about the negotiation mechanisms it
> > > allows, i.e. GNOME will only have dbus/portal, sway will only have... dunno,
> > > fd exchange maybe?
> > >
> > > Next: because we have a separate channel for emulated input we can hook up
> > > XTEST to use libei to talk to a compositor. I have a PoC implementation for
> > > weston and Xwayland:
> > >   https://gitlab.freedesktop.org/whot/weston/-/commits/wip/eis
> > >   https://gitlab.freedesktop.org/whot/xserver/-/commits/wip/xwayland-eis
> > > With that xdotool can move the pointer. Note this is truly the most minimal
> > > code just to illustrate the point but you can fill in the blanks and do
> > > things like the compositor preventing XTEST or not, etc.
> > >
> > > This is all in very early stages with very little error checking so things
> > > will probably crash or disconnect unexpectedly. I've tried to document the
> > > API to make the intentions clear but there are still some very handwavy
> > > bits.
> > >
> > > Do let me know if you have any questions or suggestions please though.
> > >
> > > Cheers,
> > >   Peter
> > > _______________________________________________
> > > wayland-devel mailing list
> > > wayland-devel at lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/wayland-devel