Authorized clients

Wed Jan 8 14:30:29 PST 2014

On 08/01/14 13:52, Martin Peres wrote:
> I think the screenshot API and the video recording one should be
> separate.
Ideally that's true, but it creates extra work for the compositor
developers. And I don't see a huge benefit, actually. On X11 I can
simply take 30 screenshots per second and this works great, the
performance is about as good as it can get (2ms for a 1920x1080 frame,
that's almost as fast as a simple copy) and the only issue is tearing
(which is something that a good screenshot protocol should already
solve, obviously we don't want tearing in screenshots either). So I
would be perfectly happy with a screenshot interface that works just
like X11 minus the tearing. Do we really need a separate API for that?
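
For scale, here is a rough back-of-the-envelope calculation of what those
figures imply. It assumes 4 bytes per pixel (e.g. 32-bit BGRA); the pixel
format is not stated above, so treat the results as an estimate only.

```python
# Rough throughput estimate for the X11 figures quoted above.
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 4        # assumption: 32-bit pixels, not stated in the mail
CAPTURE_TIME_S = 0.002     # 2 ms per frame, from the text
FRAME_RATE = 30            # screenshots per second, from the text

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
copy_bandwidth = frame_bytes / CAPTURE_TIME_S   # bytes/s while a capture runs
sustained_rate = frame_bytes * FRAME_RATE       # bytes/s averaged over time
busy_fraction = CAPTURE_TIME_S * FRAME_RATE     # share of each second spent capturing

print(f"frame size:          {frame_bytes / 1e6:.1f} MB")
print(f"bandwidth per grab:  {copy_bandwidth / 1e9:.2f} GB/s")
print(f"sustained at 30 fps: {sustained_rate / 1e6:.1f} MB/s")
print(f"time spent grabbing: {busy_fraction:.0%}")
```

Under that assumption each grab moves about 8.3 MB, and capturing at 30
fps keeps the grabbing code busy for roughly 6% of the time.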

There's another gray area here: what about timelapse videos? These are
created by taking a screenshot every few seconds and then playing them
back at much higher speed. Should such an application use the screenshot
or the video API?

> For the configuration of the screenshot, I see two cases. Either we
> just want the compositor to grab the image and pass it to an
> application [1] or we want the screenshotting app to be able to query
> the number of screens and windows and their positions. The first
> doesn't require a Wayland protocol, the second does however require a
> privileged protocol.
The second is required for all the more complex use cases, e.g. where the
user wants to capture only a single screen (or part of a screen). I
already support this under X11 and it seems to be a feature that many
users use (simply because monitor resolutions are usually larger than
the desired video resolution).

> As for the video grabbing API, I see the same solutions. Different
> hotkeys could automatically grab the screen content (either by window
> or screen) or it could be queried using the screen/window layout query
> protocol. Once the screen capture has been set up, a stream of DMA-buf
> (or shm) should be sent to a different program that would record the
> output to whatever format one wants (stream of png or video) using
> either sw or hw encoders. Both the first and second case would use an
> external program to select the output format and encoding method. The
> good thing is that this encoding program would be
> compositor-independent and could be shared by all of them. Weston
> could then get rid of its VA-encoder and just use this new protocol.
Video recording applications need to do a lot more than just starting
and stopping capturing. In particular, streaming to websites like
twitch.tv isn't trivial, it takes a few seconds to set things up.
Encoders need some time to initialize (not much, but more than one
frame). Then there's also audio. Audio hardware can go into standby and
needs a fraction of a second to recover. And then we have audio APIs
like JACK that are meant to be always-on, i.e. even if you don't use
them you should still keep the connection alive because
disconnecting/reconnecting creates interruptions in the effects
pipeline. It's already complex enough the way it is, please don't make
it any worse by adding additional weird requirements from the Wayland
compositor. I can deal with any authentication system when the
application is first started, but once that's done I need to be able to
start and stop recording at any time (my only alternative is to turn
Wayland into another always-on protocol where I'm capturing at all
times, and that's wasteful).

> The good thing about sending a stream of images is that we get
> explicit synchronization between the compositor and the screen
> grabbing app which means it can miss no frames nor sample the same one
> twice (unless that's what the app wants).
This is a really nice feature, but typical monitor frame rates (60 fps)
are a lot higher than typical video frame rates (30 fps or 25 fps). It
would be wasteful to capture every single frame unless it is possible to
do this with essentially zero overhead. If zero overhead is not
possible, it would be better to let the application request a specific
frame rate.
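
A minimal sketch of what "request a specific frame rate" could mean in
practice: keep only the first displayed frame in each 1/target_fps time
slot. The function name and the use of exact fractions for timestamps are
assumptions made for this illustration, not part of any Wayland protocol.

```python
from fractions import Fraction
import math

def select_frames(timestamps, target_fps):
    """Keep the first frame that falls into each 1/target_fps time slot."""
    kept = []
    last_slot = None
    for t in timestamps:
        slot = math.floor(t * target_fps)  # index of the video frame this maps to
        if slot != last_slot:
            kept.append(t)
            last_slot = slot
    return kept

# One second of frames from a 60 Hz display; Fraction avoids rounding issues.
display_frames = [Fraction(i, 60) for i in range(60)]

for fps in (30, 25):
    video_frames = select_frames(display_frames, fps)
    print(f"{len(display_frames)} displayed -> {len(video_frames)} captured at {fps} fps")
```

Whether the dropping happens in the compositor or in the application is
exactly the overhead question raised above; the sketch only shows the
selection logic.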

A possible alternative is to interpret screenshot requests as a request
to capture the next frame when it is available, and never capture the
same frame twice. The application can then maintain a queue of
screenshot requests (very much like the ring buffer I currently use to
capture OpenGL applications) and as long as this queue is not empty, it
will get every single frame exactly once. That way no new API is needed.
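
The idea can be illustrated with a toy simulation. The Compositor and
Recorder classes below are hypothetical stand-ins invented for this
sketch; they do not correspond to any existing Wayland interface.

```python
from collections import deque

class Compositor:
    """Toy compositor: each displayed frame fulfils at most one request."""
    def __init__(self):
        self.pending = deque()   # outstanding screenshot requests
        self.frame_no = 0

    def request_screenshot(self, callback):
        self.pending.append(callback)

    def present_frame(self):
        self.frame_no += 1
        if self.pending:
            # Hand this frame to the oldest request; never reuse a frame.
            self.pending.popleft()(self.frame_no)

class Recorder:
    """Toy application that keeps a few requests queued at all times."""
    def __init__(self, compositor, depth=3):
        self.compositor = compositor
        self.captured = []
        for _ in range(depth):
            compositor.request_screenshot(self.on_frame)

    def on_frame(self, frame_no):
        self.captured.append(frame_no)
        # Refill the queue so it never runs empty while recording.
        self.compositor.request_screenshot(self.on_frame)

comp = Compositor()
rec = Recorder(comp, depth=3)
for _ in range(10):
    comp.present_frame()
print(rec.captured)
```

As long as the application refills the queue from its callback, every
presented frame is delivered exactly once. An application that only wants
an occasional screenshot simply submits a single request.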

> The screenshot "API" could just grab an image based on what hotkey is
> used:
> - window under the cursor
> - current screen (where the cursor is)
> - all the screens
> I'm not sure how to do that on touch devices, but this is a compositor
> implementation detail. Once the image is grabbed, the compositor
> should either save the file somewhere or pass it to an application.
IMHO this is something that the application should decide, not the
compositor. Otherwise one application can't get a consistent feature set
across compositors.

I think the protocol to request window information is something that
will be needed anyway, because it's a nice feature for video as well
(SSR already has this under X11 and I don't want to drop that feature).
And we already have a protocol to get screen information.

On 08/01/14 15:04, Sebastian Wick wrote:
> If the application starts recording the screen without user interaction
> I would consider it broken.
It is hard to define user interaction. Some users want to write their
own bash or python scripts to automate common tasks, which they can then
run from the terminal. There will be users that want a command-line
interface to take screenshots or record video (I have received a few
requests for just that). This shouldn't necessarily be the default, but
these users should be able to allow this usage somehow.

My solution would be to create *two* bash scripts, one that simply
launches the GUI (no arguments) and one that allows some command-line
arguments. The default configuration would be to mark only the first
bash script as trusted. The user can then decide to mark the other one
as trusted as well, so he will be able to use command-line arguments.

From Martin Peres:
> We then discussed visual feedback as a means to provide some
> mitigation and show that some applications are grabbing the screen in
> the background. That may be something you would be interested in, in
> your case. What do you think?
It may be okay for screenshots because you can show the effect after the
screenshot has been taken. But it would be unacceptable for video unless
you can somehow make the effect visible on the screen but invisible in
the video, and still obvious enough that the user can't overlook it, but
not so much that it becomes annoying (e.g. if the user is playing a
video game while recording it, they don't want things on their screen
that can obstruct important game elements). That sounds pretty hard to do.

> So you want to trust every screenshot application? I don't think it is
> a good idea. It is a better one than trusting every app, but it is
> still not very efficient.
What possible alternative do we have? You can't constantly ask the user
because that's annoying, you can't guess whether the user wants
something or not because you can't predict every use case so you will
guess wrong, and you can't trust any file owned by the user because
you're already assuming that the user is careless/stupid enough to
install malware. You need some criterion to decide whether an
application is to be trusted, and to me, a whitelist of trusted
applications seems to be the best choice by far.
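
As a rough illustration of such a whitelist, combined with the two-script
idea above, consider the sketch below. The launcher paths and the
is_trusted helper are hypothetical, and the sketch deliberately ignores
how a compositor would reliably find out which executable a client
belongs to.

```python
import os

def is_trusted(executable, whitelist):
    """Return True if the resolved executable path is on the whitelist."""
    # realpath resolves symlinks, so a link to an untrusted binary is not
    # accepted just because the link's name looks familiar.
    return os.path.realpath(executable) in whitelist

# Hypothetical launcher scripts: one GUI-only, one accepting CLI arguments.
GUI_LAUNCHER = "/usr/bin/recorder-gui"
CLI_LAUNCHER = "/usr/bin/recorder-cli"

# Default configuration: only the GUI launcher is trusted.
whitelist = {os.path.realpath(GUI_LAUNCHER)}
print(is_trusted(GUI_LAUNCHER, whitelist))   # GUI launcher
print(is_trusted(CLI_LAUNCHER, whitelist))   # CLI launcher, not yet trusted

# The user explicitly opts in to command-line use.
whitelist.add(os.path.realpath(CLI_LAUNCHER))
print(is_trusted(CLI_LAUNCHER, whitelist))
```

The whitelist itself would have to live somewhere that programs running
as the user cannot modify, otherwise malware could simply add itself.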

> This is why I said the compositor shouldn't agree on a screenshot
> request if it can't tell if it was a user-made request or an app-made
> one. The only solutions we found so far have been:
> - listen for hot keys from an input device (we have to trust the
>   kernel for not allowing to forge events)
> - require a confirmation of some sort (popup / systray icon / whatever)
User interaction is something that can only be defined by the
application, not by the compositor. The compositor can't anticipate what
kind of GUI or CLI some application might decide to use. Maybe some
users want the ability to start/stop recording using an IR remote!
Actually that's not even that far-fetched with SteamOS and their 'big
picture mode'. Should every Wayland compositor add support for IR
remotes too?

------------------------------------------------------------------------

Anyway, why are we even arguing about the ability to take screenshots?
Right now, the typical Linux desktop does not use *any* sandboxing. Why
would malware be interested in screenshots when it can read (or
delete, or encrypt) every single file owned by the user? This
authentication API for screenshots is just a joke as long as this much
bigger problem exists. So why don't we take care of the *real* problem
first? The solution will probably involve SELinux or cgroups or another
sandboxing mechanism, and it is very likely that once we have this
mechanism, it will become trivial to make this screenshot API safe.

The solution would probably involve putting the screenshot application
in its own sandbox which is completely separate from the rest of the
system, and at that point it doesn't even matter anymore whether the
screenshot is started by the user or by a bash script, because the
screenshot won't leave the sandbox (unless the user instructs it to do
so) and malicious programs won't get access to it.

I predict that any decision made now will be useless until sandboxing is
implemented, and obsolete after that ...

PS: A malicious program can rename /run/user/1000/wayland-0 and replace
it with its own socket, which allows a man-in-the-middle attack. How are
we going to deal with /that/, without a sandboxing mechanism?

Maarten Baert