<html> <head> <meta content="text/html; charset=UTF-8" http-equiv="Content-Type"> </head> <body bgcolor="#FFFFFF" text="#000000"> <div class="moz-cite-prefix">On 08/01/14 13:52, Martin Peres wrote: </div> <blockquote cite="mid:52CD4A1F.5090206@free.fr" type="cite">I think the screenshot API and the video recording one should be separate. </blockquote> Ideally that's true, but it creates extra work for the compositor developers. And I don't see a huge benefit, actually. On X11 I can simply take 30 screenshots per second and this works great: the performance is about as good as it can get (2ms for a 1920x1080 frame, which is almost as fast as a simple copy) and the only issue is tearing (which is something that a good screenshot protocol should already solve, since obviously we don't want tearing in screenshots either). So I would be perfectly happy with a screenshot interface that works just like X11 minus the tearing. Do we really need a separate API for that? There's another gray area here: what about timelapse videos? These are created by taking a screenshot every few seconds and then playing them back at much higher speed. Should such an application use the screenshot or the video API? <blockquote cite="mid:52CD4A1F.5090206@free.fr" type="cite"> For the configuration of the screenshot, I see two cases. Either we just want the compositor to grab the image and pass it to an application [1] or we want the screenshotting app to be able to be able to query the number of screens and windows and their positions. The first doesn't require a Wayland protocol, the second does however require a privileged protocol. </blockquote> The second is required for all of the more complex use cases, i.e. where the user wants to capture only a single screen (or part of a screen). I already support this under X11 and it seems to be a feature that many users rely on (simply because monitor resolutions are usually larger than the desired video resolution). 
<blockquote cite="mid:52CD4A1F.5090206@free.fr" type="cite">As for the video grabbing API, I see the same solutions. Different hotkeys could automatically grab the screen content (either by window or screen) or it could be queried using the screen/window layout query protocol. Once the screen capture has been set up, a stream of DMA-buf (or shm) should be sent to different program that would record the output to whatever format one wants (stream of png or video) using either sw or hw encoders. Both the first and second case would use an external program to select the output format and encoding method. The good thing is that this encoding program would be compositor-independent and could be shared by all of them. Weston could then get rid of his VA-encoder and just use this new protocol. </blockquote> Video recording applications need to do a lot more than just starting and stopping capturing. In particular, streaming to websites like twitch.tv isn't trivial, it takes a few seconds to set things up. Encoders need some time to initialize (not much, but more than one frame). Then there's also audio. Audio hardware can go into standby and needs a fraction of a second to recover. And then we have audio APIs like JACK that are meant to be always-on, i.e. even if you don't use them you should still keep the connection alive because disconnecting/reconnecting creates interruptions in the effects pipeline. It's already complex enough the way it is, please don't make it any worse by adding additional weird requirements from the Wayland compositor. I can deal with any authentication system when the application is first started, but once that's done I need to be able to start and stop recording at any time (my only alternative is to turn Wayland into another always-on protocol where I'm capturing at all times, and that's wasteful). 
<blockquote cite="mid:52CD4A1F.5090206@free.fr" type="cite"> The good thing about sending a stream of images is that we get explicit synchronization between the compositor and the screen grabbing app which means it can miss no frames nor sample the same one twice (unless that's what the app wants). </blockquote> This is a really nice feature, but typical monitor frame rates (60 fps) are a lot higher than typical video frame rates (30 fps or 25 fps). It would be wasteful to capture every single frame unless it is possible to do this with essentially zero overhead. If zero overhead is not possible, it would be better to let the application request a specific frame rate. A possible alternative is to interpret each screenshot request as a request to capture the next frame when it is available, and never capture the same frame twice. The application can then maintain a queue of screenshot requests (very much like the ring buffer I currently use to capture OpenGL applications) and as long as this queue is not empty, it will get every single frame exactly once. That way no new API is needed. <blockquote cite="mid:52CD4A1F.5090206@free.fr" type="cite"> The screenshot "API" could just grab an image based on what hotkey is used: - window under the cursor - current screen (where the cursor is) - all the screens I'm not sure how to do that on touch devices, but this is a compositor implementation detail. Once the image is grabbed, the compositor should either save the file somewhere or pass it to an application. </blockquote> IMHO this is something that the application should decide, not the compositor. Otherwise one application can't get a consistent feature set across compositors. I think the protocol to request window information is something that will be needed anyway, because it's a nice feature for video as well (SSR already has this under X11 and I don't want to drop that feature). And we already have a protocol to get screen information. 
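To make the request-queue idea from earlier in this reply a bit more concrete, here is a rough sketch in Python. It is only a simulation: the names Compositor, Recorder, request_screenshot, present_frame and simulate are hypothetical and do not correspond to any existing Wayland protocol or compositor API. It assumes that a screenshot request means "deliver the next frame once it is available" and that each frame satisfies at most one request.

```python
from collections import deque


class Compositor:
    """Toy stand-in for a compositor (hypothetical, not a real Wayland API)."""

    def __init__(self):
        self.pending = deque()  # screenshot requests waiting for a frame
        self.frame_id = 0

    def request_screenshot(self, callback):
        # A request means "give me the next frame when it is available".
        self.pending.append(callback)

    def present_frame(self):
        # Called once per repaint. Each frame satisfies at most one request,
        # so the same frame is never captured twice.
        self.frame_id += 1
        if self.pending:
            callback = self.pending.popleft()
            callback(self.frame_id)


class Recorder:
    """Application side: keeps `depth` requests queued while recording."""

    def __init__(self, compositor, depth=3):
        self.compositor = compositor
        self.captured = []
        self.recording = True
        for _ in range(depth):
            compositor.request_screenshot(self.on_frame)

    def on_frame(self, frame_id):
        if not self.recording:
            # Recording was stopped: let the outstanding requests drain.
            return
        self.captured.append(frame_id)
        # Refill the queue so that it never runs empty while recording.
        self.compositor.request_screenshot(self.on_frame)


def simulate(frames, depth=3):
    """Run `frames` repaints and return the frame ids the recorder received."""
    compositor = Compositor()
    recorder = Recorder(compositor, depth)
    for _ in range(frames):
        compositor.present_frame()
    return recorder.captured
```

As long as the recorder refills the queue in its callback, simulate(n) returns every frame id from 1 to n exactly once. An application that wants a lower frame rate could simply submit requests less often instead of refilling immediately, and stopping the recording is just a matter of not submitting new requests.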
<div class="moz-cite-prefix">On 08/01/14 15:04, Sebastian Wick wrote: </div> <blockquote cite="mid:b47e6cc9bba74b6379f68d81686a5c24@sebastianwick.net" type="cite">If the application starts recording the screen without user interaction I would consider it broken. </blockquote> It is hard to define user interaction. Some users want to write their own bash or python scripts to automate common tasks, which they can then run from the terminal. There will be users that want a command-line interface to take screenshots or record video (I have received a few requests for just that). This shouldn't necessarily be the default, but these users should be able to allow this usage somehow. My solution would be to create *two* bash scripts, one that simply launches the GUI (no arguments) and one that allows some command-line arguments. The default configuration would be to mark only the first bash script as trusted. The user can then decide to mark the other one as trusted as well, so he will be able to use command-line arguments. From Martin Peres: <blockquote type="cite">We then discussed about visual feedback as a mean to provide some mitigation and show some applications are grabbing the screen in the background. That may be something you would be interested in, in your case. What do you think? </blockquote> It may be okay for screenshots because you can show the effect after the screenshot has been taken. But it would be unacceptable for video unless you can somehow make the effect visible on the screen but invisible in the video, and still obvious enough that the user can't overlook it, but not so much that it becomes annoying (e.g. if the user is playing a video game while recording it, they don't want things on their screen that can obstruct important game elements). That sounds pretty hard to do. <blockquote type="cite">So you want to trust every screenshot application? I don't think it is a good idea. 
It is a better one than trusting every app, but it still not is very efficient. </blockquote> What possible alternative do we have? You can't constantly ask the user because that's annoying, you can't guess whether the user wants something or not because you can't predict every use case so you will guess wrong, and you can't trust any file owned by the user because you're already assuming that the user is careless/stupid enough to install malware. You need some criterion to decide whether an application is to be trusted, and to me, a whitelist of trusted applications seems to be the best choice by far. <blockquote type="cite">This is why I said the compositor shouldn't agree on a screenshot request if it can't tell if it was a user-made request or an app-made one. The only solutions we found so far have been: - listen for hot keys from an input device (we have to trust the kernel for not allowing to forge events) - require a confirmation of some sort (popup / systray icon / whatever) </blockquote> User interaction is something that can only be defined by the application, not by the compositor. The compositor can't anticipate what kind of GUI or CLI some application might decide to use. Maybe some users want the ability to start/stop recording using an IR remote! Actually that's not even that far-fetched with SteamOS and its 'big picture mode'. Should every Wayland compositor add support for IR remotes too? <hr size="2" width="100%"> Anyway, why are we even arguing about the ability to take screenshots? Right now, the typical Linux desktop does not use any sandboxing. Why would malware be interested in screenshots when it can read (or delete, or encrypt) every single file owned by the user? This authentication API for screenshots is just a joke as long as this much bigger problem exists. So why don't we take care of the real problem first? 
The solution will probably involve SELinux or cgroups or another sandboxing mechanism, and it is very likely that once we have this mechanism, it will become trivial to make this screenshot API safe. The solution would probably involve putting the screenshot application in its own sandbox which is completely separate from the rest of the system, and at that point it doesn't even matter anymore whether the screenshot is started by the user or by a bash script, because the screenshot won't leave the sandbox (unless the user instructs it to do so) and malicious programs won't get access to it. I predict that any decision made now will be useless until sandboxing is implemented, and obsolete after that ... PS: A malicious program can rename /run/user/1000/wayland-0 and replace it with its own socket, which allows a man-in-the-middle attack. How are we going to deal with that, without a sandboxing mechanism? Maarten Baert </body> </html>