Screen shooting and recording protocols (Re: Authorized clients)

Thu Jan 9 12:47:39 PST 2014

On 09/01/14 10:00, Pekka Paalanen wrote:
> There are differences in the implementation details of shooting (stills)
> vs. recording (videos).
>
> Weston supports (though disabled atm, AFAIK) hw overlays in addition to
> the GL renderer. To make a screenshot, the overlays are temporarily
> disabled, so that GL can composite the whole image which can then be
> sent to the shooting client.
I wasn't aware of that. I was assuming that overlays simply couldn't be
recorded, which was the case with X11. It's nice to hear that the
compositor can fix that.

> That scheme might not work too well with recording, and achieving a
> zero-copy path to the video encoder will be hard. In that sense, the
> compositor autonomously pushing buffers to the client would be more
> performant, but then we have the problem of synchronizing buffer reads,
> writes, and re-uses. In the opposite direction, the synchronization was
> solved with the wl_surface.attach/commit and wl_buffer.release
> messages, but here an accidentally malfunctioning client has a greater
> risk of crashing the compositor if it does not return the buffers to the
> compositor in time, so the compositor needs a failsafe mechanism.
I'm not very familiar with the attach/commit/release system, but I don't
really see a big issue here. The client gives the compositor a few
buffers, the compositor fills them one by one and tells the client
whenever a new buffer is ready. The client reads the data from the
buffer and when it is done, it gives the buffer back to the compositor
so it can be filled again. If the compositor runs out of buffers, it
simply skips a frame.
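Below is a minimal sketch of that buffer exchange. It is a self-contained C model, not real Wayland code: the enum, struct and function names are hypothetical, and writing a frame number into a buffer stands in for copying pixels. The compositor fills the first free buffer and announces it. If no buffer is free, it counts a skipped frame. The client hands a buffer back once it has finished reading it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: every client-provided buffer is in exactly one state. */
enum buf_state {
    BUF_FREE,  /* owned by the compositor, may be filled */
    BUF_READY  /* filled and announced to the client, not yet returned */
};

#define POOL_SIZE 3

struct capture_pool {
    enum buf_state state[POOL_SIZE];
    unsigned frame_id[POOL_SIZE]; /* stand-in for the pixel contents */
    unsigned frames_delivered;
    unsigned frames_skipped;
};

static void pool_init(struct capture_pool *p)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        p->state[i] = BUF_FREE;
        p->frame_id[i] = 0;
    }
    p->frames_delivered = 0;
    p->frames_skipped = 0;
}

/* Compositor side, called once per output frame. Returns the index of the
 * buffer that was filled and announced, or -1 if the frame was skipped
 * because the client still holds every buffer. */
static int pool_capture_frame(struct capture_pool *p, unsigned frame)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (p->state[i] == BUF_FREE) {
            p->frame_id[i] = frame;  /* "copy" the frame into the buffer */
            p->state[i] = BUF_READY; /* here a "buffer ready" event would go out */
            p->frames_delivered++;
            return (int)i;
        }
    }
    p->frames_skipped++;
    return -1;
}

/* Client side: done reading, give the buffer back so it can be refilled. */
static void pool_release(struct capture_pool *p, int idx)
{
    assert(idx >= 0 && idx < POOL_SIZE);
    assert(p->state[idx] == BUF_READY);
    p->state[idx] = BUF_FREE;
}
```

This model has no timeout. The failsafe Pekka mentions, which reclaims buffers from a client that never returns them, would still have to be built on top of it.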

> Those are some reasons why screen recording (video) is easier to do as
> a compositor plugin, like it is currently in Weston. A separate client
> would need a non-trivial amount of new Wayland protocol to work well.
That's probably true, but you can't expect applications to write a
separate plugin for each compositor. Besides, I highly doubt that it's a
good idea to load ffmpeg/libav and all its dependencies into the
compositor - these libraries aren't exactly known for their stability ;).

> Instead, a client could ask the compositor to ask the user which window
> she wants to capture, and then the compositor would capture only that.
> Capturing individual outputs is a lot easier: Wayland core protocol
> already exposes all outputs, so the client can directly ask for a
> certain output.
The window picking function that I have now (for X11) is really just a
way to quickly enter the correct coordinates and size of the area that
should be recorded. I don't expect the user to move the window around.
And just to be clear, the goal is NOT to capture only the buffer of a
single window, because then 'subwindows' (like browser plugins) and
dialog windows won't be recorded. If I really wanted to capture just a
single SHM buffer, I would probably just do it client-side, in the same
way I already do OpenGL recording now (because this gives me much more
flexibility).

So what I'm asking for is just a function to get the rectangle (x,y,w,h)
that corresponds to the window directly below a given position (x,y).
The compositor doesn't even have to handle the complexity of 'real' user
interaction (i.e. showing a message to the user telling him to pick a
window, waiting for the user to do that, dealing with clients that make
a request and then die, ...). Such a function would do everything I
need, and I think it also covers what the existing screenshot
applications need. I prefer to do it like this because it is the
simplest way to implement this for the compositor, and it is more flexible
(e.g. applications can choose to select the recording area in advance
and then repeatedly use the same area without telling the user to select
it over and over again).
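To show how small the compositor side of such a request could be, here is a sketch of the lookup in C. It assumes the compositor keeps its top-level windows in a stacking list ordered topmost first. The types and the window_rect_at() name are hypothetical. They are not part of any existing Wayland protocol or Weston API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rect { int x, y, w, h; };

/* Hypothetical compositor-side view of a top-level window. */
struct window {
    struct rect geom; /* position and size in global coordinates */
    bool mapped;      /* only visible windows can be picked */
};

static bool rect_contains(const struct rect *r, int px, int py)
{
    return px >= r->x && px < r->x + r->w &&
           py >= r->y && py < r->y + r->h;
}

/* Walk the stacking list from the top and return the rectangle of the
 * first mapped window under (px, py). Returns false if the point is over
 * the background. */
static bool window_rect_at(const struct window *stack, size_t n,
                           int px, int py, struct rect *out)
{
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (stack[i].mapped && rect_contains(&stack[i].geom, px, py)) {
            *out = stack[i].geom;
            return true;
        }
    }
    return false;
}
```

The client would then record that rectangle of the screen, so dialogs and subwindows overlapping it are captured as well. It could also keep the rectangle and reuse it for later recordings.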

> In the part I cut out, there were some concerns about who and how
> should decide what hotkeys may or may not start shooting or recording.
> I think there was recently'ish a nice suggestion about a generic
> solution to the global hotkey problem, so you can probably leave it for
> that. I'm referring to the "clients bind to actions, server decides what
> hotkey is the action" kind of an idea, IIRC. There is a myriad of
> details to solve there, too.
That would make a lot more sense, at least it is a lot more flexible
than requiring the recording application to be launched by the same key
press that starts the recording (which would effectively force me to
split my application into two separate processes, and then I would have
to figure out a secure way to let these two processes communicate).
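Here is a rough C sketch of the "clients bind to actions, server decides what hotkey is the action" idea. It assumes a simple registry inside the compositor, and every name in it is hypothetical; none corresponds to an existing protocol. The client only names the action it wants. The compositor, or the user's configuration, decides which key triggers it. On a key press, the compositor looks up which client to notify.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ACTIONS 8

struct action_binding {
    char name[32];   /* action name chosen by the client, e.g. "start-recording" */
    int client_id;   /* client that bound the action */
    unsigned keysym; /* hotkey picked by the compositor/user; 0 = unassigned */
};

struct action_registry {
    struct action_binding entries[MAX_ACTIONS];
    size_t count;
};

/* Client request: "tell me when action NAME is triggered". The client never
 * says which key that is. Returns 0, or -1 if the registry is full. */
static int action_bind(struct action_registry *reg, const char *name,
                       int client_id)
{
    assert(strlen(name) < sizeof reg->entries[0].name);
    if (reg->count == MAX_ACTIONS)
        return -1;
    struct action_binding *e = &reg->entries[reg->count++];
    strcpy(e->name, name);
    e->client_id = client_id;
    e->keysym = 0;
    return 0;
}

/* Compositor/user configuration: attach a hotkey to every binding of NAME.
 * Returns the number of bindings updated. */
static size_t action_assign_key(struct action_registry *reg, const char *name,
                                unsigned keysym)
{
    size_t n = 0;
    for (size_t i = 0; i < reg->count; i++) {
        if (strcmp(reg->entries[i].name, name) == 0) {
            reg->entries[i].keysym = keysym;
            n++;
        }
    }
    return n;
}

/* Key press: return the client to notify, or -1 if the key is not an
 * action hotkey (it would then go to the focused client as usual). */
static int action_dispatch_key(const struct action_registry *reg,
                               unsigned keysym)
{
    if (keysym == 0)
        return -1;
    for (size_t i = 0; i < reg->count; i++)
        if (reg->entries[i].keysym == keysym)
            return reg->entries[i].client_id;
    return -1;
}
```

With this split, the recording application can stay a single process. It binds to its actions at startup, and the key that triggers them is not its concern.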

But what about things like mouse clicks? Can the compositor tell that
the user clicked the 'start recording' button?

Maarten Baert