Screen shooting and recording protocols (Re: Authorized clients)

Pekka Paalanen ppaalanen at gmail.com
Fri Jan 10 00:54:17 PST 2014


On Thu, 9 Jan 2014 22:38:47 +0100
Maarten Baert <maarten-baert at hotmail.com> wrote:

> But I agree that if the compositor is able to capture frames with zero
> overhead, then the compositor should just capture every frame and let
> the application decide. Is this realistic?

I think it is realistic on good platforms, and also essential for
performance. Needs to be tried out, of course.

> On 09/01/14 22:14, Martin Peres wrote:
> > I'm not saying supporting the acquisition of just a rectangle isn't
> > a good idea but if what the user wants is the output of a window,
> > it is stupid
> > to grab the whole screen. Shouldn't we try to make stuff just work,
> > whatever the user does? 
> Yes of course, this is something that is needed regardless of whether
> window picking is supported or not. So what I'm proposing is basically
> the ability to:
> - capture a single output (applications can deal with synchronization
> of multiple outputs if needed, synchronization code is needed for
> audio anyway so it's actually quite easy to add)
> - capture a part of a single output (video resolutions are often lower
> than screen resolutions, so some users prefer to record only a
> 1280x720 or 854x480 rectangle of their 1440x900 display and they just
> move whatever they want to record into that rectangle)
> The window picking would then just be an easy way to quickly get the
> correct x/y/w/h values rather than entering them manually.

What would you capture when the window is already rotated? The
axis-aligned bounding box, perhaps?
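
For illustration, this is roughly what I mean by the axis-aligned
bounding box; a sketch with made-up names, not from any existing API:

#include <math.h>

struct box { double x1, y1, x2, y2; };

/* Axis-aligned bounding box of a w x h window rotated by 'angle'
 * radians around its top-left corner at (x, y) in output space.
 * Purely illustrative, not a compositor interface. */
static struct box
rotated_window_aabb(double x, double y, double w, double h, double angle)
{
	double c = cos(angle), s = sin(angle);
	/* The four corners of the rotated rectangle. */
	double xs[4] = { x, x + w * c, x - h * s, x + w * c - h * s };
	double ys[4] = { y, y + w * s, y + h * c, y + w * s + h * c };
	struct box b = { xs[0], ys[0], xs[0], ys[0] };

	for (int i = 1; i < 4; i++) {
		if (xs[i] < b.x1) b.x1 = xs[i];
		if (xs[i] > b.x2) b.x2 = xs[i];
		if (ys[i] < b.y1) b.y1 = ys[i];
		if (ys[i] > b.y2) b.y2 = ys[i];
	}
	return b;
}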

Clients do not have a coordinate transformation that could relate a
window to the global desktop or output coordinate space. Clients
simply do not know about those coordinate spaces, and the mapping
between a window and them may not be linear even in homogeneous
coordinates.
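
To spell out what "linear in homogeneous coordinates" means here - a
sketch, not actual weston code:

/* If the mapping from window (surface) coordinates to output
 * coordinates were linear in homogeneous coordinates, it could be
 * described by a single 3x3 matrix and handed to clients: */
struct mat3 { double m[3][3]; };

static void
surface_to_output(const struct mat3 *t, double sx, double sy,
		  double *ox, double *oy)
{
	double x = t->m[0][0] * sx + t->m[0][1] * sy + t->m[0][2];
	double y = t->m[1][0] * sx + t->m[1][1] * sy + t->m[1][2];
	double w = t->m[2][0] * sx + t->m[2][1] * sy + t->m[2][2];

	*ox = x / w;
	*oy = y / w;
}
/* But effects like wobbly windows or arbitrary mesh deformations do
 * not fit this form at all, which is one reason clients are never
 * told about output coordinates in the first place. */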

> My main point is that when the user wants to capture a window, the
> compositor should capture it as it appears on the screen rather than
> capturing the SHM buffer of the application (or have support for
> both):
> - The SHM buffer doesn't show subwindows, dialog windows, tooltips,
> drop-down menus, things like that.
> - Applications can be partially transparent. None of the popular video
> formats support transparency, and I really can't think of a sane way
> to display a transparent video in the first place. Maybe transparency
> is a cool feature for screenshots, but for video it is really a
> problem, so there should be a way to avoid that.
> - SHM buffers can use any sample format the application wants (if
> supported by the compositor of course). A screenshot/recording
> application can easily add code for the most common screen formats
> (these days almost every desktop uses BGRX or BGRA anyway), but
> dealing with whatever weird formats some applications might want to
> use will be a nightmare. The compositor already has to convert the
> buffer to the screen format when it is drawn, so it's much easier and
> probably also more efficient to capture the screen rather than doing
> a second conversion.

There is a slight problem with "as it appears on screen". One window
may appear multiple times in different forms on the screens (the
weston views mechanism). An exposé effect temporarily scales the
window to a size that is not readable; do you want exposé to affect
the capture? There can be live preview views of a window that are not
useful to record. If the window is rotated, do you want it captured
rotated, too?

Another question is what happens if the window is partially or
completely obscured by other windows. If you are recording that
window, should you get the window content, or whatever happens to be
visible at its position on screen? What if the window is on a virtual
desktop that is not visible at all?

In your proposal, how do you define what is part of the window and
what is not? It needs to relate to the protocol objects somehow.

Menus, tooltips, drop-downs etc. are a good question, since they can
extend outside the original window on any side, like the left or the
top - how should that affect the capture? Would it appear as if the
main window temporarily jumped right or down when you record a video?

It seems that capturing "a window" requires close cooperation with
the shell (plugin), and probably a non-trivial amount of metadata, to
create a pleasing video. I don't have a good proposal for how it
should work.

Maybe the "just choose an aligned rect" way really is the easiest,
while being sufficient for all real use cases.
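
The compositor side of that would mostly amount to clipping the
requested rectangle against the output; a sketch with made-up types,
not from any existing protocol:

#include <stdbool.h>
#include <stdint.h>

struct rect { int32_t x, y, width, height; };

/* Clip a requested capture rectangle against the output extents.
 * Returns false if nothing is left to capture. */
static bool
clip_capture_rect(const struct rect *output, struct rect *r)
{
	int32_t x1 = r->x > output->x ? r->x : output->x;
	int32_t y1 = r->y > output->y ? r->y : output->y;
	int32_t x2 = r->x + r->width < output->x + output->width ?
		     r->x + r->width : output->x + output->width;
	int32_t y2 = r->y + r->height < output->y + output->height ?
		     r->y + r->height : output->y + output->height;

	if (x2 <= x1 || y2 <= y1)
		return false;

	r->x = x1;
	r->y = y1;
	r->width = x2 - x1;
	r->height = y2 - y1;
	return true;
}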

I would not worry about transparency or pixel formats too much. When
your "window" consists of more than one wl_surface, and maybe even
otherwise, e.g. due to crop & scale or buffer_scale & transform, the
compositor has to do an intermediate composite of just that "window"
to be able to capture it. Some window effects would require such an
intermediate composite even without capturing, like applying
transparency to a window that has sub-surfaces.
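
By "intermediate composite" I mean something equivalent to this
per-pixel "over" blend, done into an offscreen buffer (on the GPU in
practice); a CPU sketch assuming premultiplied BGRA, not weston code:

#include <stdint.h>

/* Blend one premultiplied-alpha BGRA surface over a BGRA destination
 * at offset (dx, dy).  The compositor would do the equivalent for the
 * main surface and each sub-surface, then hand the single result
 * buffer to the capture client.  The caller guarantees the source
 * fits inside the destination. */
static void
blend_over(uint32_t *dst, int dst_stride_px, int dx, int dy,
	   const uint32_t *src, int src_stride_px, int w, int h)
{
	for (int y = 0; y < h; y++) {
		for (int x = 0; x < w; x++) {
			uint32_t s = src[y * src_stride_px + x];
			uint32_t *d = &dst[(dy + y) * dst_stride_px + dx + x];
			uint32_t sa = s >> 24;
			uint32_t out = 0;

			/* out = src + dst * (1 - src.alpha), per channel */
			for (int c = 0; c < 32; c += 8) {
				uint32_t sc = (s >> c) & 0xff;
				uint32_t dc = (*d >> c) & 0xff;
				uint32_t oc = sc + dc * (255 - sa) / 255;
				if (oc > 255)
					oc = 255;
				out |= oc << c;
			}
			*d = out;
		}
	}
}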

Oh, but OTOH, color management... a zero-copy output capture would of
course capture the framebuffer as drawn for the output color space,
and additionally that could be e.g. 10 bits per channel.
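
E.g. if the recorder got such a framebuffer zero-copy, it would have
to at least do something like this per pixel before feeding a typical
8-bit encoder (assuming a DRM-style XRGB2101010 layout, and ignoring
the color space itself):

#include <stdint.h>

/* Convert one XRGB2101010 pixel ([29:20] R, [19:10] G, [9:0] B) to
 * XRGB8888 by dropping the two extra bits per channel. */
static uint32_t
xrgb2101010_to_xrgb8888(uint32_t px)
{
	uint32_t r = (px >> 20) & 0x3ff;
	uint32_t g = (px >> 10) & 0x3ff;
	uint32_t b = px & 0x3ff;

	return ((r >> 2) << 16) | ((g >> 2) << 8) | (b >> 2);
}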

...


Thanks,
pq

