Screen shooting and recording protocols (Re: Authorized clients)

Fri Jan 10 06:26:19 PST 2014

On 10/01/14 09:54, Pekka Paalanen wrote:
> I think it is realistic on good platforms, and also essential for
> performance. Needs to be tried out, of course.
X11 captures by simply copying the buffer into a SHM image. This
takes about 2ms for a 1920x1080 frame. That's perfectly usable,
considering that any video encoder will need far more time to encode
that same frame. So zero-overhead capture is nice to have, but not
essential IMO.
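
As a rough sanity check of that figure, here is a back-of-the-envelope
sketch. The 2ms number is the one quoted above; the 4 bytes per pixel
and the 60 Hz refresh rate are assumptions made only for illustration.

```python
# Back-of-the-envelope cost of copying one frame into a SHM image.
# Assumptions (not measured here): 4 bytes per pixel, 60 Hz refresh.
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_PIXEL = 4      # assumed, e.g. BGRA
COPY_TIME_S = 0.002      # the ~2ms figure from the text
REFRESH_HZ = 60          # assumed

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
copy_bandwidth = frame_bytes / COPY_TIME_S    # bytes per second
budget_fraction = COPY_TIME_S * REFRESH_HZ    # share of one frame period

print(f"frame size:      {frame_bytes / 1e6:.1f} MB")
print(f"copy bandwidth:  {copy_bandwidth / 1e9:.2f} GB/s")
print(f"share of budget: {budget_fraction:.0%}")
```

Under these assumptions the copy moves about 8.3 MB per frame and uses
roughly an eighth of the frame period.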

> What would you capture, when the window is already rotated? The aligned
> bounding box perhaps?
>
> Clients do not have a coordinate transformation, that could relate a
> window to the global desktop or output coordinate space. Clients simply
> do not know about those coordinate spaces, and the mapping between a
> window and them may not be linear even in homogeneous coordinates.
If a window is rotated, then the user will probably expect a screenshot
of a rotated window, not one of the original un-rotated window. So yes,
I think the axis-aligned bounding box is the best choice in that case.
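
For the simple case of a pure rotation, the axis-aligned bounding box
can be sketched as below. The function name and the (x, y, w, h) tuple
layout are hypothetical. A real compositor would instead push the
window corners through whatever transformation it actually applies,
which, as noted in the quote, need not even be linear.

```python
import math

def rotated_bounding_box(cx, cy, w, h, angle_deg):
    """Return (x, y, width, height) of the axis-aligned box enclosing a
    w x h rectangle centred at (cx, cy) and rotated by angle_deg."""
    a = math.radians(angle_deg)
    cos_a, sin_a = abs(math.cos(a)), abs(math.sin(a))
    # Project both rectangle edges onto the screen axes.
    bw = w * cos_a + h * sin_a
    bh = w * sin_a + h * cos_a
    return (cx - bw / 2, cy - bh / 2, bw, bh)

# A 400x300 window rotated by 30 degrees around (500, 400).
print(rotated_bounding_box(500, 400, 400, 300, 30))
```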

> There is a slight problem with "as it appears on screen". One window
> may appear multiple times in different forms on screens (the weston
> views mechanism). Exposé effect temporarily scales the window into a
> size that is not readable, do you want exposé to affect the capture?
> There can be live preview views of a window that are not useful to
> record.
So it would be up to the compositor to decide what rectangle is the
right one.

> If the window is rotated, do you want it also captured rotated?
Yes. Why on earth would the user decide to rotate a window if they don't
want it to be captured like that?

> Another thing is that if the window is partially or completely obscured
> by other windows. If you are recording that window, should you be
> getting the window content, or what happens to be visible at its
> position on screen?
Depends on the goal. It would be nice to have both features available.

> What if the window is on a virtual desktop that is
> not visible at all?
Then how would you pick the window in the first place?

> In your proposition, how do you define what is part of the window or
> not? It needs to relate to the protocol objects somehow.
I assumed it would be impossible (or at least very impractical) to
capture tooltips and menus as part of a window. I mean, what if the
window is transformed and the tooltips are not? I wanted to avoid the
issue entirely by simply recording the screen as-is, by choosing a
rectangle on the screen based on x/y/w/h. Do you think it is realistic
to require that all compositors implement the complex logic to capture
and follow a single window, without transformations, but with support
for tooltips and menus? I fear that most compositors won't implement it
at all, and if they do, it will likely be buggy because it hasn't seen
enough testing.

> Menus, tooltips, drop-downs etc. are a good question, since they can
> extrude outside of the original window on any side like left or top -
> how should that affect the capture? Would it appear as if the main
> window temporarily jumped to the right/down when you record a video?
Tooltips and menus that are not fully inside the recorded rectangle
would be partially unreadable, that's correct. There's not much that can
be done about that, right? You can't change the size of a video once it
has started: encoders don't allow that, and video players don't support
it either.
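
To illustrate the clipping, here is a minimal sketch that intersects a
tooltip with a fixed capture rectangle. The helper name and the
(x, y, w, h) tuples are hypothetical and the coordinates are made up.

```python
def intersect(a, b):
    """Intersect two (x, y, w, h) rectangles.
    Returns the overlapping rectangle, or None if they do not overlap."""
    x1 = max(a[0], b[0])
    y1 = max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2 - x1, y2 - y1)

capture = (100, 100, 400, 300)   # fixed for the whole recording
tooltip = (450, 80, 120, 40)     # sticks out at the top and right

visible = intersect(capture, tooltip)
print(visible)   # only this part of the tooltip ends up in the video
```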

> It seems capturing "a window" requires close cooperation with the shell
> (plugin), and probably a non-trivial amount of metadata to create a
> pleasing video. I don't have a good proposition on how it should work.
>
> Maybe the "just choose an aligned rect" way is really the easiest while
> being sufficient for all real use cases.
That's exactly why I suggested it. It's simple to implement and good enough.

> Oh, OTOH color management... zero-copy output capture would of course
> capture the framebuffer drawn for the output color space, and
> additionally that could be e.g. 10 bits per channel.
Is it feasible to temporarily 'downgrade' to whatever capture format the
screenshooter/screen recorder requests? That will probably be BGRA (8-bit).
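
A sketch of what such a 'downgrade' could look like for a single pixel.
The packed layout used here (2 unused bits, then 10 bits each of R, G
and B in one 32-bit word) is an assumption for illustration, as is the
function name. The conversion simply drops the two low bits of each
channel.

```python
def xrgb2101010_to_bgra8888(pixel):
    """Convert one packed 10-bit-per-channel pixel to 4 bytes in
    B, G, R, A order. Assumed input layout: bits 29..20 = R,
    19..10 = G, 9..0 = B, top 2 bits unused."""
    r10 = (pixel >> 20) & 0x3FF
    g10 = (pixel >> 10) & 0x3FF
    b10 = pixel & 0x3FF
    # Truncate 10 bits to 8 bits by discarding the 2 low bits.
    r8, g8, b8 = r10 >> 2, g10 >> 2, b10 >> 2
    return bytes((b8, g8, r8, 0xFF))   # alpha forced to opaque

# Full red, half green, no blue.
pixel = (1023 << 20) | (512 << 10) | 0
print(list(xrgb2101010_to_bgra8888(pixel)))
```

Truncation is the cheapest option. Dithering or rounding would preserve
slightly more of the original precision at extra cost.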

For the color space I just assume sRGB on X11, and AFAIK image viewers
and video players do the same, so no one complains that their
screenshots/videos look wrong. Color management in video is a nightmare
anyway: there are a few (poorly defined) versions of YUV that you can
use, but you can't embed an ICC profile or anything like that AFAIK :(.
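
As one example of how the YUV variants differ, the sketch below
compares the luma weights of BT.601 and BT.709, the two variants most
commonly encountered. The coefficients are the standard ones. The
function and table names are made up for this example, and the
full-range versus limited-range question is ignored entirely.

```python
# Luma weights (Kr, Kg, Kb) for two common YUV variants.
LUMA_COEFFS = {
    "BT.601": (0.299, 0.587, 0.114),
    "BT.709": (0.2126, 0.7152, 0.0722),
}

def luma(r, g, b, standard):
    """Luma of a normalized (0..1) R'G'B' triple under the given standard."""
    kr, kg, kb = LUMA_COEFFS[standard]
    return kr * r + kg * g + kb * b

# The same pure green pixel gets a different Y in each variant.
for name in LUMA_COEFFS:
    print(name, luma(0.0, 1.0, 0.0, name))
```

A player that guesses the wrong variant will decode slightly wrong
colors, and nothing in the raw pixel data tells it which one was used.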

Do 10-bit displays actually exist? I thought even many 8-bit displays
are really only 6-bit plus dithering.