Collaboration on standard Wayland protocol extensions

Drew DeVault sir at cmpwn.com
Mon Mar 28 02:29:57 UTC 2016


On 2016-03-28  8:55 AM, Carsten Haitzler wrote:
> i can tell you that screen capture is a security sensitive thing and likely
> won't get a regular wayland protocol. it definitely won't from e. if you can
> capture screen, you can screenscrape. some untrusted game you downloaded for
> free can start watching your internet banking and see how much money you have
> in which accounts where...

Right, but there are legitimate use cases for this feature as well. It's
also true that if you have access to /dev/sda you can read all of the
user's files, but we still have tools like mkfs. We just put them behind
some extra security, i.e. you have to be root to use mkfs.

> the simple solution is to build it into the wm/desktop itself as an explicit
> user action (keypress, menu option etc.) and now it can't be exploited as it's
> not programmatically available. :)
>
> i would imagine the desktops themselves would in the end provide video capture
> like they would stills.

I'd argue that this solution is far from simple. Instead, it moves *all*
of the responsibilities of your entire desktop into one place, and one
codebase. And consider the staggering amount of work that went into
making ffmpeg, which has well over four times as many git commits as
enlightenment.

> > - Output configuration
> 
> why? currently pretty much every desktop provides its OWN output configuration
> tool that is part of the desktop environment. why do you want to re-invent
> randr here allowing any client to mess with screen config. after YEARS of games
> using xvidtune and what not to mess up screen setups this would be a horrible
> idea. if you want to make a presentation tool that uses 1 screen for output and
> another for "controls" then that's a matter of providing info that multiple
> displays exist and what type they may be (internal, external) and clients can
> tag surfaces with "intents" eg - this is a control surface, this is an
> output/display surface. compositor will then assign them appropriately.

There's more out there than desktop environments alone. Not everyone
wants to go entirely GTK or Qt or EFL. I bet everyone on this ML has
software on their computer that uses something other than the toolkit of
their choice. Some people like piecing their system together and keeping
things lightweight, and choosing the best tool for the job. Some people
might want to use the KDE screengrab tool on e, or perhaps some other
tool that's more focused on doing just that job and doing it well. Or
perhaps there are existing tools like ImageMagick that are already
written into scripts and provide a TON of options to the user; such
tools could be patched to support a standard screengrab protocol far
more easily than their features could be reimplemented in 5 different
desktops.

We all have to implement output configuration, so why not do it the same
way and share our API? I don't think we need to let arbitrary clients
manipulate the output configuration; we need to implement a security
model for this, like all other elevated permissions.

Using some kind of intents system to communicate things like Impress
wanting to use one output for presentation and another for notes is
going to get out of hand quickly. There are simply too many different
"intents", most of which are covered by letting applications configure
outputs when it makes sense for them to. The code to handle this in the
compositor is going to become an incredibly complicated mess that rivals
even xorg in complexity. We need to avoid making the same mistakes
again. If we don't focus on making it simple, then in 15 years we're
going to be writing a new protocol and making a new set of mistakes. X
does a lot of things wrong, but the tools around it have a respect for
the Unix philosophy that we'd be wise to consider.

> > - More detailed surface roles (should it be floating, is it a modal,
> >   does it want to draw its own decorations, etc)
> 
> that seems sensible and over time i can imagine this will expand.

Cool. Any suggestions for what sort of capabilities this protocol should
have, and what kind of surface roles we would be looking at? We should
consider a few things. Normal windows, of course, which on compositors
like Sway would be tiled. Then there's floating windows, like
gnome-calculator, that are better off not being tiled. Modals would be
something that pops up and prevents the parent window from being
interacted with, like some sort of alert (though preventing this
interactivity might not be the compositor's job). Then we have some
roles like dmenu would use, where the tool would like to arrange itself
(perhaps this would demand another permission?). Surfaces that want to
be fullscreen could be another. We should also consider additional settings
a surface might want, like negotiating for who draws the decorations or
whether or not it should appear in a taskbar sort of interface.

> > - Input device configuration
> 
> as above. i see no reason clients should be doing this. surface
> intents/roles/whatever can deal with this. compositor may alter how an input
> device works for that surface based on this.

I don't feel very strongly about input device configuration as a
protocol here, but it's something that many of Sway's users are asking
for. People are trying out various compositors and may switch back and
forth depending on their needs, and they want to configure all of their
input devices the same way.

However, beyond detailed input device configuration, there are some
other things that we should consider. Some applications (games, VNC,
etc.) will want to capture the mouse, and there should be a protocol for
them to indicate this (perhaps again associated with special
permissions). Some applications (like Krita) may want to do things like
take control of your entire drawing tablet.

> [snip] screen capture is a nasty one and for now - no. no access [snip]

Wayland has been in the making for 4 years. Fedora is thinking about
shipping it by default. We need to quit with this "not for now" stuff
and start thinking about legitimate use-cases that we're killing off
here. The problems are not insurmountable, but leaving them unsolved is
going to kill Wayland adoption. We should not force Wayland upon our
users; we should make it something that they *want* to switch to. I
personally have
gathered a lot of interest in Sway and Wayland in general by
livestreaming development of it from time to time, which has led to more
contributors getting in on the code and more people advocating for us to
get Wayland out there.

> for the common case the DE can do it. for screen sharing kind of
> things... you also need input control (take over mouse and be able to
> control from app - or create a 2nd mouse pointer and control that...
> keyboard - same, etc. etc. etc.). [snip]

Screen sharing for VOIP applications is only one of many, many use-cases
for being able to get the pixels from your screen. VNC servers,
recording video to provide better bug reports or to demonstrate
something, and so on. We aren't opening Pandora's box here; just
supporting video capture doesn't mean you need to support all of these
complicated and dangerous things as well.

> nasty little thing and in implementing something like this you are also forcing
> compositors to work in specific ways - eg screen capture will likely FORCE the
> compositor to merge it all into a single ARGB buffer for you rather than just
> assign it to hw layers. or perhaps it would require just exposing all the
> layers, their config and have the client "deal with it" ? but that means the
> compositor needs to expose its screen layout. do you include pointer or not?
> compositor may draw ptr into the framebuffer. it may use a special hw layer.
> what about if the compositor defers rendering - does a screen capture api force
> the compositor to render when the client wants? this can have all kinds of
> nasty effects in the rendering pipeline - for us our rendering pipeline is
> not in the compositor but via the same libraries clients use so altering this
> pipeline affects regular apps as well as compositor. ... can of worms :)

All of this would still be a problem if you want to support video
capture at all. You have to get the pixels into your encoder somehow.
There might be performance costs, but we aren't recording video all the
time.

We can make Wayland support use-cases that are important to our users or
we can watch them stay on xorg perpetually and end up maintaining two
graphical stacks forever.

--
Drew DeVault

