Collaboration on standard Wayland protocol extensions

Carsten Haitzler (The Rasterman) raster at rasterman.com
Mon Mar 28 05:13:21 UTC 2016


On Sun, 27 Mar 2016 22:29:57 -0400 Drew DeVault <sir at cmpwn.com> said:

> On 2016-03-28  8:55 AM, Carsten Haitzler wrote:
> > i can tell you that screen capture is a security sensitive thing and likely
> > won't get a regular wayland protocol. it definitely won't from e. if you can
> > capture screen, you can screenscrape. some untrusted game you downloaded for
> > free can start watching your internet banking and see how much money you
> > have in which accounts where...
> 
> Right, but there are legitimate use cases for this feature as well. It's
> also true that if you have access to /dev/sda you can read all of the
> user's files, but we still have tools like mkfs. We just put them behind
> some extra security, i.e. you have to be root to use mkfs.

yes but you need permission and that is handled at kernel level on a specific
file. not so here. compositor runs as a specific user and so you cant do that.
you'd have to do in-compositor security client-by-client.

> > the simple solution is to build it into the wm/desktop itself as an explicit
> > user action (keypress, menu option etc.) and now it can't be exploited as
> > it's not programmatically available. :)
> >
> > i would imagine the desktops themselves would in the end provide video
> > capture like they would stills.
> 
> I'd argue that this solution is far from simple. Instead, it moves *all*
> of the responsibilities of your entire desktop into one place, and one
> codebase. And consider the staggering amount of work that went into
> making ffmpeg, which has well over 4x the git commits as enlightenment.

you wouldn't recreate ffmpeg. ffmpeg produces libraries like avcodec. like a
reasonable developer we'd just use their libraries to do the encoding - we'd
capture frames and then hand off to avcodec (ffmpeg) library routines to do the
rest. ffmpeg doesnt need to know how to capture - just to do what 99% of its
code is devoted to doing - encode/decode. :) that's rather simple. already we
have decoding wrapped - we sit on top of either gstreamer, vlc or xine as the
codec engine and just glue in output and control api's and events. encoding is
just the same but in reverse. :) the encapsulation is simple.
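
a rough sketch of that split, in python for brevity. every name in it (Frame,
Encoder, CountingEncoder, record) is made up for illustration and is not an
api from e, ffmpeg or gstreamer. the stand-in backend does no real encoding; a
real one would wrap a codec library behind the same two calls.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass
class Frame:
    """one captured frame: raw pixels plus the size an encoder needs."""
    width: int
    height: int
    pixels: bytes  # assumption: packed ARGB, 4 bytes per pixel


class Encoder(Protocol):
    """hypothetical codec backend interface (not a real library api)."""
    def encode(self, frame: Frame) -> bytes: ...
    def finish(self) -> bytes: ...


class CountingEncoder:
    """stand-in backend: emits one tiny packet per frame, no real codec."""
    def __init__(self) -> None:
        self.frames = 0

    def encode(self, frame: Frame) -> bytes:
        # a real backend would compress frame.pixels here
        if len(frame.pixels) != frame.width * frame.height * 4:
            raise ValueError("pixel buffer does not match frame size")
        self.frames += 1
        return self.frames.to_bytes(4, "big")

    def finish(self) -> bytes:
        return b"END"


def record(frames: Iterable[Frame], encoder: Encoder) -> List[bytes]:
    """compositor side: capture frames, hand them off, know nothing else."""
    packets = [encoder.encode(f) for f in frames]
    packets.append(encoder.finish())
    return packets


captured = [Frame(2, 2, bytes(16)) for _ in range(3)]
packets = record(captured, CountingEncoder())
```

the point of the interface is that swapping the codec engine touches only the
backend class, never the capture side.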

> > > - Output configuration
> > 
> > why? currently pretty much every desktop provides its OWN output
> > configuration tool that is part of the desktop environment. why do you want
> > to re-invent randr here allowing any client to mess with screen config.
> > after YEARS of games using xvidtune and what not to mess up screen setups
> > this would be a horrible idea. if you want to make a presentation tool that
> > uses 1 screen for output and another for "controls" then that's a matter of
> > providing info that multiple displays exist and what type they may be
> > (internal, external) and clients can tag surfaces with "intents" eg - this
> > is a control surface, this is an output/display surface. compositor will
> > then assign them appropriately.
> 
> There's more than desktop environments alone out there. Not everyone
> wants to go entirely GTK or Qt or EFL. I bet everyone on this ML has
> software on their computer that uses something other than the toolkit of
> their choice. Some people like piecing their system together and keeping
> things lightweight, and choosing the best tool for the job. Some people
> might want to use the KDE screengrab tool on e, or perhaps some other
> tool that's more focused on doing just that job and doing it well. Or
> perhaps there's existing tools like ImageMagick that are already written
> into scripts and provide a TON of options to the user, which could be
> much more easily patched with support for some standard screengrab
> protocol than to implement all of its features in 5 different desktops.

the expectation is there won't be generic tools but desktop specific ones. the
CURRENT ecosystem of tools exist because that is the way x was designed to
work. thus the state of software matches its design. wayland is different. thus
tools and ecosystem will adapt.

as for output config - why would the desktops that already have their own tools
then want to support OTHER tools too? their tools integrate with their settings
panels and look and feel right and support THEIR policies.

let me give you an example:

http://devs.enlightenment.org/~raster/ssetup.png

bottom-right - i can assign special scale factors and different toolkit
profiles per screen. eg one screen can be a desktop, one a media center style,
one a mobile "touch centric ui" etc. etc. - this is part of the screen setup
tool. a generic tool will miss features that make the desktop nice and
functional for its purposes. do you want to go create some kind of uber
protocol that every de has to support with every other de's feature set in it
and limit de's to modifying the protocol because they now have to go through a
shared protocol in libwayland that they cant just add features to as they
please? ok - so these features will be added adhoc in extra protocols so now
you have a bit of a messy protocol with 1 protocol referring to another... and
the "kde tool" messes up on e or the e tool messes up in gnome because all
these extra features are either not even supported by the tool or existing
features don't work because the de doesn't support those extensions?

just "i want to use the kde screen config tool" is not reason enough for there
to be a public/shared/common protocol. it will fall apart quickly like above
and simply mean work for most people to go support it rather than actual value.

> We all have to implement output configuration, so why not do it the same
> way and share our API? I don't think we need to let any client

no - we don't have to implement it as a protocol. enlightenment needs zero
protocol. it's done by the compositor. the compositors own tool is simply a
settings dialog inside the compositor itself. no protocol. not even a tool.
it's the same as edit/tools -> preferences in most gui apps. its just a dialog
the app shows to configure itself.

chances are gnome likely will do this via dbus (they love dbus :)). kde - i
don't know. but not everyone is implementing a wayland protocol at all so
assuming they are and saying "do it the same way" is not necessarily saving any
work.

> manipulate the output configuration. We need to implement a security
> model for this like all other elevated permissions.

like above. if gnome uses dbus - they will use polkit etc. etc. to decide that.
enlightenment doesn't even need to because there isn't even a protocol nor an
external tool - it's built directly in.

> Using some kind of intents system to communicate things like Impress
> wanting to use one output for presentation and another for notes is
> going to get out of hand quickly. There are just so many different
> "intents" that are solved by just letting applications configure outputs

even impress doesnt configure outputs. thank god for that.

> when it makes sense for them to. The code to handle this in the
> compositor is going to become an incredibly complicated mess that rivals
> even xorg in complexity. We need to avoid making the same mistakes
> again. If we don't focus on making it simple, then in 15 years we're
> going to be writing a new protocol and making a new set of mistakes. X
> does a lot of things wrong, but the tools around it have a respect for
> the Unix philosophy that we'd be wise to consider.

how would it be complex. a compositor is already, if decent, going to handle
multiple outputs. it's either going to auto-configure new ones to extend/clone
or maybe pop up a settings dialog. e already does this for example and
remembers config for that screen (edid+output) so plug it in a 2nd time and it
automatically uses the last stored config for that. so the screen will "work"
as basically a by-product of making a compositor that can do multiple outputs.
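
a minimal sketch of that "remember the config per screen" behaviour. this is
not e's actual code: OutputConfig, OutputMemory and the "extend to the right"
default are assumptions for illustration. the only idea taken from the text is
that the stored config is keyed by edid+output.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class OutputConfig:
    width: int
    height: int
    x: int = 0                      # position in the desktop layout
    y: int = 0
    clone_of: Optional[str] = None  # name of the output this one mirrors


class OutputMemory:
    """stores the last config used for each (edid, connector) pair."""

    def __init__(self) -> None:
        self._saved: Dict[Tuple[str, str], OutputConfig] = {}

    def remember(self, edid: str, connector: str, cfg: OutputConfig) -> None:
        # called when the user changes settings for a plugged-in screen
        self._saved[(edid, connector)] = cfg

    def on_hotplug(self, edid: str, connector: str,
                   preferred: Tuple[int, int],
                   primary_width: int) -> OutputConfig:
        saved = self._saved.get((edid, connector))
        if saved is not None:
            return saved  # seen before: reuse the stored config
        # never seen: assumed default policy is to extend to the right
        cfg = OutputConfig(preferred[0], preferred[1], x=primary_width)
        self._saved[(edid, connector)] = cfg
        return cfg


mem = OutputMemory()
first = mem.on_hotplug("EDID-A", "HDMI-1", (1920, 1080), 1366)
mem.remember("EDID-A", "HDMI-1",
             OutputConfig(1280, 720, clone_of="eDP-1"))
second = mem.on_hotplug("EDID-A", "HDMI-1", (1920, 1080), 1366)
```

here the first plug-in gets the default extend layout, and the second one
gets back the clone setup the user chose in between.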

then intents are only a way of deciding where a surface is to be displayed -
rather than on the current desktop/screen.

so simply mark a surface as "for presentation" and the compositor will put it
on the non-internal display (chosen maybe by physical size reported in edid as
the larger one, or by elimination - its on the screen OTHER than the
internal... maybe user simply marks/checkboxes that screen as "use this
screen for presenting" and all apps that want to present get their content
there etc.)
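
the selection rules above can be sketched like this. the names (Output,
pick_presentation_output, place_surface) and the "presentation" intent string
are hypothetical, not part of any wayland protocol; the order of the rules
(user checkbox first, then non-internal, then largest physical size) is one
possible policy, and a compositor could pick another.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Output:
    name: str
    internal: bool                    # e.g. a laptop panel
    width_mm: int                     # physical size as reported in edid
    height_mm: int
    user_presentation: bool = False   # user ticked "use for presenting"


def pick_presentation_output(outputs: List[Output]) -> Optional[Output]:
    if not outputs:
        return None
    # 1. an explicit user choice wins
    marked = [o for o in outputs if o.user_presentation]
    if marked:
        return marked[0]
    # 2. otherwise prefer screens other than the internal one
    external = [o for o in outputs if not o.internal]
    candidates = external or outputs
    # 3. among those, take the physically largest
    return max(candidates, key=lambda o: o.width_mm * o.height_mm)


def place_surface(intent: str, outputs: List[Output],
                  current: Output) -> Output:
    """the client only supplies the intent; the compositor decides."""
    if intent == "presentation":
        return pick_presentation_output(outputs) or current
    return current  # any other intent stays on the current screen


laptop = Output("eDP-1", internal=True, width_mm=290, height_mm=170)
projector = Output("HDMI-1", internal=False, width_mm=1600, height_mm=900)
chosen = place_surface("presentation", [laptop, projector], laptop)
```

with a laptop panel and a projector attached, chosen ends up being the
projector, and the app never had to touch the screen configuration.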

so what you are saying is it's better to duplicate all this logic of screen
configuration inside every app that wants to present things (media players -
play movie on presentation screen, ppt/impress/whatever show presentation there,
etc. etc.) and how to configure the screen etc. etc., rather than have a simple
tag/intent and let your de/wm/compositor "deal with it" universally for all
such apps in a consistent way?

> > > - More detailed surface roles (should it be floating, is it a modal,
> > >   does it want to draw its own decorations, etc)
> > 
> > that seems sensible and over time i can imagine this will expand.
> 
> Cool. Suggestions for what sort of capability this protocol should
> have, what kind of surface roles we will be looking at? We should
> consider a few things. Normal windows, of course, which on compositors
> like Sway would be tiled. Then there's floating windows, like

ummm whats the difference between floating and normal? apps like gnome
calculator just open ... normal windows.

> gnome-calculator, that are better off being tiled. Modals would be
> something that pops up and prevents the parent window from being
> interacted with, like some sort of alert (though preventing this
> interactivity might not be the compositor's job). Then we have some

yeah - good old "transient for" :)

> roles like dmenu would use, where the tool would like to arrange itself
> (perhaps this would demand another permission?) Surfaces that want to be
> fullscreen could be another. We should also consider additional settings
> a surface might want, like negotiating for who draws the decorations or
> whether or not it should appear in a taskbar sort of interface.

xdg shell should be handling these already - except dmenu. dmenu is almost a
special desktop component. like a shelf/panel/bar thing.

> > > - Input device configuration
> > 
> > as above. i see no reason clients should be doing this. surface
> > intents/roles/whatever can deal with this. compositor may alter how an input
> > device works for that surface based on this.
> 
> I don't feel very strongly about input device configuration as a
> protocol here, but it's something that many of Sway's users are asking
> for. People are trying out various compositors and may switch back and
> forth depending on their needs and they want to configure all of their
> input devices the same way.

they are going to have to deal with this then. already gnome and kde and e will
all configure mouse accel/left/right mouse on their own based on settings. yes
- i can RUN xset and set it back later but its FIGHTING with your DE. wayland
is the same. use the desktop tools for this :) yes - it'll change between
compositors.  :) at least in wayland you cant fight with the compositor here.
for sway - you are going to have to write this yourself. eg - write tools that
talk to sway or sway reads a cfg file you edit or whatever. :)

> However, beyond detailed input device configuration, there are some
> other things that we should consider. Some applications (games, vnc,
> etc) will want to capture the mouse and there should be a protocol for
> them to indicate this with (perhaps again associated with special
> permissions). Some applications (like Krita) may want to do things like
> take control of your entire drawing tablet.

as i said. can of worms. :)

> > [snip] screen capture is a nasty one and for now - no. no access [snip]
> 
> Wayland has been in the making for 4 years. Fedora is thinking about
> shipping it by default. We need to quit with this "not for now" stuff
> and start thinking about legitimate use-cases that we're killing off
> here. The problems are not insurmountable and they are going to kill
> Wayland adoption. We should not force Wayland upon our users, we should
> make it something that they *want* to switch to. I personally have
> gathered a lot of interest in Sway and Wayland in general by
> livestreaming development of it from time to time, which has led to more
> contributors getting in on the code and more people advocating for us to
> get Wayland out there.

you have no idea how many non-security-sensitive things need fixing first
before addressing the can-of-worms problems. hell nvidia just released drivers
that require compositors to re-do how they talk to egl/kms/drm to work that's
not compatible with existing drm dmabuf buffers etc. etc.

there's lots of things to solve like window "intents/tags/etc." that are not
security sensitive.

even clients and decorations. tiled wm's will not want clients to add
decorations with shadows etc. - currently clients will do csd because csd is
what weston chose and gnome has followed and enlightenment too. kde do not want
to do csd. i think that's wrong. it adds complexity to wayland just to "not
follow the convention". but for tiling i see the point of at least removing the
shadows. clients may choose to slap a title bar there still because it's useful
displaying state. but advertising this info from the compositor is not
standardized. what do you advertise to clients? where/when? at connect time? at
surface creation time? what negotiation is it? it easily could be that 1
screen or desktop is tiled and another is not and you dont know what to tell
the client until it has created a surface and you know where that surface would
go. perhaps this might be part of a larger set of negotiation like "i am a
mobile app so please stick me on the mobile screen" or "i'm a desktop app -
desktop please" then with the compositor saying where it decided to allocate
you (no mobile screen available - you are on desktop) and app is expected to
adapt...
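
one way that negotiation could look, purely as a sketch. nothing like this is
standardized; Placement, negotiate and the screen kinds are invented here. the
only point it shows is the ordering: the client states a wish, the compositor
answers after it knows where the surface goes, and the decoration hint follows
from that answer.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Placement:
    screen: str         # where the compositor decided to put the surface
    tiled: bool         # whether that screen/desktop is tiled
    draw_shadows: bool  # hint back to the client about csd shadows


def negotiate(wanted: str, screens: Dict[str, bool],
              fallback: str) -> Placement:
    """screens maps a screen kind to whether it is tiled.

    fallback is assumed to be a key of screens.
    """
    # the wish is honoured only if such a screen exists
    screen = wanted if wanted in screens else fallback
    tiled = screens[screen]
    # on a tiled screen the client is asked to drop its shadows
    return Placement(screen, tiled, draw_shadows=not tiled)


# no mobile screen available, so the mobile app lands on the desktop
reply = negotiate("mobile", {"desktop": True}, fallback="desktop")
```

the client would then adapt to reply rather than to what it asked for.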

these are not security can-of-worms things. most de's are still getting to the
point of "usable" atm without worrying about all of these extras yet.

there's SIMPLE stuff like - what happens when compositor crashes? how do we
handle this? do you really want to lose all your apps when compositors crash?
what should clients do? how do we ensure clients are restored to the same place
and state? crash recovery is important because it is also what allows
updates/upgrades without losing everything. THIS stuff is still "un solved".
i'm totally not concerned about screen casting or vnc etc. etc. until all of
these other nigglies are well solved first.

> > for the common case the DE can do it. for screen sharing kind of
> > things... you also need input control (take over mouse and be able to
> > control from app - or create a 2nd mouse pointer and control that...
> > keyboard - same, etc. etc. etc.). [snip]
> 
> Screen sharing for VOIP applications is only one of many, many use-cases
> for being able to get the pixels from your screen. VNC servers,
> recording video to provide better bug reports or to demonstrate
> something, and so on. We aren't opening pandora's box here, just
> supporting video capture doesn't mean you need to support all of these
> complicated and dangerous things as well.

apps can show their own content for their own bug reporting. for system-wide
reporting this will be DE integrated anyway. supporting video capture is a
can of worms. as i said - single buffer? multiple with metadata? who does
conversion/scaling/transforms? what is the security model? and as i said - this
has major implications to the rendering back-end of a compositor.

> > nasty little thing and in implementing something like this you are also
> > forcing compositors to work in specific ways - eg screen capture will
> > likely FORCE the compositor to merge it all into a single ARGB buffer for
> > you rather than just assign it to hw layers. or perhaps it would require
> > just exposing all the layers, their config and have the client "deal with
> > it" ? but that means the compositor needs to expose its screen layout. do
> > you include pointer or not? compositor may draw ptr into the framebuffer.
> > it may use a special hw layer. what about if the compositor defers
> > rendering - does a screen capture api force the compositor to render when
> > the client wants? this can have all kinds of nasty effects in the rendering
> > pipeline - for us our rendering pipeline is not in the compositor but via
> > the same libraries clients use so altering this pipeline affects regular
> > apps as well as compositor. ... can of worms :)
> 
> All of this would still be a problem if you want to support video
> capture at all. You have to get the pixels into your encoder somehow.
> There might be performance costs, but we aren't recording video all the
> time.

there's a difference. when its an internal detail it can be changed and
adapted to how the compositor and its rendering subsystem work. when its a
protocol you HAVE to support THAT protocol and the way THAT protocol defines
things to work or apps break.

keep it internal - you can break at will and adapt as needed, make it public
and you are boxed in by what the public api allows.

> We can make Wayland support use-cases that are important to our users or
> we can watch them stay on xorg perpetually and end up maintaining two
> graphical stacks forever.

priorities. there are other issues that should be solved first before worrying
about the pandoras box ones.

-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster at rasterman.com


