Collaboration on standard Wayland protocol extensions

Carsten Haitzler (The Rasterman) raster at rasterman.com
Mon Mar 28 14:03:00 UTC 2016


On Mon, 28 Mar 2016 09:00:34 -0400 Drew DeVault <sir at cmpwn.com> said:

> On 2016-03-28  2:13 PM, Carsten Haitzler wrote:
> > yes but you need permission and that is handled at kernel level on a
> > specific file. not so here. compositor runs as a specific user and so you
> > can't do that. you'd have to do in-compositor security client-by-client.
> 
> It is different, but we should still find a way to do it. After all,
> we're going to be in a similar situation eventually where we're running
> sandboxed applications and the compositor is granting rights from the
> same level of privilege as the kernel provides to root users (in this
> case, the role is almost of a hypervisor and a guest).

should we? is it right to create yet another security model in userspace
"quickly" just to solve things that don't NEED solving at least at this point?

> > you wouldn't recreate ffmpeg. ffmpeg produces libraries like avcodec. like a
> > reasonable developer we'd just use their libraries to do the encoding - we'd
> > capture frames and then hand off to avcodec (ffmpeg) library routines to do
> > the rest. ffmpeg doesn't need to know how to capture - just to do what 99%
> > of its code is devoted to doing - encode/decode. :) that's rather simple.
> > already we have decoding wrapped - we sit on top of either gstreamer, vlc
> > or xine as the codec engine and just glue in output and control api's and
> > events. encoding is just the same but in reverse. :) the encapsulation is
> > simple.
> 
> True, that most of the work is in the avcodec. However, there's more to
> it than that. The entire command line interface of ffmpeg would be
> nearly impossible to build into the compositor effectively. With ffmpeg
> I can capture x, flip it, paint it sepia, add a logo to the corner, and
> mux it with my microphone and a capture of the speakers (thanks,
> pulseaudio) and add a subtitle track while I'm at it. Read the ffmpeg
> man pages. ffmpeg-all(1) is 23,191 lines long on my terminal (that's
> just the command line interface, not avcodec). There's no way in hell
> all of the compositors/DEs are going to be able to fulfill all of its
> use cases, nor do I think we should be trying to.
> 
> Look at things like OBS. It lets you specify detailed encoding options
> and composites a scene from multiple video sources and audio sources,
> as well as letting the user switch between different scenes with
> configurable transitions. It even lets you embed a web browser into the
> final result! All of this with a nice GUI to top it off. Again, we can't
> possibly hope to effectively implement all of this in the compositor/DE,
> or the features of the other software that we haven't even thought of.

adding watermarks can be done after encoding as another pass (encode in high
quality). hell watermarks can just be a WINDOW (surface) on the screen. you
don't need options. as for audio - not too hard to do along with it. just
offer to record an input device - and choose (input can be current mixed output
or a mic ... or both).

> > the expectation is there won't be generic tools but desktop specific ones.
> > the CURRENT ecosystem of tools exist because that is the way x was designed
> > to work. thus the state of software matches its design. wayland is
> > different. thus tools and ecosystem will adapt.
> 
> That expectation is misguided. I like being able to write a script to
> configure my desktop layout between several presets. Here's an example -
> a while ago, I used a laptop at work that could be plugged into a
> docking station. I would close the lid and use external displays at my
> desk. I wanted to automatically change the screen layout when I came and
> went, so I wrote a script that used xrandr to do it. It detected when
> there were new outputs plugged in, then disabled the laptop screen and
> enabled+configured the two new screens in the correct position and
> resolution. This was easy for me to configure to behave the way I wanted
> because the tooling was flexible and cross-desktop. Sure, we could make
> each compositor support it, but each one is going to do it differently
> and in their own subtly buggy ways and with their own subset of the
> total possible features and use-cases, and none of them are going to
> address every possible scenario.

exactly what you describe is how e works out of the box. no scripts needed.
requiring people to write scripts to do their screen configuration is just
wrong. taking the position of "well i give up and won't bother and will just
make my users write scripts instead" is sticking your head in the sand and not
solving the problem. you are now asking everyone ELSE who writes a compositor to
implement a protocol because YOU won't solve a problem that others have solved
in a user friendly manner.

i've been doing x11 wm's since 1996. i've seen the bad, the ugly and the
horrible. there is no way i want any kind of protocol for configuring the
screen. not after having seen just how much it is abused when there and what a
horrible state things are left in when it's there.

> > as for output config - why would the desktops that already have their own
> > tools then want to support OTHER tools too? their tools integrate with
> > their settings panels and look and feel right and support THEIR policies.
> 
> Base your desktop's tools on the common protocol, of course. Gnome
> settings, KDE settings, arandr, xrandr, nvidia-settings, and so on, all
> seem to work fine configuring your outputs with the same protocol today.
> Yes, the protocol is meh and the implementation is a mess, but the
> clients of that protocol aren't bad by any stretch of the imagination.

no tools. why do it? it's built in. in order for screen config "magic" to
work, a set of metadata is attached to screens. you can set priority (screens get
numbers from highest to lowest priority at any given time allowing behaviour
like your "primary" screen to migrate to an external one then migrate back when
external monitor is attached etc.) sure we can start having that metadata
separate but then ALTERNATE TOOLS won't be able to configure it, thus breaking
the desktop environment by not providing metadata and other settings associated
with a display. this breaks functionality for users who then complain about
things not working right AND then the compositor has to now deal with these
"error cases" too because a foreign tool will be messing with its data/setup.
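
as a rough illustration of the priority metadata described above (this is not
e's actual code; the `Screen` type, its fields, the connector names and the
"lower number wins" rule are all assumptions made for this sketch), picking
the primary screen could look something like this:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Screen:
    name: str        # hypothetical connector name, e.g. "eDP-1"
    priority: int    # assumption: lower number means higher priority
    connected: bool  # whether the screen is currently plugged in


def primary_screen(screens: List[Screen]) -> Optional[Screen]:
    """Return the connected screen with the best priority, or None."""
    connected = [s for s in screens if s.connected]
    if not connected:
        return None
    return min(connected, key=lambda s: s.priority)
```

with an internal panel at priority 2 and an external monitor at priority 1,
"primary" follows the external monitor while it is connected and falls back to
the internal panel otherwise. a foreign tool that only knows about geometry
would have no way to see or set that priority value.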

> > let me give you an example:
> > 
> > http://devs.enlightenment.org/~raster/ssetup.png
> > 
> > [snip]
> 
> This is a very interesting screenshot, and I hadn't considered this. I
> don't think it's an unsolvable problem, though - we can make the
> protocol flexible enough to allow compositor-specific metadata to be
> added and configurable. These are the sorts of requirements I want to be
> gathering to design this protocol with.

as above. i have seen screen configuration used and abused over the years where
i just do not want to have a protocol for messing around with it for any
client. give them an inch and they'll take a mile.

> > no - we don't have to implement it as a protocol. enlightenment needs zero
> > protocol. it's done by the compositor. the compositors own tool is simply a
> > settings dialog inside the compositor itself. no protocol. not even a tool.
> > it's the same as edit/tools -> preferences in most gui apps. it's just a
> > dialog the app shows to configure itself.
> 
> I currently do several things in different processes/binaries that
> enlightenment does in the compositor, things like the bar and the
> wallpaper. I don't want to make an output configuration GUI tool nested
> into the compositor, it's out of scope.

and that's perfectly fine - that is your choice. do not force your choice on
other compositors. you can implement all the protocol you want in any way you
want for your wm's tools.

> > chances are gnome likely will do this via dbus (they love dbus :)). kde - i
> > don't know. but not everyone is implementing a wayland protocol at all so
> > assuming they are and saying "do it the same way" is not necessarily saving
> > any work.
> 
> We're all writing wayland compositors here. We may not all have dbus or
> whatever else in common, but we do have the wayland protocol in common,
> and it can support this use-case. It makes sense to use it.

gnome does almost everything with dbus. they love dbus. a lot of gnome is
centred around dbus. they likely will choose dbus to do this. likely. i
personally wouldn't choose to use dbus.

> > then intents are only a way of deciding where a surface is to be displayed -
> > rather than on the current desktop/screen.
> > 
> > so simply mark a surface as "for presentation" and the compositor will put
> > it on the non-internal display (chosen maybe by physical size reported in
> > edid as the larger one, or by elimination - it's on the screen OTHER than the
> > internal... maybe user simply marks/checkboxes that screen as "use this
> > screen for presenting" and all apps that want to present get their content
> > there etc.)
> 
> Man, this is going to get really complicated. How do you decide what
> display is "internal" or not? What if the user wants to present on their

at least e already knows this. its screen management subsystem is perfectly
aware of this. :)

> primary display? What about applications that use the entire output for

the app can simply not request to present on their "presentation" screen... or
the user would mark their primary screen (internal on laptop maybe) AS their
presentation screen - more metadata to be held by compositor.

now ALL presentation tools behave the same - you don't have to reconfigure each
one separately and deal with the difference and lack or otherwise of features.
it's done in 1 place - compositor, and then all apps that want to do a
similar thing follow and work "as expected". far better than just ignoring the
issue. you yourself already talked about extra tags/hints/whatever - this is
one of those.

> things other than presentations? What if the application wants to use
> several outputs, and for different purposes? What language are you going
> to use to describe these settings to the user in a way that makes more
> sense than the clients describing for themselves why they need to use a
> particular output?

because this requires clients DEFINING screen layout. wayland was specifically
designed to HIDE THIS. if the compositor displayed a screen wrapped around a
sphere in real life in a room - then it doesn't have rectangles... how will an
app deal with that? what if the compositor is literally a VR world with
surfaces wrapped around spheres and cubes - the point of wayland's design was
to hide this info from clients completely so the compositor decides based on
environment, not each and every client. this was a basic premise/design in
wayland from the get go and it was a good one. letting apps break this
abstraction breaks this design.

> > so what you are saying is it's better to duplicate all this logic of screen
> > configuration inside every app that wants to present things (media players -
> > play movie on presentation screen, ppt/impress/whatever show presentation
> > there, etc. etc.) and how to configure the screen etc. etc., rather than
> > have a simple tag/intent and let your de/wm/compositor "deal with it"
> > universally for all such apps in a consistent way?
> 
> No. Applications want to be full screen or they don't want to be. If
> they want to pick a particular output, we can easily let them do so.

i don't know about you.. but fullscreen to enlightenment means you use up ONE
SCREEN. not all screens. and from user response.. they LOVE IT. it is correct.
it's the right way. so when an app asks to be fullscreen it gets to use the
screen it's on - not all. so no. fullscreen does NOT mean they would want to span
all screens (you imply that) and then just draw different areas of their
massive window to correspond to screens (and control those screens,
resolutions, geometries etc.).

what makes sense is an app hints at the purpose of its window and opens n
windows (surfaces). it can ask for fullscreen for each. the hints would allow
the compositor to choose which screen the window/surface is assigned to.
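
a minimal sketch of that idea, with everything in it hypothetical (the
`Output` type, the "presentation" purpose string and the fallback to the
physically largest external output are assumptions for illustration, not an
existing wayland protocol or e's implementation):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Output:
    name: str                 # hypothetical connector name
    internal: bool            # e.g. a laptop panel
    width_mm: int             # physical size as reported by edid
    height_mm: int
    marked_for_presentation: bool = False  # user ticked "use for presenting"


def presentation_output(outputs: List[Output]) -> Optional[Output]:
    """Prefer the user's explicit choice, else the largest external output."""
    marked = [o for o in outputs if o.marked_for_presentation]
    if marked:
        return marked[0]
    external = [o for o in outputs if not o.internal]
    if external:
        return max(external, key=lambda o: o.width_mm * o.height_mm)
    return None


def assign_output(purpose: str, current: Output, outputs: List[Output]) -> Output:
    """Pick the output for a fullscreen surface from its purpose hint."""
    if purpose == "presentation":
        target = presentation_output(outputs)
        if target is not None:
            return target
    # any other purpose (or no suitable output): stay on the current screen
    return current
```

the client only ever says what the surface is for. which output that maps to
stays a compositor decision, so every presenting app behaves the same way.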

> > > Cool. Suggestions for what sort of capability this protocol should
> > > have, what kind of surface roles we will be looking at? We should
> > > consider a few things. Normal windows, of course, which on compositors
> > > like Sway would be tiled. Then there's floating windows, like
> > 
> > ummm what's the difference between floating and normal? apps like gnome
> > calculator just open ... normal windows.
> 
> Gnome calculator doesn't like being tiled: https://sr.ht/Ai5N.png

i think the problem is you are not handling min/max sizing of clients
properly. :) you need to fix sway. gnome calculator is not sizing up its buffer
on surface size. that is a message "i can't be bigger than this - this is my
biggest size. deal with it". you need to deal with it. eg - pad it and make it
sized AT the buffer size :)
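
roughly, and only as a sketch (the function name and the tile geometry
arguments are made up here, this is not sway's or e's code), "pad it" could
mean centering the client's buffer inside the tile instead of stretching it:

```python
def place_in_tile(tile_x, tile_y, tile_w, tile_h, buf_w, buf_h):
    """Center a fixed-size client buffer inside a tile.

    Returns (x, y, w, h) of the area the buffer should occupy. The rest of
    the tile is padding that the compositor fills however it likes.
    """
    # never show more than the tile can hold; a larger buffer gets clipped
    w = min(buf_w, tile_w)
    h = min(buf_h, tile_h)
    # split the leftover space evenly on both sides
    x = tile_x + (tile_w - w) // 2
    y = tile_y + (tile_h - h) // 2
    return x, y, w, h
```

for a 960x1080 tile at the origin and a client that only ever submits a
400x500 buffer, the surface is shown at its own size in the middle of the tile
and the tile layout itself is untouched.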

> There are probably some other applications that would very much like to
> be shown at a particular aspect ratio or resolution.

as above. buffer size tells you that.

> > xdg shell should be handling these already - except dmenu. dmenu is almost a
> > special desktop component. like a shelf/panel/bar thing.
> 
> dmenu isn't the only one, though, that may want to arrange itself in
> special ways. Lemonbar and rofi also come to mind.

all of these basically are "desktop components" ala
taskbars/shelves/panels/whatever - i know that for e we don't want to support
such apps. these are built in. i don't know what gnome or kde think but these
go against their design as an integrated desktop environment. YOU need these
because your compositor has no such feature itself. the bigger desktops don't
need it. they MAY support it - may not. i know i don't want to. :)

> > > [input is] something that many of Sway's users are asking for.
> > 
> > they are going to have to deal with this then. already gnome and kde and e
> > will all configure mouse accel/left/right mouse on their own based on
> > settings. yes - i can RUN xset and set it back later but it's FIGHTING with
> > your DE. wayland is the same. use the desktop tools for this :) yes - it'll
> > change between compositors. :) at least in wayland you can't fight with the
> > compositor here. for sway - you are going to have to write this yourself.
> > eg - write tools that talk to sway or sway reads a cfg file you edit or
> > whatever. :)
> 
> I've already written this into sway, fwiw, in your config file. I think
> this is fine, too, and I intend to keep supporting configuring outputs
> like that. But consider the use case of Krita, or video games like Osu!

i don't know osu - but i see no reason krita needs to configure a tablet. it
can just deal with input from it. :)

> > > However, beyond detailed input device configuration, there are some
> > > other things that we should consider. Some applications (games, vnc,
> > > etc) will want to capture the mouse and there should be a protocol for
> > > them to indicate this with (perhaps again associated with special
> > > permissions). Some applications (like Krita) may want to do things like
> > > take control of your entire drawing tablet.
> > 
> > as i said. can of worms. :)
> 
> It's a can of worms we should deal with, and one that I don't think it's
> hard to deal with. libinput lets you configure a handful of details
> about input devices. Let's expose these things in a protocol.

input is very sensitive. having done this for years and watched how games like
to turn off key repeat then leave it off when they crash... or change mouse
accel then you find it's changed everywhere and have to "fix it" etc. etc. - i'd
be loath to do this. give them TOO much config ability and it can become a
security issue.

> > you have no idea how many non-security-sensitive things need fixing first
> > before addressing the can-of-worms problems. hell nvidia just released
> > drivers that require compositors to re-do how they talk to egl/kms/drm to
> > work that's not compatible with existing drm dmabuf buffers etc. etc.
> 
> Why do those things need to be dealt with first? Sway is at a good spot
> where I can start thinking about these sorts of things. There are
> enough people involved to work on multiple things at once. Plus,
> everyone thinks nvidia's design is bad and we're hopefully going to see
> something from them that avoids vendor-specific code.

because these imho are far more important. you might be surprised at how few
people are involved.

> I don't see these problems as a can of worms. I see them as problems
> that are solvable and necessary to solve, and now is a good time to
> solve them. My compositor is coming up on version 1.0. Supporting the
> APIs is the driver's problem, we've described the spec and as soon as
> they implement it, it will Just Work(tm).
> 
> > even clients and decorations. tiled wm's will not want clients to add
> > decorations with shadows etc. - currently clients will do csd because csd is
> > what weston chose and gnome has followed and enlightenment too. kde do not
> > want to do csd. i think that's wrong.
> 
> What is a can of worms is the argument over whether or not we should use
> CSD or SSD. I fall in the latter camp, but I don't think we need to
> fight over it now. We should be able to agree that a protocol for
> negotiating whether or not borders are drawn would be reasonable. Is it
> a GTK app that does nothing interesting with its titlebar? Well, if the
> compositor wants to draw its borders, then let it do so. Does it do
> fancy GTK stuff with the borders? Well, no, mister compositor, I want to
> do fancy things. Easy enough.

not so simple. with more of the ui of an app being moved INTO the border
(titlebar etc.) this is not a simple thing to just turn it off. you then turn
OFF necessary parts of the ui or have to push the problem out to the app to
"fallback". only having CSD solves all that complexity and is more efficient
than SSD when it comes to things like assigning hw layers or avoiding copies of
vast amounts of pixels. i was against CSD to start with too but i see their
major benefits.

of course the shadow padding area is something i do see as optional and
something to hint at that would be useful. i can't see gnome dropping CSD
especially given how integrated to the ui it's becoming. i can tell you that i'm
strongly considering going the same way and fully integrating into CSD for many
good reasons that go far beyond just a desktop.

> > it adds complexity to wayland just to "not follow the convention". but
> > for tiling i see the point of at least removing the shadows. clients
> > may choose to slap a title bar there still because it's useful
> > displaying state. but advertising this info from the compositor is not
> > standardized. what do you advertise to clients? where/when? at connect
> > time? at surface creation time? what negotiation is it? it easily
> > could be that 1 screen or desktop is tiled and another is not and you
> > don't know what to tell the client until it has created a surface and
> > you know where that surface would go. perhaps this might be part of a
> > larger set of negotiation like "i am a mobile app so please stick me
> > on the mobile screen" or "i'm a desktop app - desktop please" then
> > with the compositor saying where it decided to allocate you (no mobile
> > screen available - you are on desktop) and app is expected to adapt...  
> 
> In Wayland you create a surface, then assign it a role. Extra details
> can go in between, or go in the call that gives it a role. Right now
> most applications are creating their surface and then making it a shell
> surface. The compositor can negotiate based on its own internal state
> over whether a given output is tiled or not, or in cases like AwesomeWM,
> whether a given workspace is tiled or not. And I don't think the
> decision has to be final. If the window is moved to another output or
> really if any of the circumstances change, they can renegotiate and the
> surface can start drawing its own decorations.

yup. but this signalling/negotiation has to exist. currently it doesn't. :)

> > there's SIMPLE stuff like - what happens when compositor crashes? how do we
> > handle this? do you really want to lose all your apps when compositors
> > crash? what should clients do? how do we ensure clients are restored to the
> > same place and state? crash recovery is important because it is always what
> > allows updates/upgrades without losing everything. THIS stuff is still
> > "unsolved". i'm totally not concerned about screen casting or vnc etc. etc.
> > until all of these other nigglies are well solved first.
> 
> I'm still not on board with all of this "first" stuff. I don't see any
> reason why we have to order ourselves like this. It all needs to get
> done at some point. Right now we haven't standardized anything, and each
> compositor is using its own unique, incompatible way of taking
> screenshots and recording videos, and each is probably introducing some
> kind of security problem.

you aren't going to talk me into implementing something that is important for
you and not a priority for e until such a time as i'm satisfied that the other
issues are solved. you are free to do what you want, but standardizing things
takes a looong time and a lot of experimentation, discussion, and repeating
this. we have resources on wayland and nothing you described is a priority for
them. there are far more important things to do that are actual business
requirements and so the people working need to prioritize what is such a
requirement as opposed to what is not. resources are not infinite and free.

> > apps can show their own content for their own bug reporting. for system-wide
> > reporting this will be DE integrated anyway. supporting video capture is a
> > can of worms. as i said - single buffer? multiple with metadata? who does
> > conversion/scaling/transforms? what is the security model? and as i said -
> > this has major implications to the rendering back-end of a compositor.
> 
> The compositor hands RGBA (or ARGB, whatever, I don't care, we just pick
> one) data to the client that's recording. This problem doesn't have to
> be complicated. As for the "major implications"...

let me complicate it for you. let's say i'm playing a video fullscreen. you now
have to convert argb to yuv then encode when it would have been far more
efficient to get access directly to the yuv buffer before it was even scaled to
screen size... :) so you have just specified a protocol that is by design
inefficient when it could be more efficient.
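
to make the extra step concrete, here is a small sketch of the per-pixel work
a recorder has to do if it is only ever handed argb. it uses the standard
BT.601 full-range coefficients; the function names and the 0xAARRGGBB packing
are assumptions for the example, and a real pipeline would do this on the gpu
or in a hardware encoder rather than pixel by pixel in python:

```python
def _clamp(value):
    """Round to the nearest integer and clamp into the 0..255 byte range."""
    return max(0, min(255, round(value)))


def argb_to_yuv(pixel):
    """Convert one 0xAARRGGBB pixel to (Y, U, V), BT.601 full range."""
    r = (pixel >> 16) & 0xFF
    g = (pixel >> 8) & 0xFF
    b = pixel & 0xFF
    # alpha is simply dropped: the encoder has no use for it
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    v = 0.5 * r - 0.418688 * g - 0.081312 * b + 128
    return _clamp(y), _clamp(u), _clamp(v)


def frame_to_yuv(pixels):
    """Convert a whole frame; the cost grows with every pixel on screen."""
    return [argb_to_yuv(p) for p in pixels]
```

for a fullscreen video the source frames were already yuv before the
compositor scaled and converted them for display, so a recorder fed only argb
ends up converting the same content back again for every frame.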

> > there's a difference. when it's an internal detail it can be changed and
> > adapted to how the compositor and its rendering subsystem work. when it's a
> > protocol you HAVE to support THAT protocol and the way THAT protocol defines
> > things to work or apps break.
> 
> You STILL have to get the pixels into the encoder on the compositor
> side. You will ALWAYS have to do that if you want to support video
> captures, regardless of who's doing it. At some point you're going to
> have to get the pixels you're rendering and hand them off to someone, be
> that libavcodec or a privileged client.

yes - but when, how often and via what mechanisms pixels get there is a very
delicate thing.

> > > We can make Wayland support use-cases that are important to our users or
> > > we can watch them stay on xorg perpetually and end up maintaining two
> > > graphical stacks forever.
> > 
> > priorities. there are other issues that should be solved first before
> > worrying about the pandora's box ones.
> 
> These are not pandora's box. These are small, necessary features.

i disagree. i've been doing graphics for long enough to smell the nasties from
a mile off. it's not trivial. the decisions that are made now will haunt us
for a lifetime. they are not internal details that can be fixed easily. even
internal details are hard to fix once enough code relies on them...

so far we don't exactly have a lot of inter-desktop co-operation happening.
it's pretty much everyone for themselves except for a smallish core protocol.
do NOT try and solve security sensitive AND performance sensitive AND design
limiting/dictating things first and definitely don't do it without everyone on
the same page.


-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster at rasterman.com


