Collaboration on standard Wayland protocol extensions

Mon Mar 28 13:00:34 UTC 2016

On 2016-03-28  2:13 PM, Carsten Haitzler wrote:
> yes but you need permission and that is handled at kernel level on a specific
> file. not so here. compositor runs as a specific user and so you cant do that.
> you'd have to do in-compositor security client-by-client.

It is different, but we should still find a way to do it. After all,
we're going to be in a similar situation eventually where we're running
sandboxed applications and the compositor is granting rights from the
same level of privilege as the kernel provides to root users (in this
case, the role is almost of a hypervisor and a guest).

> you wouldn't recreate ffmpeg. ffmpeg produce libraries like avcodec. like a
> reasonable developer we'd just use their libraries to do the encoding - we'd
> capture frames and then hand off to avcodec (ffmpeg) library routines to do the
> rest. ffmpeg doesnt need to know how to capture - just to do what 99% of its
> code is devoted to doing - encode/decode. :) that's rather simple. already we
> have decoding wrapped - we sit on top of either gstreamer, vlc or xine as the
> codec engine and just glue in output and control api's and events. encoding is
> just the same but in reverse. :) the encapsulation is simple.

True, most of the work is in avcodec. However, there's more to
it than that. The entire command line interface of ffmpeg would be
nearly impossible to build into the compositor effectively. With ffmpeg
I can capture x, flip it, paint it sepia, add a logo to the corner, and
mux it with my microphone and a capture of the speakers (thanks,
pulseaudio) and add a subtitle track while I'm at it. Read the ffmpeg
man pages. ffmpeg-all(1) is 23,191 lines long on my terminal (that's
just the command line interface, not avcodec). There's no way in hell
all of the compositors/DEs are going to be able to fulfill all of its
use cases, nor do I think we should be trying to.
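
To make that concrete, here is a rough sketch of part of such a pipeline,
written as a Python helper that only builds the ffmpeg argument list. It
covers the X capture, the flip, the sepia tint and a PulseAudio input; the
logo, speaker capture and subtitle track are left out. The function name,
the default display, size and output file are assumptions for illustration,
and the sepia coefficients are the commonly quoted ones for ffmpeg's
colorchannelmixer filter.

```python
def build_capture_command(display=":0.0", size="1920x1080",
                          output="capture.mkv"):
    """Return an ffmpeg argv that grabs an X display plus PulseAudio input.

    Hypothetical helper: nothing is executed, the list is only assembled.
    """
    # Sepia tint expressed as a colorchannelmixer matrix (assumed values).
    sepia = ("colorchannelmixer="
             ".393:.769:.189:0:.349:.686:.168:0:.272:.534:.131")
    return [
        "ffmpeg",
        "-f", "x11grab", "-video_size", size, "-i", display,  # capture X
        "-f", "pulse", "-i", "default",                        # capture audio
        "-vf", "hflip," + sepia,                               # flip, then tint
        output,
    ]
```

Every one of those options is something a compositor-internal recorder
would have to reinvent or leave out.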

Look at things like OBS. It lets you specify detailed encoding options
and composites a scene from multiple video sources and audio sources,
as well as letting the user switch between different scenes with
configurable transitions. It even lets you embed a web browser into the
final result! All of this with a nice GUI to top it off. Again, we can't
possibly hope to effectively implement all of this in the compositor/DE,
or the features of the other software that we haven't even thought of.

> the expectation is there won't be generic tools but desktop specific ones. the
> CURRENT ecosystem of tools exist because that is the way x was designed to
> work. thus the state of software matches its design. wayland is different. thus
> tools and ecosystem will adapt.

That expectation is misguided. I like being able to write a script to
configure my desktop layout between several presets. Here's an example -
a while ago, I used a laptop at work that could be plugged into a
docking station. I would close the lid and use external displays at my
desk. I wanted to automatically change the screen layout when I came and
went, so I wrote a script that used xrandr to do it. It detected when
there were new outputs plugged in, then disabled the laptop screen and
enabled+configured the two new screens in the correct position and
resolution. This was easy for me to configure to behave the way I wanted
because the tooling was flexible and cross-desktop. Sure, we could make
each compositor support it, but each one is going to do it differently
and in their own subtly buggy ways and with their own subset of the
total possible features and use-cases, and none of them are going to
address every possible scenario.
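
For illustration, a minimal sketch of what such a docking script might look
like (this is not the original script). It parses `xrandr --query` output
for connected outputs and builds an xrandr command that turns the laptop
panel off and lines the external screens up left to right. The function
names and the internal output name "eDP1" are assumptions; explicit modes
and positions are omitted in favour of --auto.

```python
import re


def connected_outputs(xrandr_query):
    """Return the output names that `xrandr --query` reports as connected."""
    return re.findall(r"^(\S+) connected", xrandr_query, flags=re.MULTILINE)


def docked_layout(outputs, internal="eDP1"):
    """Build an xrandr argv: internal panel off, externals left to right."""
    externals = [name for name in outputs if name != internal]
    if not externals:
        # Nothing plugged in: just make sure the laptop panel is on.
        return ["xrandr", "--output", internal, "--auto"]
    argv = ["xrandr", "--output", internal, "--off"]
    previous = None
    for name in externals:
        argv += ["--output", name, "--auto"]
        if previous is not None:
            argv += ["--right-of", previous]
        previous = name
    return argv
```

The resulting list could be handed to subprocess.run() from whatever
notices the dock being plugged in.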

> as for output config - why would the desktops that already have their own tools
> then want to support OTHER tools too? their tools integrate with their settings
> panels and look and feel right and support THEIR policies.

Base your desktop's tools on the common protocol, of course. Gnome
settings, KDE settings, arandr, xrandr, nvidia-settings, and so on, all
seem to work fine configuring your outputs with the same protocol today.
Yes, the protocol is meh and the implementation is a mess, but the
clients of that protocol aren't bad by any stretch of the imagination.

> let me give you an example:
> 
> http://devs.enlightenment.org/~raster/ssetup.png
> 
> [snip]

This is a very interesting screenshot, and I hadn't considered this. I
don't think it's an unsolvable problem, though - we can make the
protocol flexible enough to allow compositor-specific metadata to be
added and configurable. These are the sorts of requirements I want to be
gathering to design this protocol with.

> no - we don't have to implement it as a protocol. enlightenment needs zero
> protocol. it's done by the compositor. the compositors own tool is simply a
> settings dialog inside the compositor itself. no protocol. not even a tool.
> it's the same as edit/tools -> preferences in most gui apps. its just a dialog
> the app shows to configure itself.

I currently do several things in different processes/binaries that
enlightenment does in the compositor, things like the bar and the
wallpaper. I don't want to make an output configuration GUI tool nested
into the compositor, it's out of scope.

> chances are gnome likely will do this via dbus (they love dbus :)). kde - i
> don't know. but not everyone is implementing a wayland protocol at all so
> assuming they are and saying "do it the same way" is not necessarily saving any
> work.

We're all writing wayland compositors here. We may not all have dbus or
whatever else in common, but we do have the wayland protocol in common,
and it can support this use-case. It makes sense to use it.

> then intents are only a way of deciding where a surface is to be displayed -
> rather than on the current desktop/screen.
> 
> so simply mark a surface as "for presentation" and the compositor will put it
> on the non-internal display (chosen maybe by physical size reported in edid as
> the larger one, or by elimination - its on the screen OTHER than the
> internal... maybe user simply marks/checkboxes that screen as "use this
> screen for presenting" and all apps that want to present get their content
> there etc.)

Man, this is going to get really complicated. How do you decide what
display is "internal" or not? What if the user wants to present on their
primary display? What about applications that use the entire output for
things other than presentations? What if the application wants to use
several outputs, and for different purposes? What language are you going
to use to describe these settings to the user in a way that makes more
sense than the clients describing for themselves why they need to use a
particular output?

> so what you are saying is it's better to duplicate all this logic of screen
> configuration inside every app that wants to present things (media players -
> play movie on presentation screen, ppt/impress/whatever show presentation there,
> etc. etc.) and how to configure the screen etc. etc., rather than have a simple
> tag/intent and let your de/wm/compositor "deal with it" universally for all
> such apps in a consistent way?

No. Applications want to be full screen or they don't want to be. If
they want to pick a particular output, we can easily let them do so.

> > Cool. Suggestions for what sort of capability this protocol should
> > have, what kind of surface roles we will be looking at? We should
> > consider a few things. Normal windows, of course, which on compositors
> > like Sway would be tiled. Then there's floating windows, like
> 
> ummm whats the difference between floating and normal? apps like gnome
> calculator just open ... normal windows.

Gnome calculator doesn't like being tiled: https://sr.ht/Ai5N.png

There are probably some other applications that would very much like to
be shown at a particular aspect ratio or resolution.

> xdg shell should be handling these already - except dmenu. dmenu is almost a
> special desktop component. like a shelf/panel/bar thing.

dmenu isn't the only one, though, that may want to arrange itself in
special ways. Lemonbar and rofi also come to mind.

> > [input is] something that many of Sway's users are asking for.
> 
> they are going to have to deal with this then. already gnome and kde and e will
> all configure mouse accel/left/right mouse on their own based on settings. yes
> - i can RUN xset and set it back later but its FIGHTING with your DE. wayland
> is the same. use the desktop tools for this :) yes - it'll change between
> compositors.  :) at least in wayland you cant fight with the compositor here.
> for sway - you are going to have to write this yourself. eg - write tools that
> talk to sway or sway reads a cfg file you edit or whatever. :)

I've already written this into sway, fwiw, in your config file. I think
this is fine, too, and I intend to keep supporting configuring outputs
like that. But consider the use case of Krita, or video games like Osu!

> > However, beyond detailed input device configuration, there are some
> > other things that we should consider. Some applications (games, vnc,
> > etc) will want to capture the mouse and there should be a protocol for
> > them to indicate this with (perhaps again associated with special
> > permissions). Some applications (like Krita) may want to do things like
> > take control of your entire drawing tablet.
> 
> as i said. can of worms. :)

It's a can of worms we should deal with, and one that I don't think is
hard to deal with. libinput lets you configure a handful of details
about input devices. Let's expose these things in a protocol.
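
As a sketch of how small that surface could be, here is a hypothetical
Python model of the per-device settings such a protocol might carry. The
class and field names are made up for illustration and are not an existing
protocol. The only thing taken from libinput itself is that pointer
acceleration speed is a normalized value between -1 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputConfig:
    """Hypothetical per-device settings a client could ask for."""
    tap_to_click: bool = False
    natural_scroll: bool = False
    left_handed: bool = False
    accel_speed: float = 0.0  # normalized, as in libinput: -1.0 to 1.0

    def __post_init__(self):
        # Reject values the compositor could not pass on to libinput.
        if not -1.0 <= self.accel_speed <= 1.0:
            raise ValueError("accel_speed must be within [-1.0, 1.0]")
```

A compositor would still decide whether a given client is allowed to apply
such a configuration at all.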

> you have no idea how many non-security-sensitive things need fixing first
> before addressing the can-of-worms problems. hell nvidia just released drivers
> that require compositors to re-do how they talk to egl/kms/drm to work that's
> not compatible with existing drm dmabuf buffers etc. etc.

Why do those things need to be dealt with first? Sway is at a good spot
where I can start thinking about these sorts of things. There are
enough people involved to work on multiple things at once. Plus,
everyone thinks nvidia's design is bad and we're hopefully going to see
something from them that avoids vendor-specific code.

I don't see these problems as a can of worms. I see them as problems
that are solvable and necessary to solve, and now is a good time to
solve them. My compositor is coming up on version 1.0. Supporting the
APIs is the driver's problem: we've described the spec, and as soon as
they implement it, it will Just Work(tm).

> even clients and decorations. tiled wm's will not want clients to add
> decorations with shadows etc. - currently clients will do csd because csd is
> what weston chose and gnome has followed and enlightenment too. kde do not want
> to do csd. i think that's wrong.

What is a can of worms is the argument over whether or not we should use
CSD or SSD. I fall in the latter camp, but I don't think we need to
fight over it now. We should be able to agree that a protocol for
negotiating whether or not borders are drawn would be reasonable. Is it
a GTK app that does nothing interesting with its titlebar? Well, if the
compositor wants to draw its borders, then let it do so. Does it do
fancy GTK stuff with the borders? Well, no, mister compositor, I want to
do fancy things. Easy enough.

> it adds complexity to wayland just to "not follow the convention". but
> for tiling i see the point of at least removing the shadows. clients
> may choose to slap a title bar there still because it's useful
> displaying state. but advertising this info from the compositor is not
> standardized. what do you advertise to clients? where/when? at connect
> time? at surface creation time? what negotiation is it? it easily
> could be that 1 screen or desktop is tiled and another is not and you
> dont know what to tell the client until it has created a surface and
> you know where that surface would go. perhaps this might be part of a
> larger set of negotiation like "i am a mobile app so please stick me
> on the mobile screen" or "i'm a desktop app - desktop please" then
> with the compositor saying where it decided to allocate you (no mobile
> screen available - you are on desktop) and app is expected to adapt...  

In Wayland you create a surface, then assign it a role. Extra details
can go in between, or go in the call that gives it a role. Right now
most applications are creating their surface and then making it a shell
surface. The compositor can negotiate based on its own internal state
over whether a given output is tiled or not, or in cases like AwesomeWM,
whether a given workspace is tiled or not. And I don't think the
decision has to be final. If the window is moved to another output or
really if any of the circumstances change, they can renegotiate and the
surface can start drawing its own decorations.
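
A rough model of that negotiation, in Python rather than protocol XML. All
names here are hypothetical and only illustrate the decision: a client that
does something interesting with its titlebar keeps its own decorations,
otherwise the compositor draws them if it wants to, and the answer is
recomputed whenever the surface moves.

```python
from enum import Enum


class Decoration(Enum):
    CLIENT_SIDE = "client"
    SERVER_SIDE = "server"


def negotiate(client_needs_own_titlebar, compositor_wants_borders):
    """Decide who draws the decorations for one surface."""
    if client_needs_own_titlebar:
        return Decoration.CLIENT_SIDE
    if compositor_wants_borders:
        return Decoration.SERVER_SIDE
    return Decoration.CLIENT_SIDE


class Surface:
    """Toy surface that renegotiates whenever its circumstances change."""

    def __init__(self, needs_own_titlebar):
        self.needs_own_titlebar = needs_own_titlebar
        self.decoration = Decoration.CLIENT_SIDE

    def move_to(self, workspace_is_tiled):
        # Assumption: the compositor wants to draw borders on tiled
        # workspaces and leaves floating ones to the client.
        self.decoration = negotiate(self.needs_own_titlebar,
                                    workspace_is_tiled)
        return self.decoration
```

Moving a plain surface from a floating workspace to a tiled one flips it to
compositor-drawn borders, and moving it back flips it again.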

> there's SIMPLE stuff like - what happens when compositor crashes? how do we
> handle this? do you really want to lose all your apps when compositors crash?
> what should clients do? how do we ensure clients are restored to the same place
> and state? crash recovery is important because it is always what allows
> updates/upgrades without losing everything. THIS stuff is still "un solved".
> i'm totally not concerned about screen casting or vnc etc. etc. until all of
> these other nigglies are well solved first.

I'm still not on board with all of this "first" stuff. I don't see any
reason why we have to order ourselves like this. It all needs to get
done at some point. Right now we haven't standardized anything, and each
compositor is using its own unique, incompatible way of taking
screenshots and recording videos, and each is probably introducing some
kind of security problem.

> apps can show their own content for their own bug reporting. for system-wide
> reporting this will be DE integrated anyway. supporting video capture is a
> can of worms. as i said - single buffer? multiple with metadata? who does
> conversion/scaling/transforms? what is the security model? and as i said - this
> has major implications to the rendering back-end of a compositor.

The compositor hands RGBA (or ARGB, whatever, I don't care, we just pick
one) data to the client that's recording. This problem doesn't have to
be complicated. As for the "major implications"...
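
To show how little is at stake in the choice of pixel format, here is a
minimal sketch of converting one layout to the other on the receiving side.
It assumes the channel names describe the in-memory byte order (A, R, G, B
per pixel in, R, G, B, A out), which is an assumption for this example and
not how any particular Wayland format enum is defined. The function name is
made up.

```python
def argb_to_rgba(frame):
    """Reorder a packed A,R,G,B byte buffer into R,G,B,A order."""
    if len(frame) % 4 != 0:
        raise ValueError("frame length must be a multiple of 4 bytes")
    out = bytearray(len(frame))
    out[0::4] = frame[1::4]  # red
    out[1::4] = frame[2::4]  # green
    out[2::4] = frame[3::4]  # blue
    out[3::4] = frame[0::4]  # alpha moves to the end
    return bytes(out)
```

Whichever layout gets picked, the other side can adapt in a few lines.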

> there's a difference. when its an internal detail it can be changed and
> adapted to how the compositor and its rendering subsystem work. when its a
> protocol you HAVE to support THAT protocol and the way THAT protocol defines
> things to work or apps break.

You STILL have to get the pixels into the encoder on the compositor
side. You will ALWAYS have to do that if you want to support video
captures, regardless of who's doing it. At some point you're going to
have to get the pixels you're rendering and hand them off to someone, be
that libavcodec or a privileged client.

> > We can make Wayland support use-cases that are important to our users or
> > we can watch them stay on xorg perpetually and end up maintaining two
> > graphical stacks forever.
> 
> priorities. there are other issues that should be solved first before worrying
> about the pandoras box ones.

These are not a Pandora's box. These are small, necessary features.

--
Drew DeVault