Collaboration on standard Wayland protocol extensions

Carsten Haitzler (The Rasterman) raster at
Tue Mar 29 02:31:01 UTC 2016

On Mon, 28 Mar 2016 10:55:05 -0400 Drew DeVault <sir at> said:

> On 2016-03-28 11:03 PM, Carsten Haitzler wrote:
> > should we? is it right to create yet another security model in userspace
> > "quickly" just to solve things that don't NEED solving, at least at this
> > point.
> I don't think that the protocol proposed in other branches of this
> thread is complex or short sighted. Can you hop on that branch and
> provide feedback?

my take on it is that it's premature and not needed at this point. in fact i
wouldn't implement a protocol at all. *IF* i were to allow special access, i'd
simply require that the process be forked directly from the compositor, which
provides a socketpair fd to this process, and THAT fd could have extra
capabilities attached to the wl protocol. i would do nothing else because as a
compositor i cannot be sure what i am executing. i'd hand the choice of whether
to execute this tool to the user to say ok to, and not just blindly execute
anything i like.
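A minimal POSIX sketch of the fork-with-socketpair idea above, assuming a Linux/POSIX host and that the user has already approved the tool (the confirmation step is not modelled). `spawn_trusted_client` and `wait_helper` are hypothetical names. The child learns its pre-connected fd through the `WAYLAND_SOCKET` environment variable, which libwayland-client's `wl_display_connect()` checks before looking for a named socket; a real compositor would pass the returned fd to libwayland-server's `wl_client_create()` and mark that client as privileged. Those calls are left as comments so the sketch runs without libwayland.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork+exec an approved tool, handing it one end of a
 * socketpair. Returns the child's pid and stores the compositor's end in
 * *server_fd, or returns -1 on error. */
static pid_t spawn_trusted_client(const char *path, char *const argv[], int *server_fd)
{
    int sv[2];
    char num[16];
    assert(server_fd != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    /* The compositor's end must not leak into any exec'd child. */
    if (fcntl(sv[0], F_SETFD, FD_CLOEXEC) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    snprintf(num, sizeof num, "%d", sv[1]); /* fd number the child will inherit */
    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: advertise the inherited fd; wl_display_connect() picks it up. */
        setenv("WAYLAND_SOCKET", num, 1);
        execv(path, argv);
        _exit(127); /* exec failed */
    }
    close(sv[1]);
    /* A real compositor would now call wl_client_create(display, sv[0]) and
     * remember that this client may bind the privileged interfaces. */
    *server_fd = sv[0];
    return pid;
}

/* Wait for the helper; returns its exit status, or -1 if it did not exit normally. */
static int wait_helper(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Because only a process the compositor itself launched holds this fd, the extra capabilities never have to be exposed to arbitrary clients that connect through the public socket.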

> > adding watermarks can be done after encoding as another pass (encode in high
> > quality). hell, watermarks can just be a WINDOW (surface) on the screen. you
> > don't need options. as for audio - not too hard to do along with it. just
> > offer to record an input device - and choose (input can be current mixed
> > output or a mic ... or both).
> You're still not grasping the scope of this. I want you to run this
> command right now:
> man ffmpeg-all
> Just read it for a while. You're delusional if you think you can
> feasibly implement all of these features in the compositor. Do you

all a compositor has to do is be able to capture a video stream to a file. you
can ADD watermarking, sepia, and other effects later on in a video editor. next
you'll tell me gimp is incapable of editing image files so we need programmatic
access to a digital camera's ccd to implement effects/watermarking etc. on

> honestly want your screen capture tool to be able to add a watermark?

no - this can be done in a video editing tool later on. just record video at
high quality so degradation is not an issue.

> How about live streaming, some people add a sort of extra UI to read off
> donations and such. The scope of your screen capture tool is increasing
> at an alarming rate if you intend to support all of the features

no. i actually did not increase the scope. i kept it simple to "compositor can
write a file". everything else can be done in a post-processing task. that file
may include captured audio at the same time from a specific audio input.

> currently possible with ffmpeg. How about instead we make a simple
> wayland protocol extension that we can integrate with ffmpeg and OBS and
> imagemagick and so on in a single C file.

i'm repeating myself. there are bigger fish to fry.

> > exactly what you describe is how e works out of the box. no scripts needed.
> > requiring people write script to do their screen configuration is just
> > wrong. taking the position of "well i give up and won't bother and will
> > just make my users write scripts instead" is sticking your head in the
> > sand and not solving the problem. you are now asking everyone ELSE who
> > writes a compositor to implement a protocol because YOU won't solve a
> > problem that others have solved in a user friendly manner.
> What if I want my laptop display to remain usable? Right now I'm docked

eh? ummm that is what happens - unless you close the lid, then internal display
is "disconnected".

> somewhere else and I actually do have this scenario - my laptop is one
> of my working displays. How would I configure the difference between
> these situations in your tool? What if I'm on a laptop with poorly
> supported hardware (I've seen this before) where there's a limit on how
> many outputs I can use at once? What if I want to write a script where I
> put on a movie and it disables every output but my TV automatically? The
> user is losing a lot of power here and there's no way you can satisfy
> everyone's needs unless you make it programmable.

not true. this can be encapsulated without it being programmable. i have yet to
find a laptop that cannot run all its outputs, but the general limitation can
be accounted for - eg via prioritization. if you have 4 outputs and only 3 can
work at a time - then choose the 3 with the highest priority - adjust priority
of screens to have what you want.
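A minimal sketch of that prioritization rule, assuming each output carries a user-adjustable integer priority (higher wins) and the hardware can drive at most `max_active` outputs at once. `struct output`, `choose_outputs` and `primary_output` are hypothetical names. The same ranking also yields the "primary" screen, which is how a primary can move to an external monitor and back as monitors come and go without any client being involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-output metadata held by the compositor. */
struct output {
    const char *name;  /* connector name, e.g. "eDP-1" */
    int priority;      /* user-adjustable; higher wins */
    bool connected;
    bool enabled;      /* result written by choose_outputs() */
};

/* Enable at most max_active connected outputs, highest priority first.
 * Returns how many were enabled. */
static int choose_outputs(struct output *outs, size_t n, int max_active)
{
    assert(max_active >= 0);
    for (size_t i = 0; i < n; i++)
        outs[i].enabled = false;
    int count = 0;
    while (count < max_active) {
        struct output *best = NULL;
        for (size_t i = 0; i < n; i++) {
            if (!outs[i].connected || outs[i].enabled)
                continue;
            if (best == NULL || outs[i].priority > best->priority)
                best = &outs[i];
        }
        if (best == NULL)
            break; /* fewer connected outputs than the hardware limit */
        best->enabled = true;
        count++;
    }
    return count;
}

/* The "primary" screen is the highest-priority enabled output, so it moves
 * when a higher-priority monitor is plugged in or removed. */
static const struct output *primary_output(const struct output *outs, size_t n)
{
    const struct output *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (outs[i].enabled && (best == NULL || outs[i].priority > best->priority))
            best = &outs[i];
    return best;
}
```

Re-running `choose_outputs()` on every hotplug event is enough; the user only ever edits priorities.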

> > > Base your desktop's tools on the common protocol, of course. Gnome
> > > settings, KDE settings, arandr, xrandr, nvidia-settings, and so on, all
> > > seem to work fine configuring your outputs with the same protocol today.
> > > Yes, the protocol is meh and the implementation is a mess, but the
> > > clients of that protocol aren't bad by any stretch of the imagination.
> > 
> > no tools. why do it? it's built in. in order for screen config "magic" to
> > work, a set of metadata is attached to screens. you can set priority (screens
> > get numbers from highest to lowest priority at any given time allowing
> > behaviour like your "primary" screen to migrate to an external one then
> > migrate back when external monitor is attached etc.) sure we can start
> > having that metadata separate but then ALTERNATE TOOLS won't be able to
> > configure it, thus breaking the desktop environment by not providing metadata
> > and other settings associated with a display. this breaks functionality for
> > users who then complain about things not working right AND then the
> > compositor has to now deal with these "error cases" too because a foreign
> > tool will be messing with its data/setup.
> Your example has a pretty straightforward baseline - the "default"
> profile. Even so, we can design the protocol to make the custom metadata
> options visible to the tools, and the tools can then provide the user
> with options to configure that as well.

a protocol with undefined metadata is not a good protocol. it's now just blobs
of data that are opaque except to specific implementations. this will mean
that other implementations eventually will do things like strip it out or damage
it as they don't know what it is nor do they care.

> > as above. i have seen screen configuration used and abused over the years
> > where i just do not want to have a protocol for messing around with it for
> > any client. give them an inch and they'll take a mile.
> Let them take a mile. _I_ want a mile. Here's an old quote that I think
> is always relevant:
> UNIX was not designed to stop its users from doing stupid things, as
> that would also stop them from doing clever things.

but it isn't the user - it's some game you download that you cannot alter the
code or behaviour of that then messes everything up because its creator only
ever had a single monitor and didn't account for those with 2 or 3.

> > and that's perfectly fine - that is your choice. do not force your choice on
> > other compositors. you can implement all the protocol you want in any way
> > you want for your wm's tools.
> Why do we have to be disjointed? We have a common set of problems and we
> should strive for a common set of solutions.

because things like output configuration i do not see as needing a common
protocol. in fact it's desirable to not have one at all so it cannot be abused
or cause trouble.

> > gnome does almost everything with dbus. they love dbus. a lot of gnome is
> > centred around dbus. they likely will choose dbus to do this. likely. i
> > personally wouldn't choose to use dbus.
> Let's not speak for Gnome. They're copied on this thread, they'll speak
> for themselves.

my point is that not everyone chooses the same solution as you. not everyone
has the same problem and needs to solve it or WANTS to solve it the same way.

> > > primary display? What about applications that use the entire output for
> > 
> > the app can simply not request to present on their "presentation" screen...
> > or the user would mark their primary screen (internal on laptop maybe) AS
> > their presentation screen - more metadata to be held by compositor.
> Then we're back to the very thing you were criticising before - making
> the applications implement some sort of switch between using a
> "presentation" output and using some other kind of output. It would be a
> lot less complicated if the application asked to go full screen and the
> compositor said "hey, this app wants to be full screen, which output
> would you like to put it on?"

that needs ZERO protocol extending. there already is a fullscreen request in
xdg shell. if all you want to do is ask the user where to place the fullscreen
window, this is a compositor implementation detail. if you want to open multiple
windows and have them on the most appropriate screen by default without asking
the user, then you need a little metadata. asking the app to explicitly define
the output simply means you now have N possible ways this could work depending
on each and every app. leave it to the compositor to decide along with hints
that tell the compositor the likely usage purpose of the window. a user can
always move it somewhere else via the compositor (hotkey, alt+left mouse drag
to somewhere else or some other mechanism).
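A sketch of "leave it to the compositor to decide along with hints": the client only states a purpose, and the compositor maps that purpose to a screen using metadata the user set. The `window_purpose` hint, the `presentation` flag and `place_fullscreen` are hypothetical and not part of xdg shell; they only illustrate where the decision lives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical purpose hint a client might attach to its window. */
enum window_purpose { PURPOSE_NORMAL, PURPOSE_PRESENTATION };

struct screen {
    const char *name;
    bool presentation; /* user-set metadata kept by the compositor */
};

/* Choose the screen a fullscreen request lands on. The client never names a
 * screen; it only states intent. Falls back to the focused screen. */
static size_t place_fullscreen(const struct screen *screens, size_t n,
                               size_t focused, enum window_purpose purpose)
{
    assert(focused < n);
    if (purpose == PURPOSE_PRESENTATION) {
        for (size_t i = 0; i < n; i++)
            if (screens[i].presentation)
                return i;
    }
    return focused;
}
```

If no screen is marked for presentation the window simply goes fullscreen where the user is working, and the user can still move it afterwards.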

but we are talking things like output control/configuration - why does a
presentation app need this control? to control the actual setup of the output or
even explicitly define exactly what output (by name, id, number, etc.) to go
for? why does an app need to be able to target a specific output
programmatically rather than simply give the intent/purpose of the

> > now ALL presentation tools behave the same -  you dont have to reconfigure
> > each one separately and deal with the difference and lack or otherwise of
> > features. it's done in 1 place - compositor, and then all apps that want to
> > do a similar thing follow and work "as expected". far better than just
> > ignoring the issue. you yourself already talked about extra
> > tags/hints/whatever - this is one of those.
> I think I'm getting at something here. Does the workflow I just
> described satisfy everyone's needs for this?
> > because this requires clients DEFINING screen layout. wayland was
> > specifically designed to HIDE THIS. if the compositor displayed a screen
> > wrapped around a sphere in real life in a room - then it doesn't have
> > rectangles... how will an app deal with that? what if the compositor is
> > literally a VR world with surfaces wrapped around spheres and cubes - the
> > point of wayland's design was to hide this info from clients completely so
> > the compositor decides based on environment, not each and every client.
> > this was a basic premise/design in wayland from the get go and it was a
> > good one. letting apps break this abstraction breaks this design.
> In practice the VAST majority of our users are going to be using one or
> more rectangular displays. We shouldn't cripple what they can do for the
> sake of the niche. We can support both - why do we have to hide
> information about the type of outputs in use from the clients? It
> doesn't make sense for an app to get fullscreened in a virtual reality
> compositor, yet we still support that. Rather than shoehorning every
> design to meet the least common denominator, we should be flexible.

they are not crippled. that's the point. in virtual reality fullscreen makes
sense as "take over the world", not take over the output to one eye. for
monitors on a desktop it makes sense to take over that monitor but not others.
so it depends on context and the compositor's job is to interpret/manage/deal
with that context.

> > > No. Applications want to be full screen or they don't want to be. If
> > > they want to pick a particular output, we can easily let them do so.
> > 
> > i don't know about you.. but fullscreen to enlightenment means you use up
> > ONE SCREEN. [snip]
> I never said that fullscreen means multiple screens. No clue where
> that's coming from.

then why does this presentation tool need to be able to configure outputs - eg
define which screen views which part of their window spanning all outputs? i
see no other purpose of having configuration control of outputs for a
presentation tool.

> > what makes sense is an app hints at the purpose of its window and opens n
> > windows (surfaces). it can ask for fullscreen for each. the hints would
> > allow the compositor to choose which screen the window/surface is assigned
> > to.
> Hinting doesn't and cannot capture all of the use cases. Just letting
> the client say what it wants does.

clients explicitly saying what they want leads to broken scenarios, e.g. the game
dev who has never had > 1 screen and thus messes up users' multi-screen setups
because they never knew of nor cared about this situation. a HINT allows
interpretation that adapts to the scenario nicely and makes things work "properly".

the "i'd like to be fullscreen" hint from xdg has been a godsend - it doesn't
allow for clients to go "well i want to be at 50,80 and at 1278x968" (though
other bits of x do). apps used to do things like query root window size, create
an override-redirect window, grab kbd and mouse and then display ... even though
the root window may span many monitors and some parts of the root window geometry
may not be visible, as no screen views them, because the guy didn't know about randr
and such. worse they would play with xvidtune that only did 1 screen and thus
mess up all your screen config... because a protocol was invented that allows
EXPLICIT control and x HAD to implement explicit control. the fullscreen netwm
hint has drastically improved things as a high level hint allowing the wm to
interpret fullscreen in a way that makes sense given the scenario.

by the same token anything we do in wayland should be done at this higher level
hinting level. anything else is a recipe for disaster. it's not learning the
lessons of the past.

> > > Gnome calculator doesn't like being tiled:
> > 
> > i think the problem is you are not handling min/max sizing of clients
> > properly. :) you need to fix sway. gnome calculator is not sizing up its
> > buffer on surface size. that is a message "i can't be bigger than this -
> > this is my biggest size. deal with it". you need to deal with it. eg - pad
> > it and make it sized AT the buffer size :)
> This is harmful to tiling window managers in general. The window manager
> arranges the windows, not the other way around. You can't have tiling

sorry. neither in x11 nor in wayland does a wm/compositor just have the freedom
to resize a window to any size it likes WITHOUT CONSEQUENCES. in x11 min/max
size hints tell the wm the range of sizes a window can be sensibly drawn/laid
out with. in wayland it's communicated by buffer size. if you choose to ignore
this then you get to deal with the consequences as in your screenshot.

i would not just blindly ignore such info. i'd either pad with black/background
and keep to the buffer size or at least scale while retaining aspect ratio (and
pad as needed but likely less).
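A sketch of the two fallbacks just described, for a tile whose size differs from the buffer the client actually delivered: draw the buffer 1:1 centred and pad the rest with background, or scale it while keeping its aspect ratio. `struct rect`, `fit_mode` and `fit_buffer` are illustrative names; integer arithmetic keeps the results exact.

```c
#include <assert.h>
#include <stdbool.h>

struct rect { int x, y, w, h; };

enum fit_mode { FIT_PAD, FIT_SCALE };

/* Place a bw x bh client buffer inside a tile.
 * FIT_PAD:   draw 1:1, centred, padding the rest with background
 *            (scaled down only if the buffer does not fit at all).
 * FIT_SCALE: scale up or down to cover as much of the tile as possible
 *            without changing the buffer's aspect ratio. */
static struct rect fit_buffer(struct rect tile, int bw, int bh, enum fit_mode mode)
{
    assert(bw > 0 && bh > 0 && tile.w > 0 && tile.h > 0);
    int w = bw, h = bh;
    bool fits = bw <= tile.w && bh <= tile.h;
    if (mode == FIT_SCALE || !fits) {
        /* Compare aspect ratios bw/bh and tile.w/tile.h by cross-multiplying. */
        if ((long long)bw * tile.h > (long long)bh * tile.w) {
            w = tile.w; /* buffer is relatively wider: width is the limit */
            h = (int)((long long)bh * tile.w / bw);
        } else {
            h = tile.h; /* buffer is relatively taller: height is the limit */
            w = (int)((long long)bw * tile.h / bh);
        }
    }
    /* Centre the image; whatever part of the tile it does not cover is padding. */
    struct rect r = { tile.x + (tile.w - w) / 2, tile.y + (tile.h - h) / 2, w, h };
    return r;
}
```

Either way the compositor respects the size the client committed instead of stretching it, which treats the buffer size as the client's answer to the requested surface size.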

interestingly, now you complain about clients having EXPLICIT control and you
say "oh well no ... this is bad for tiling wm's" ... yet when i explain that
having output configuration control etc. etc. is harmful, you say it's something
that SHOULD be allowed for clients... (and the output isn't even a client
resource, unlike the buffers that they render, which are).

> window management if you can't have the compositor tell the clients what
> size to be. There's currently no metadata to tell the compositor that a
> surface is strict about its geometry. Most applications handle being
> given a size quite well and will rearrange/rerender itself to
> compensate. Things like gnome-calculator are the exception, not the
> rule.

yes there is - the buffer size of the next frame. your surface size is a
"request" to the client for that size. the response will be a new buffer of some
given size (or maybe no new buffer at all). you THEN deal with this new size. :)

> > > > xdg shell should be handling these already - except dmenu. dmenu is
> > > > almost a special desktop component. like a shelf/panel/bar thing.
> > > 
> > > dmenu isn't the only one, though, that may want to arrange itself in
> > > special ways. Lemonbar and rofi also come to mind.
> > 
> > all of these basically are "desktop components" ala
> > taskbars/shelves/panels/whatever - i know that for e we don't want to
> > support such apps. these are built in. i don't know what gnome or kde think
> > but these go against their design as an integrated desktop environment. YOU
> > need these because your compositor has no such feature itself. the bigger
> > desktops don't need it. they MAY support it - may not. i know i don't want
> > to. :)
> Users should be free to choose the tools they want. dmenu is much more
> flexible and scriptable than anything any of the DEs offer in its place

that is your wm's design. that is not the design of others. they want something
integrated and don't want external tools.

> - you just pipe in a list of things and the user picks one. Don't be
> fooled into thinking that whatever your DE does for a given feature is
> the mecca of that feature. Like you were saying to make other points -

no - but i'm saying that this is not a COMMON feature among all DEs. different
ones will work differently. gnome 3's chosen design these days is to put it
into gnome shell via js extensions, not the gnome 2 way with a separate panel
process (ala dmenu). enlightenment does it internally too and extends it
differently. my point is that what you want here is not universal.

> there are fewer contributors to each DE than you might imagine. DEs are

that is exactly what i said in response to you saying that "we have all the
resources to do all of this" when i said we don't... :/ we don't - resources
are already expended elsewhere.

> spread too thin to make the perfect _everything_. But some projects like
> dmenu are small and singular in their focus, and maintained by one or
> two people who put in a much larger amount of effort than is put in by
> DE contributors on the corresponding features of that DE.
> Be flexible enough for users to pick the tools they want.

a lifetime of doing wm's has taught me that this approach is not the best. you
end up with a limiting and complex protocol to then allow taskbars, pagers and
so on to be in "dmenus" of this world. this is how gnome 1.x and 2.x worked. i
added the support in e long ago. i learned that it was a limiter in adding
features as you had to conform to someone else's idea of what virtual desktops
are etc.

these panels/taskbars/shelves/whatever are best being closely integrated into
the wm.

YOU choose not to integrate. the other major DEs come already integrated with
these. this is not a universal solution everyone should support. you can come
up with your own extension and encourage people to support it in their dmenus
etc. - if another DE wants to support this then they can implement the same

> > i don't know osu - but i see no reason krita needs to configure a tablet. it
> > can just deal with input from it. :)
> >
> > input is very sensitive. having done this for years and watched how games
> > like to turn off key repeat then leave it off when they crash... or change
> > mouse accel then you find its changed everywhere and have to "fix it" etc.
> > etc. - i'd be loath to do this. give them TOO much config ability and it
> > can become a security issue.
> Let's change the tone of the input configuration discussion. I've come
> around to your points about providing input configuration in general to
> clients, let's not do that. I think the only issue we should worry about
> for input at this point is fixing the pointer-constraints protocol to
> use our new permissions model.

that's very reasonable. :)

> > > Why do those things need to be dealt with first? Sway is at a good spot
> > > where I can start thinking about these sorts of things. There are
> > > enough people involved to work on multiple things at once. Plus,
> > > everyone thinks nvidia's design is bad and we're hopefully going to see
> > > something from them that avoids vendor-specific code.
> > 
> > because these imho are far more important. you might be surprised at how few
> > people are involved.
> These features have to get done at some point. Backlog your
> implementation of these protocols if you can't work on it now.

that's what i'm saying. :)

> > not so simple. with more of the ui of an app being moved INTO the border
> > (titlebar etc.) this is not a simple thing to just turn it off. you then
> > turn OFF necessary parts of the ui or have to push the problem out to the
> > app to "fallback".
> You misunderstand me. I'm not suggesting that these apps be crippled.
> I'm suggesting that, during the negotiation, they _object_ to having the
> server draw their decorations. Then other apps that don't care can say
> so.

aaah ok. so compositor adapts. then likely i would express this as a "minimize
your decorations" protocol from compositor to client, client to compositor then
responds similarly like "minimize your decorations" and compositor MAY choose
to not draw a shadow/titlebar etc. (or client responds with "ok" and then
compositor can draw all it likes around the app).
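One way to model that "minimize your decorations" exchange as a small resolution step on the compositor side, after it has asked the client to minimize (for example because the window was just tiled). The reply names and `resolve_decorations` are hypothetical; this is a sketch of the idea as stated, not an existing protocol.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical client replies to "minimize your decorations". */
enum deco_reply { REPLY_OK, REPLY_MINIMIZE_YOURS };

struct deco_state {
    bool client_draws_titlebar;  /* client-side decorations kept? */
    bool compositor_draws_frame; /* compositor titlebar/shadow drawn? */
};

/* compositor_may_yield models "compositor MAY choose" to drop its own
 * titlebar/shadow when the client objects. */
static struct deco_state resolve_decorations(enum deco_reply reply, bool compositor_may_yield)
{
    struct deco_state s;
    if (reply == REPLY_OK) {
        /* Client dropped its decorations; compositor draws whatever it likes. */
        s.client_draws_titlebar = false;
        s.compositor_draws_frame = true;
    } else {
        /* Client objects: its titlebar carries UI it cannot simply turn off. */
        s.client_draws_titlebar = true;
        s.compositor_draws_frame = !compositor_may_yield;
    }
    return s;
}
```

Since the outcome depends only on the latest reply, the same step can simply be repeated when the window moves to an untiled output and the two sides renegotiate.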

> > only having CSD solves all that complexity and is more efficient
> > than SSD when it comes to things like assigning hw layers or avoiding
> > copies of vast amounts of pixels. i was against CSD to start with too but i
> > see their major benefits.
> I don't want to rehash this old argument here. There are two sides to this
> coin. I think everyone fully understands the other position. It's not
> hard to reach a compromise on this.

it's sad that we have to have this disagreement at all. :) go on. join the dark
side! :) we have cookies!

> > > In Wayland you create a surface, then assign it a role. Extra details
> > > can go in between, or go in the call that gives it a role. Right now
> > > most applications are creating their surface and then making it a shell
> > > surface. The compositor can negotiate based on its own internal state
> > > over whether a given output is tiled or not, or in cases like AwesomeWM,
> > > whether a given workspace is tiled or not. And I don't think the
> > > decision has to be final. If the window is moved to another output or
> > > really if any of the circumstances change, they can renegotiate and the
> > > surface can start drawing its own decorations.
> > 
> > yup. but this signalling/negotiation has to exist. currently it doesnt. :)
> We'll make this part of the protocols we're working on here :)

this i can agree on. :)

> > you aren't going to talk me into implementing something that is important
> > for you and not a priority for e until such a time as i'm satisfied that
> > the other issues are solved. you are free to do what you want, but
> > standardizing things takes a looong time and a lot of experimentation,
> > discussion, and repeating this. we have resources on wayland and nothing
> > you described is a priority for them. there are far more important things
> > to do that are actual business requirements and so the people working need
> > to prioritize what is such a requirement as opposed to what is not.
> > resources are not infinite and free.
> Like I said before, put it on your backlog. I'm doing it now, and I want
> your input on it. Provide feedback now and implement later if you need
> to, but if you don't then the protocols won't meet your needs.
> > let me complicate it for you. let's say i'm playing a video fullscreen. you
> > now have to convert argb to yuv then encode when it would have been far more
> > efficient to get access directly to the yuv buffer before it was even
> > scaled to screen size... :) so you have just specified a protocol that is
> > by design inefficient when it could be more efficient.
> What, do you expect to tell libavcodec to switch pixel formats
> mid-recording? No one is recording their screen all the time. Yeah, you
> might hit performance issues. So be it. It may not be ideal but it'll
> likely be well within the limits of reason.

you'll appreciate what i'm getting at next time you have to do 4k ... or 8k
video and screencast/capture that. :) and have to do miracast... on a 1.3ghz
arm device :)
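To put a number on the ARGB to YUV conversion mentioned in the quoted text: every captured pixel has to pass through something like the following before a YUV encoder sees it. This sketch uses the widely used 8-bit fixed-point approximation of BT.601 limited-range RGB to YCbCr; an actual capture pipeline might use another matrix (e.g. BT.709 for HD) or a SIMD/GPU path, and the function names are illustrative. At 3840×2160 that is 8,294,400 pixels per frame, about 498 million conversions per second at 60 fps, all of which could be skipped if the video's original YUV buffer were captured directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert one 0xAARRGGBB pixel to 8-bit BT.601 limited-range Y, Cb (u), Cr (v)
 * using common fixed-point coefficients. Alpha is ignored. */
static void argb_to_yuv(uint32_t argb, uint8_t *y, uint8_t *u, uint8_t *v)
{
    assert(y && u && v);
    int r = (int)((argb >> 16) & 0xff);
    int g = (int)((argb >> 8) & 0xff);
    int b = (int)(argb & 0xff);
    *y = (uint8_t)(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
    /* 32896 = 128 * 256 + 128 folds the +128 chroma offset and rounding in,
     * so the value being shifted is never negative. */
    *u = (uint8_t)((-38 * r - 74 * g + 112 * b + 32896) >> 8);
    *v = (uint8_t)((112 * r - 94 * g - 18 * b + 32896) >> 8);
}

/* A full frame is just this, once per pixel, before any encoding starts. */
static void frame_argb_to_yuv444(const uint32_t *src, size_t npix,
                                 uint8_t *y, uint8_t *u, uint8_t *v)
{
    for (size_t i = 0; i < npix; i++)
        argb_to_yuv(src[i], &y[i], &u[i], &v[i]);
}
```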

> > yes - but when, how often and via what mechanisms pixels get there is a very
> > delicate thing.
> And yet you still need to convert the entire screen to a frame and feed
> it into an encoder, no matter what. Feed the frame to a client instead.

is the screen a single frame or multiple pieced together by scanout hw
layers? :) what is your protocol/interface to the "screen stream". if you have
it be a simple "single buffer" then you are going to soon enough run into
issues. :)

> > so far we don't exactly have a lot of inter-desktop co-operation happening.
> > it's pretty much everyone for themselves except for a smallish core
> > protocol.
> Which is ridiculous.
> > do NOT try and solve security sensitive AND performance sensitive AND design
> > limiting/dictating things first and definitely don't do it without everyone
> > on the same page.
> I'm here to get everyone on the same page. Get on it.

let's work on the things we do have in common first. :)

------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster at

More information about the wayland-devel mailing list