Collaboration on standard Wayland protocol extensions

Carsten Haitzler (The Rasterman) raster at rasterman.com
Wed Mar 30 04:35:12 UTC 2016


On Tue, 29 Mar 2016 08:11:03 -0400 Drew DeVault <sir at cmpwn.com> said:

> > what is allowed. eg - a whitelist of binary paths. i see this as a lesser
> > chance of a hole.
> 
> I see what you're getting at now. We can get the pid of a wayland
> client, though, and from that we can look at /proc/cmdline, from which
> we can get the binary path. We can even look at /proc/exe and produce a
> checksum of it, so that programs become untrusted as soon as they
> change.

you can do that... but there are race conditions. a pid can be recycled.
imagine some client just before it exits sends some protocol to request doing
something "restricted". maybe you even check on connect, but let's say this
child exits and you haven't gotten the disconnect on the fd yet because there
is still data to read in the buffer. you get the pid while the process is
still there, then it happens to exit.. NOW you check /proc/PID ... but in
the meantime the PID was recycled with a new process that is "whitelisted"
so you check this new replacement /proc/PID/exe and find it's ok and ok the
request from the old dying client... BOOM. hole.
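
to make the race concrete, here is a minimal python sketch of the racy check.
it is an illustration only, not how any real compositor does it. it assumes
linux (SO_PEERCRED and /proc), and peer_pid() and racy_is_whitelisted() are
names made up for this example.

```python
import os
import socket
import struct


def peer_pid(conn: socket.socket) -> int:
    """Return the pid the kernel recorded for the peer of a unix socket."""
    # SO_PEERCRED yields a struct ucred: pid, uid, gid (three C ints on linux)
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize("3i"))
    pid, _uid, _gid = struct.unpack("3i", raw)
    return pid


def racy_is_whitelisted(conn: socket.socket, whitelist: set) -> bool:
    """Check the peer's binary path against a whitelist. NOT race free."""
    pid = peer_pid(conn)  # step 1: pid as recorded when the connection was made
    # <-- window: the client may exit here and the pid may be handed to
    #     an unrelated (possibly whitelisted) process
    try:
        exe = os.readlink(f"/proc/{pid}/exe")  # step 2: whoever owns pid NOW
    except OSError:
        return False  # no such process (or no permission): refuse
    return exe in whitelist
```

nothing ties step 2 to the process that actually sent the request. the only
thing the compositor holds that is bound to the client is the fd itself.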

it'd be better to use something like smack labels - but this is not used
commonly in linux. you can check the smack label on the connection and auth by
that: the label can be looked up in a db of "these guys are ok if they have
smack label "x"" and there is no race here. smack labels are like containers
and also affect all sorts of other access like to files, network etc.

but the generic solution without relying on smack would be to launch yourself -
socketpair + pass fd. :) it has the lowest chance of badness. this works if the
client is a regular native binary (c/c++) or if it's a script because the fd
will happily pass on even if it's a wrapper shell script that then runs a binary.
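
a rough python sketch of "launch yourself + socketpair + pass fd" follows. it
only shows the mechanism: launch_trusted() and the privileges argument are
hypothetical names for this example. WAYLAND_SOCKET is the environment variable
libwayland clients read to find an already-open connection fd. a real
compositor would hand its end of the pair to its wayland server library (eg
wl_client_create() in libwayland-server) instead of keeping a raw socket.

```python
import os
import socket
import subprocess


def launch_trusted(argv, privileges):
    """Start a client on a private socketpair.

    Returns (process, compositor_side_socket, privileges). The privileges
    are attached to the connection itself, so no pid lookup is needed later.
    """
    ours, theirs = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    env = dict(os.environ, WAYLAND_SOCKET=str(theirs.fileno()))
    # pass_fds keeps this fd open (same number) in the child. it stays open
    # across a further exec too, eg a wrapper shell script running a binary.
    proc = subprocess.Popen(argv, env=env, pass_fds=(theirs.fileno(),))
    theirs.close()  # our copy of the child's end is no longer needed
    return proc, ours, privileges
```

anything arriving on the returned socket can only come from the process tree
we launched, so the "who is this?" question never involves a pid at all.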

> > i know - but for just capturing screencasts, adding watermarks etc. - all
> > you need is to store a stream - the rest can be post-processed.
> 
> Correct, if you record to a file, you can deal with it in post. But
> there are other concerns, like what output format you'd like to use and
> what encoding quality you want to use to consider factors like disk
> space, cpu usage, etc. And there still is the live streaming use-case,
> which we should support and which your solution does not address.

given high enough quality any post process can also transcode to another
format/codec/quality level while adding watermarks etc. a compositor able to
stream out (to a file or whatever) video would of course have options for
basics like quality/bitrate etc. - the codec libraries will want this info
anyway...

> > let's talk about the actual apps surfaces and where they go - not
> > configuration of outputs. :)
> 
> No, I mean, that's what I'm getting at. I don't want to talk about that
> because it doesn't make sense outside of e. On Sway, the user is putting
> their windows (fullscreen or otherwise) on whatever output they want
> themselves. There aren't output roles. Outputs are just outputs and I
> intend to keep it that way.

enlightenment ALSO puts windows "on the current screen" by default and you can
move them to another screen, desktop etc. as you like. hell it has the ability
to remember screen, desktop, geometry, and all sorts of other state and
re-apply it to the same window when it appears again. i use this often myself
to force apps to do what i want when they keep messing up.. i'm not talking
about manually moving things or the ability for a compositor/wm to override and
enforce its will.

i am talking about situations where you want things to "just work" out of the
box as they might be intended to without forcing the user to go manually say
"hey no - i want this". i'm talking about a situation like
powerpoint/impress/whatever where when i give a presentation on ONE screen i
have a smaller version of the slide, i also have the preview of the next slide,
a count-down timer for the slide talk, etc. and on the "presentation screen" i
get the actual full presentation. I should not have to "manually configure
this". impress/ppts/whatever should be able to open up 2 windows and
appropriately tag them for their purposes and the compositor then KNOWS which
screen they should go onto.

impress etc. also need to know that a presentation screen exists so it knows to
open up a special "presentation window" and a "control window" vs just a
presentation window. these windows are of course fullscreen ones - i think we
don't disagree there.

the same might go for games - imagine a nintendo DS setup. game has a control
window (on the bottom screen) and a "game window" on the top. similar to
impress presentation vs control windows. imagine a laptop with 2 screens. one
in the normal place and one where your keyboard would be... similar to the DS.
maybe we can talk flight simulators which may want to span 3 monitors
(left/middle/right), due to different screens able to do different refresh
rates etc. you really likely want to have 3 windows (surfaces) with each
fullscreen on each monitor. how do we advertise to games that such a setup
exists and how would they request to lay out their left/middle/right windows
correctly.

what about when i have a phone plugged into a dock. it has 2 external hdmi "big
screens" and an internal phone screen. the internal should really behave in a
mobile-way where externals would be desktop-like. maybe an app (like
libreoffice) is not usable on a tiny screen. it should be able to say "my
window is only useful in desktop mode" or something. so when i run it - it
turns up on the appropriate screen. when the dialler app that handles phone
calls gets an incoming call.. when it opens its window you likely want it ON
the mobile display, not desktop... etc.

i am just going on to give examples of how window metadata might be used to
have things go to the right place out of the box. if your wm/compositor allows
you to manually override then sure - it can say no and place the window where
it wants. it may HAVE to at times.
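
as a sketch of what "window metadata" could drive, assume (purely for
illustration) that outputs advertise a set of role tags and each surface names
the role it wants. the tag names and both functions below are made up. no such
protocol is being quoted here.

```python
def advertised_roles(outputs):
    """All roles a client could ask for, eg so an app knows whether a
    'presentation' screen exists before opening a second window."""
    return set().union(*outputs.values())


def place_surfaces(surfaces, outputs, default_output):
    """surfaces: {surface: wanted_role}, outputs: {output: set_of_roles}.

    Returns {surface: output}. This is only the out-of-the-box placement;
    the compositor or user stays free to override it afterwards.
    """
    placement = {}
    for surface, wanted in surfaces.items():
        matches = [name for name, roles in outputs.items() if wanted in roles]
        # no output offers that role: place it like any other window
        placement[surface] = matches[0] if matches else default_output
    return placement
```

with outputs {"laptop": {"control"}, "projector": {"presentation"}} a slides
window tagged "presentation" lands on the projector and the notes window
tagged "control" stays on the laptop, with no manual configuration.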

> > or just have the compositor "work" without needing scripts and users to
> > have to learn how to write them. :)
> 
> Never gonna happen, man. There's no way you can foresee and code for
> everyone's needs. I'm catching on to this point you're heading towards,
> though: e doesn't intend to suit everyone's needs.

just improve the compositor then. that's what software development is about.

> > > Here's the wayland screenshot again for comparison:
> > > 
> > > https://sr.ht/Ai5N.png
> > > 
> > > Most apps are fine with being told what resolution to be, and they
> > > _need_ to be fine with this for the sake of my sanity. But I understand
> > > that several applications have special concerns that would prevent this
> > 
> > but for THEIR sanity, they are not fine with it. :)
> 
> Nearly all toolkits are entirely fine with being any size, at least
> above some sane minimum. A GUI that cannot deal with being a
> user-specified size is a poorly written GUI.

it has nothing to do with the toolkit but with the app's window content. a
toolkit may be rendering/arranging it but the app has given you information
that the content is not useful below some size or above some size. if you want
to ignore this - then fine, but don't complain about the consequences and think
the solution is a floating hint. it is not. it's your bug in not respecting
these limitations a client has given you. :) it is your choice. :)

> > no. this has nothing to do with floating. this has to do with minimum and in
> > this case especially - maximum sizes. it has NOTHING to do with floating.
> > you are conflating sizing with floating because floating is how YOU HAPPEN
> > to want to deal with it.
> 
> Fair. Floating is how I would deal with it. But maybe I'm missing
> something: where does the min/max size hints come from? All I seem to
> know of is the surface geometry request, which isn't a hint so much as
> it's something every single app does. If I didn't ignore it, all windows
> would be fucky and the tiling layout wouldn't work at all. Is there some
> other hint coming from somewhere I'm not aware of?

in x11 there are explicit min/max hints. not so in wayland - not that i saw
last time i looked. what is done is they may request surface geom. you may
respond by setting the surface to that geometry or some other. the app now
responds with a BUFFER rendered at NxM pixels. it may NOT match the geom you
set. this is basically the app disagreeing on your choice of geometry and
refusing to provide the geometry you asked for. this is the app giving you a
limit - you went beyond it and this buffer size is what the app can do.

it MAY be useful for apps to provide such hints though in xdg shell. it means a
compositor knows AHEAD of time what these limits are before it hits one. x11
supported aspect ratio hints too. this was a bit tricky to get right (also
base size and size stepping - eg for terminals). some of this may be good to
bring to wayland, some not.
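
for reference, a small python sketch of how x11-style min/max/base/step hints
constrain a proposed size. the field names are simplified stand-ins for the
x11 size hints, aspect ratio is left out, and this is not a proposal for what
xdg shell should look like.

```python
from dataclasses import dataclass


@dataclass
class SizeHints:
    min_w: int = 1
    min_h: int = 1
    max_w: int = 32767
    max_h: int = 32767
    base_w: int = 0   # fixed part, eg terminal padding
    base_h: int = 0
    step_w: int = 1   # resize increment, eg one character cell
    step_h: int = 1


def _constrain(value, lo, hi, base, step):
    value = max(lo, min(hi, value))
    # snap down to base + n * step so a terminal gets whole cells
    snapped = base + ((value - base) // step) * step
    return max(lo, snapped)  # never go below the minimum


def constrain_size(hints, w, h):
    """Size the client would actually accept for a proposed w x h."""
    return (_constrain(w, hints.min_w, hints.max_w, hints.base_w, hints.step_w),
            _constrain(h, hints.min_h, hints.max_h, hints.base_h, hints.step_h))
```

eg a terminal with 8x16 cells, 4 pixels of base size and a minimum of 84x36,
when offered 803x600, ends up at 796x596. with hints the compositor can work
that out ahead of time instead of finding out from the buffer size.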

> > you COULD deal with it as i described - pad out the area or
> > scale retaining aspect ratio - allow user to configure the response. if i
> > had a small calculator on the left and something that can size up on the
> > right i would EXPECT a tiling wm to be smart and do:
> > 
> > +---+------------+
> > |   |............|
> > |:::|............|
> > |:::|............|
> > |:::|............|
> > |   |............|
> > +---+------------+
> 
> Eh, this might be fine for a small number of windows, and maybe even is
> the right answer for Sway. I'm worried about it happening for most
> windows and I don't want to encourage people to make their applications
> locked into one aspect ratio and unfriendly to tiling users.

MOST windows will have a minimum size, SOME will have a maximum size. that's
reality of things normally. often non-resizable dialog windows will have min
and max set to the same. i wouldn't worry about this as it is out of your
control - clients will decide. most will be resizable up and down to make you
happy. some will not. if you can deal nicely with "some" then your problems
will be solved.
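
a minimal python sketch of the two responses mentioned above (pad out the
area, or scale retaining aspect ratio) for a client whose limits do not fit
its tile. fit_in_tile() and its return format are invented for this example.

```python
def clamp(value, lo, hi):
    return max(lo, min(hi, value))


def fit_in_tile(tile, min_size, max_size, scale=False):
    """Place a size-limited client inside a tile.

    Returns the buffer size to ask the client for and the rect (x, y, w, h)
    inside the tile where the compositor shows it, centered.
    """
    tw, th = tile
    w = clamp(tw, min_size[0], max_size[0])
    h = clamp(th, min_size[1], max_size[1])
    # scale when asked to, or when even the minimum size overflows the tile
    if scale or w > tw or h > th:
        factor = min(tw / w, th / h)
    else:
        factor = 1.0
    out_w, out_h = round(w * factor), round(h * factor)
    x, y = (tw - out_w) // 2, (th - out_h) // 2
    return {"buffer": (w, h), "rect": (x, y, out_w, out_h)}
```

for a calculator with max size 300x400 in a 300x1000 tile, padding gives
rect (0, 300, 300, 400): full width, with empty space above and below, like
the left column in the diagram quoted above.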

and floating is another matter entirely. :)

> > they can patch their compositors if they want. if you are forcing users to
> > write scripts you are already forcing them to "learn to code" in a simple
> > way. would it not be best to try and make things work without needing
> > scripts/custom code per user and have features/modes/logic that "just
> > work" ?
> 
> There's a huge difference between the skillset necessary to patch a
> Wayland compositor to support scriptable output configuration and to
> write a bash script that uses a tool the compositor shipped for this
> purpose.

sure but 99% of users can't even manage a script. the 1% left can do scripting.
yes indeed 0.001% could patch the code. but the 99% are still out of luck
unless the compositor itself does things "nicely" and provides nice little
"checkboxes and sliders" in a gui to set it up (even that is scary for 90% of
people). be aware when i am saying people - i mean general population, not linux
geeks/nerds.

> > *I* do not want adhoc panels/taskbars/tools written by separate projects
> > within my DE because they cause more problems than they solve. been there.
> > done that. not going back. i learned my lesson on that years ago. for them
> > to work you have pagers and taskbars in them to be fully functional and
> > unless you ALSO then bind all this metadata for the pagers, virtual
> > desktops and their content to a protocol that is also universal, then its
> > rather pointless. this then ties your desktop to a specific design of how
> > desktops are (eg NxM grids and only ONE of those in an entire environment.
> > when with enlightenment each screen has an independent NxM grid PER SCREEN
> > that can be switched separately.
> 
> Again, the scope of this is not increasing ad hominum. I never brought
> virtual desktops and pagers into the mix. There is a small number of
> things that are clearly the compositor's responsibility and that small
> list is the only things I want to manipulate with a protocol. Handling
> screen capture hardly has room for innovation - there are pixels on
> screen, they need to be given to ffmpeg et al. This isn't locking you
> into some particular user-facing design choice in your DE.

the point of these dmenus/panels is to contain such controls - it happens that
dmenu does not do this but most instances do. the intent of these is to act as
non-integrated parts of a desktop. they function as a desktop component - eg are
always there from login.

> > > I'm not suggesting anything radical to try and cover all of these use
> > > cases at once. Sway has a protocol that lets a surface indicate it wants
> > > to be docked somewhere, which allows for custom taskbars and things like
> > > dmenu and so on to exist pretty easily, and this protocol is how swaybar
> > > happens to be implemented. This doesn't seem very radical to me, it
> > > doesn't enforce anything on how each of the DEs choose to implement
> > > their this and that.
> > 
> > then keep your protocol. :) i know i have no interest in supporting it - as
> > above. :)
> 
> Well, so be it.
> 
> > > We've both used this same argument from each side multiple times, it's
> > > getting kind of old. But I think these statements hold true:
> > > 
> > > There aren't necessarily enough people to work on the features I'm
> > > proposing right now. I don't think anyone needs to implement this _right
> > > now_. There also aren't ever enough people to give every little feature
> > > of their DE the attention that leads to software that is as high quality
> > > as a similar project with a single focus on that one feature.
> > 
> > that is true. :)
> 
> Interesting that this immediately follows up the last paragraph. If you
> acknowledge that your implementation of desktop feature #27 can't
> possibly be as flexible/configurable/usable/good as some project that's
> entirely focused on just making that one feature great, then why would
> you refuse to implement the required extensibility for your users to
> bring the best tools available into your environment?

because i have implemented extensibility many times over in the past 20 years.
i've come to the conclusion that they create a poor user experience with
loosely integrated components that either look ugly, don't work like the rest of
the de or do horrible hacks that then create trouble. what does work well is
tight integration. the manpower we have i'd RATHER devote to making
things better out of the box and having features than just saying "bah - we
give up and hope someone else will do it". every time i have done this, it has
led to sub-optimal or poor results. you give up solving a problem and instead
then rely on 3rd party tools that don't look right, or function well, or
integrate or then don't support things YOU want to do later on (eg like the
per-screen profiles in screen output config).

maybe YOU want to do it that way - fine. that's your choice, but most other
DE's are integrated. They work on/provide their own tools and code and logic. :)

> > i disagree. i can't take linux and just use some bsd device driver with it
> > - oh dear. that's against the spirit of free software! i have to port it and
> > integrate it (as a kernel module). wayland is about making the things that
> > HAVE to be shared protocol just that. the things that don't absolutely have
> > to be, we don't. you are able to patch, modify and extend your de/wm, all
> > you like - most de's provide some way to do this. gnome today uses js. e
> > uses loadable modules. i am unsure about kde. :)
> 
> Sure, but you can use firefox and vim and urxvt while your friend
> prefers termite and emacs and chromium, and your other friend uses gedit
> and gnome-terminal and surf.

big difference - "apps" vs "desktop". of course this line is a grey area. i
consider shelves/panels/filemanager/settings for desktop and system, desktop
bg/wallpaper, config tools, virtual keyboards and wm+compositor to be on the
desktop side. browsers, terminals, editors are firmly in "apps" land.
it may be that your de of choice provides apps that work with the
look/feel/philosophy/toolkit of your de - but they are separate. that is where
i draw the line.

> > what happens when you need to restart sway after some development? where do
> > all your terminals/editors/ide's, browsers/irc clients go? they vanish and
> > you have to re-run them?
> 
> Most of my users aren't developers working on sway all the time. Sway
> has an X backend like Weston, I use that to run nested sways for
> development so I'm not restarting Sway all the time. The compositor
> crashing without losing all of the clients is a pipe dream imo, I'm not
> going to look into it for now.

if you are relying on x to do development then you can never get rid of x11 -
ever...

i don't see it as a pipe dream. all you need is the ability to recognize a
client and its surfaces from a previous connection and have clients reconnect
and provide whatever information is necessary to restore that state (eg an id
of some sort).
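
a very rough python sketch of the bookkeeping side of that idea. the
compositor hands each client an id, keeps per-id state somewhere that survives
a restart, and looks it up when a client reconnects with its id. SessionStore
and the state fields are hypothetical. how the id travels over the wire is not
shown and no existing protocol is implied.

```python
import json
import uuid


class SessionStore:
    """Maps a reconnect id to the last known state of a client's surfaces."""

    def __init__(self, saved=None):
        self._saved = dict(saved or {})

    def remember(self, state, token=None):
        # a new client gets a fresh id, a returning one updates its own entry
        token = token or uuid.uuid4().hex
        self._saved[token] = dict(state)
        return token

    def restore(self, token):
        # None means "unknown id": treat the client as brand new
        return self._saved.get(token)

    def dump(self):
        """Serialize so the state can outlive the compositor process."""
        return json.dumps(self._saved)

    @classmethod
    def load(cls, text):
        return cls(json.loads(text))
```

the old compositor instance writes dump() out before it goes away (or on every
change). the new instance calls load() and answers reconnecting clients from
restore().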

> > > > aaah ok. so compositor adapts. then likely i would express this as a
> > > > "minimize your decorations" protocol from compositor to client, client
> > > > to compositor then responds similarly like "minimize your decorations"
> > > > and compositor MAY choose to not draw a shadow/titlebar etc. (or client
> > > > responds with "ok" and then compositor can draw all it likes around the
> > > > app).
> > > 
> > > I think Jonas is on the right track here. This sort of information could
> > > go into xdg_*. It might not need an entire protocol to itself.
> > 
> > i'd lean on a revision of xdg :)
> 
> I might lean the other way now that I've seen that KDE has developed a
> protocol for this. I think that would be a better starting point since
> it's proven and already in use. Thoughts?

if you plan on it becoming universal - plan for xdg. if you want to keep it
private or experiment locally - make it a separate protocol.

> > ... you might be surprised. 4k ones are already out there. ok . not 1.3ghz -
> > 2ghz - but no way you can capture even 4k with the highest end arms unless
> > you avoid conversion. you keep things in yuv space and drop your bandwidth
> > requirements hugely. in fact you never leave yuv space and make use of the
> > hw layers and the video decoder decodes directly into scanout buffers. you
> > MAY be able to stuff the yuv buffers back into an encoder and re-encode
> > again ... just. but it'd be better not to decode AND encode but take the
> > mp4/whatever stream directly and shuffle it down the network pipe. :)
> > 
> > believe it or not TODAY tablets with 4k screens ship. you can buy them. they
> > are required to support things like miracast (mp4/h264 stream over wifi).
> > it's reality today. products shipping in the 100,000's and millions. :)
> 
> Eh, alright. So they'll exist soon. I feel like both strategies can
> coexist, in that case. If you want to livestream your tablet, you'll
> have a performance hit and it might just be unavoidable. If you just
> want to record video, use the compositor's built in thingy. I'm okay
> with unavoidable performance concerns in niche situations - most people
> aren't going to be livestreaming from their tablet pretty much ever.
> Most people aren't even going to be screen capturing on their tablet to
> be honest. It goes back to crippling the common case for the sake of the
> niche case.

it's a performance hit for EVERYONE if you do un-needed transforms (scaling,
colorspace conversion etc.). ;)
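
some back-of-the-envelope numbers in python for why the colorspace matters.
the assumptions are mine: a 3840x2160 stream at 60 frames per second,
32 bits per pixel for an rgb buffer and 12 bits per pixel for a 4:2:0 yuv
buffer such as NV12. this counts touching each frame once only; a conversion
pass has to read one format and write the other on top of that.

```python
def stream_bytes_per_second(width, height, fps, bits_per_pixel):
    """Raw bandwidth needed to touch every pixel of every frame once."""
    return width * height * fps * bits_per_pixel // 8


WIDTH, HEIGHT, FPS = 3840, 2160, 60  # assumed 4k stream

rgb = stream_bytes_per_second(WIDTH, HEIGHT, FPS, 32)  # 1_990_656_000 bytes/s
yuv = stream_bytes_per_second(WIDTH, HEIGHT, FPS, 12)  #   746_496_000 bytes/s

print(f"rgb: {rgb / 1e9:.2f} GB/s, yuv 4:2:0: {yuv / 1e9:.2f} GB/s")
```

that is roughly 2 GB/s against 0.75 GB/s for the same picture, before any
conversion work is counted.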

-- 
------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster at rasterman.com


