[RFC v2] Wayland presentation extension (video protocol)

Fri Feb 21 00:36:37 PST 2014

On Fri, 21 Feb 2014 06:40:02 +0100
Mario Kleiner <mario.kleiner.de at gmail.com> wrote:

> On 20/02/14 12:07, Pekka Paalanen wrote:
> > Hi Mario,
> >
> 
> Ok, now i am magically subscribed. Thanks to the moderator!

Cool, I can start trimming out parts of the email. :-)

> > I have replies to your comments below, but while reading what you said,
> > I started wondering whether Wayland would be good for you after all.
> >
> > It seems that your timing sensitive experiment programs and you and
> > your developers put a great deal of effort into
> > - detecting the hardware and drivers,
> > - determining how the display server works, so that you can
> > - try to make it do exactly what you want, and
> > - detect if it still does not do exactly like you want it and bail,
> >    while also
> > - trying to make sure you get the right timing feedback from the kernel
> >    unmangled.
> >
> 
> Yes. It's "trust but verify". If i know that the api / protocol is well 
> defined and suitable for my purpose and have verified that at least the 
> reference compositor implements the protocol correctly then i can at 
> least hope that all other compositors are also implemented correctly, so 
> stuff should work as expected. And i can verify that at least some 
> subset of compositors really works, and try to submit bug reports or 
> patches if they don't.

I don't think we can make the Wayland protocol definition strict enough
that you could just rely on other compositors implementing it the same
way as Weston. We don't really want to restrict the implementations too
much with generic protocol interfaces. Therefore I think you will need
to test and validate not only every compositor, but possibly also their
different releases.

Adding something special and optional for compositors to implement,
with very strict implementation requirements would be possible, but
with the caveat of not everyone implementing it.

> > Sounds like the display server is a huge source of problems to you, but
> > I am not quite sure how running on top of a display server benefits you.
> > Your experiment programs want to be in precise control, get accurate
> > timings, and they are always fullscreen. Your users / test subjects
> > never switch away from the program while it's running, you don't need
> > windowing or multi-tasking, AFAIU, nor any of the application
> > interoperability features that are the primary features of a display
> > server.
> >
> 
> They are fullscreen and timing sensitive in probably 95% of all typical 
> application cases during actual "production use" while experiments are 
> run. But some applications need the toolkit to present in regular 
> windows and GUI thingys, a few even need compositing to combine my 
> windows with windows of other apps. Some setups run multi-display, where 
> some displays are used for fullscreen stimulus presentation to the 
> tested person, but another display may be used for control/feedback or 
> during debugging by the experimenter, in which case the regular desktop 
> GUI and UI of the scripting environment is needed on that display. One 
> popular case during debugging is having a half-transparent fullscreen 
> window for stimulus presentation, but behind that window the whole 
> regular GUI with the code editor and debugger in the background, so one 
> can set breakpoints etc. - The window is made transparent for mouse and 
> keyboard input, so users can interact with the editor.
> 
> So in most cases i need a display server running, because i sometimes 
> need compositing and i often need a fully functional GUI during 
> at least 50% of the work time where users are debugging and testing 
> their code and also don't want to be separated from their e-mail clients 
> and web browsers etc. during that time.

You could have your desktop session on a different VT than the
experiment program, and switch between. Or have multiple outputs, some
dedicated for the experiment, others for the desktop. The same for
input devices.

Or if your infrastructure allows, have X11, Wayland, and direct DRM/KMS
backends choosable at runtime.

But yes, it starts getting complicated.

> > Why not take the display server completely out of the equation?
> >
> > I understand that some years ago, it would probably not have been
> > feasible and X11 was the de facto interface to do any graphics.
> >
> > However, it seems you are already married to DRM/KMS so that you get
> > accurate timing feedback, so why not port your experiment programs
> > (the framework) directly on top of DRM/KMS instead of Wayland?
> >
> 
> Yes and no. DRM/KMS will be the most often used one and is the best bet 
> if i need timing control and it's the one i'm most familiar with. I also 
> want to keep the option of running on other backends if timing is not of 
> much importance, or if it can be improved on them, should the need arise.
> 
> > With Mesa EGL and GBM, you can still use hardware accelerated openGL if
> > you want to, but you will also be in explicit control of when you
> > push that rendered buffer into KMS for display. Software rendering by
> > direct pixel poking is also possible and at the end you just push that
> > buffer to KMS as usual too. You do not need any graphics card specific
> > code, it is all abstracted in the public APIs offered by Mesa and
> > libdrm, e.g. GBM. The new libinput should make hooking into input
> > devices much less painful, etc. All this thanks to Wayland, because on
> > Wayland, there is no single "the server" like the X.org X server is.
> > There will be lots of servers and each one needs the same
> > infrastructure you would need to run without a display server.
> >
> > No display server obfuscating your view to the hardware, no
> > compositing manager fiddling with your presentation, and most likely no
> > random programs hogging the GPU at random times. Would the trade-off
> > not be worth it?
> >
> 
> I thought about EGL/GBM etc. as a last resort for especially demanding 
> cases, timing-wise. But given that the good old X-Server was good enough 
> for almost everything so far, i'd expect Wayland to perform as well as 
> or better timing-wise. If that turns out to be true, it would be good 
> enough for hopefully almost all use cases, with all the benefits of 
> compositing and GUI support when needed, and not having to reimplement 
> my own display server. E.g., i'm also using GStreamer as media backend 
> (still 0.10 though) and while there is Wayland integration, i doubt 
> there will be Psychtoolbox integration. Or things like Optimus-style 
> hybrid graphics laptops with one gpu rendering, the other gpu 
> displaying. I assume Wayland will tackle this stuff at some point, 
> whereas i'm not too keen to potentially learn and tackle all the ins and 
> outs of rendernodes or dmabuf juggling.

I never meant you would need to implement a display server, just each
of the sensitive programs presenting directly to the DRM/KMS, one at a
time. The only IPC you might need is to ad hoc communicate with a
controlling application that runs on a normal display server. :-)

Of course, making Wayland at least as "good" as the X.org X server for
X11 is preferable, but at the same time we specifically drop some
features like clients' ability to arbitrarily hold the server as a
hostage with locks and grabs. With presentation and timings, I am trying
to avoid "urgent protocol messages" that refer to something happening
"right now", or at least to not rely on them too much.

GStreamer probably won't have Psychtoolbox integration indeed, unless
you make it happen yourself or pay someone to do it. But I could see
something like a GBM integration, which would benefit more than just
Psychtoolbox. Or EGL! EGL is a pretty common interface, and producing
video frames as EGLImages is something I think GStreamer might already
do, at least for some pipelines.

Optimus support is much more about the kernel drivers, DRM
infrastructure, and the user space drivers i.e. Mesa, than any display
server or window system protocol. Again, you could just take advantage
of the infrastructure built to enable Wayland (and X11!), as that is
window system agnostic.

Optimus is not for Wayland to tackle per se, Wayland only needs to be a
simple messenger. After all, Wayland is only a protocol, not an
implementation of anything.

> > Of course your GUI tools and apps could continue using a display server
> > and would probably like to be ported to be Wayland compliant, I'm just
> > suggesting this for the sensitive experiment programs. Would this be
> > possible for your infrastructure?
> >
> 
> Not impossible when absolutely needed, just inconvenient for the user + 
> yet another display backend to maintain and test for me. Psychtoolbox is 
> a set of extensions for both GNU/Octave and Mathworks Matlab - both use 
> the same scripting language, so users can choose which to use and have 
> portable code between them. Matlab uses a GUI based on Java/Awt/Swing on 
> top of X11, whereas Octave just gained a QT-based GUI beginning this 
> year. You can run both apps in a terminal or console, but most of my 
> users are mostly psychologists/neuro-biologists/physicians, and most of 
> them only have basic programming skills and are somewhat frightened by 
> command line environments. Most would touch a non-GUI environment only 
> in moments of highest despair, let alone learn how to switch between a 
> GUI environment and a console.

To be quite honest, I am surprised you have managed to get the toolbox
working so well in such an environment. It must have been really hard,
given your timing requirements. I have worked on Matlab myself a few
years, but it was always "if you need to be fast and predictable, use
something else". (Yeah, I know you can write MEX etc. stuff.)

I think I understand. The trade-off would not be a net gain for you at
the moment: you prefer ease of installation and use over the more
rarely needed guaranteed performance.

> >
> > On Thu, 20 Feb 2014 04:56:02 +0100
> > Mario Kleiner <mario.kleiner.de at gmail.com> wrote:
> >
> >> On 17/02/14 14:12, Pekka Paalanen wrote:
> >>> On Mon, 17 Feb 2014 01:25:07 +0100
> >>> Mario Kleiner <mario.kleiner.de at gmail.com> wrote:
> >>>
...
> >>>> 1. Wrt. an additional "preroll_feedback" request
> >>>> <http://lists.freedesktop.org/archives/wayland-devel/2014-January/013014.html>,
> >>>> essentially the equivalent of glXGetSyncValuesOML(), that would be very
> >>>> valuable to us.
> >>>>
> >> ...
> >>>
> >>> Indeed, the "preroll_feedback" request was modeled to match
> >>> glXGetSyncValuesOML.
> >>>
> >>> Do you need to be able to call GetSyncValues at any time and have it
> >>> return ASAP? Do you call it continuously, and even between frames?
> >>>
> >>
> >> Yes, at any time, even between frames, with a return asap. This is
> >> driven by the specific needs of user code, e.g., to poll for a vblank,
> >> or to establish a baseline of current (msc, ust) for timing stuff
> >> relative. Psychtoolbox is an extension to a scripting language, so
> >> usercode often decides how this is used.
> >>
> >> Internally to the toolkit these calls are used on X11/GLX to translate
> >> target timestamps into target vblank counts for glXSwapBuffersMscOML(),
> >> because OML_sync_control is based on vblank counts, not absolute system
> >> time, as you know, but ptb exposes an api where usercode specifies
> >> target timestamps, like in your protocol proposal. The query is also
> >> needed to work around some problems with the blocking nature of the X11
> >> protocol when one tries to swap multiple independent windows with
> >> different rates and uses glXWaitForSbcOML to wait for swap completion.
> >> E.g., what doesn't work on X11 is using different x-display connections
> >> - one per window - and create GLX contexts which share resources across
> >> those connections, so if you need multi-window operation you have to
> >> create all GLX contexts on the same x-display connection and run all glx
> >> calls over that connection. If you run multiple independent animations
> >> in different windows you have to avoid blocking that x-connection, so i
> >> use glXGetSyncValuesOML to poll current msc and ust to find out when it
> >> is safe to do a blocking call.
> >>
> >> I hope that the way Wayland's protocol works will make some of these
> >> hacks unnecessary, but user script code can call glXGetSyncValues at
> >> any time.
> >
> > Wayland is even stricter about not letting you use different connections,
> > because there simply are no sharable references to protocol objects. But
> > OTOH, Wayland only provides you a low-level direct async interface to
> > the protocol as a library, so you will be doing all the blocking in your
> > app or GUI-toolkit.
> >
> 
> Yes. If i understood correctly, with Wayland, rendering is separated 
> from the presentation, all rendering and buffer management client-side, 
> only buffer presentation server-side. So i hope i'll be able to untangle 
> everything rendering related (like how many OpenGL contexts to have and 
> what resources they should or should not share) from everything 
> presentation related. If the interface is async and has thread-safety in 
> mind, that should hopefully help a lot on the presentation side.

Correct. However, you may hit some inconvenient specifications in EGL
functionality, which you may then need some EGL extensions to work
around. For instance, EGL still completely hides all buffer management
from the application.
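
On the timestamp-to-vblank-count translation described in the quoted
text above: a minimal sketch of the arithmetic, assuming a constant
refresh period and a (ust, msc) baseline pair of the kind
glXGetSyncValuesOML returns. The function name, the microsecond units
and the "past targets map to the next vblank" policy are my own
assumptions for illustration, not anything taken from Psychtoolbox or
from the protocol proposal.

```python
def target_msc(target_ust_us, base_ust_us, base_msc, refresh_period_us):
    """Return the first vblank count whose timestamp is >= target_ust_us.

    (base_ust_us, base_msc) is a baseline pair describing the most recent
    vblank; refresh_period_us is assumed constant (hypothetical helper).
    All arguments are integers, times are in microseconds.
    """
    if refresh_period_us <= 0:
        raise ValueError("refresh period must be positive")
    delta_us = target_ust_us - base_ust_us
    if delta_us <= 0:
        # Target is not in the future: the next vblank is the earliest
        # one that can still be hit.
        return base_msc + 1
    # Integer ceiling division: number of whole periods needed to reach
    # or pass the target time.
    periods = (delta_us + refresh_period_us - 1) // refresh_period_us
    return base_msc + periods
```

For example, with a 16667 us period and a baseline of msc 1000 at
ust 5000000, a target 40 ms later lands on msc 1003. With a protocol
that takes target timestamps directly, this conversion would move out
of the client.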

> The X11 protocol and at least XLib is not async and not thread-safe by 
> default, and at least under DRI2 the x-server controlled and owned the 
> back- and front-buffers, so you have lots of roundtrips and blocking 
> behavior on the single connection to make sure no backbuffer is touched 
> too early (while a swap is still pending), and many hacks to get around 
> that, and some race conditions in DRI2 around drawable invalidation.
> 
> So i'd be really surprised if Wayland wouldn't be an improvement for me 
> for multi-threaded or multi-window operations.

Wayland most likely is a huge improvement, but then again, you rarely
use Wayland directly in applications. You are still at the mercy of the
GUI-toolkits you use, and libraries like EGL, unless you take the
plunge into raw Wayland/DRM/GBM.

In Wayland, there is no equivalent of Xlib. Either you work with the
protocol directly (via libwayland-client which is only a very thin
wrapper to the wire format, really), or you use a real toolkit. I guess
libxcb would be corresponding to libwayland-client.

> > Sounds like we will need the "subscribe to streaming vblank events
> > interface" then. The recommended usage pattern would be to subscribe
> > only when needed, and unsubscribe ASAP.
> >
> > There is one detail, though. Presentation timestamp is defined as "turns
> > to light", not vblank. If the compositor knows about monitor latency, it
> > will add this time to the presentation timestamp. To keep things
> > consistent, we'd need to define it as a stream of turned-to-light
> > events.
> >
> 
> Yes, makes sense. The drm/kms timestamps are defined as "first pixel of 
> frame leaves the graphics cards output connector" aka start of active 
> scanout. In the time of CRT monitors, that was ~ "turns to light". In my 
> field CRT monitors are still very popular and actively hunted for 
> because of that very well defined timing behaviour. Or very expensive 
> displays which have custom made panel or dlp controllers with well 
> defined timing specifically for research use.

Ah, I didn't know that the DRM vblank timestamps were defined
that way, very cool.

I'm happy that you do not see the "turns to light" definition as
problematic. It was one of the open questions, whether to use "turns to
light" or the OML_sync_control "the first pixel going out of the gfx
card connector".

> If the compositor knew the precise monitor latency it could add that as 
> a constant offset to those timestamps. Do you know of reliable ways to 
> get this info from any common commercial display equipment? Apple's OSX 
> has API in their CoreVideo framework for getting that number, and i 
> implement it in the OSX backend of my toolkit, but i haven't ever seen 
> that function returning anything else than "undefined" from any display?

I believe the situation is like with any such information blob (EDID,
BIOS tables, ...): hardware manufacturers just scribble something that
usually works for the usual cases of Microsoft Windows, and otherwise
it's garbage or not set. So, I think in theory there was some spec that
allows defining it (with HDMI or EDID or something?), but to trust that
for scientific work? I would not.
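
To make the "turns to light" definition concrete, here is a rough
sketch of what a compositor could do with a scanout timestamp: add the
monitor latency if it is known, and otherwise pass the timestamp
through unchanged. The function name and the nanosecond units are
assumptions for illustration only.

```python
def turns_to_light_ns(scanout_start_ns, monitor_latency_ns=None):
    """Estimate when a frame turns to light.

    scanout_start_ns: time the first pixel leaves the output connector.
    monitor_latency_ns: constant display latency, or None if unknown
    (hypothetical parameter; real hardware rarely reports it reliably).
    """
    if monitor_latency_ns is None:
        # Unknown latency: the scanout timestamp is the best estimate.
        return scanout_start_ns
    if monitor_latency_ns < 0:
        raise ValueError("latency cannot be negative")
    return scanout_start_ns + monitor_latency_ns
```

On a CRT the latency is close to zero, so both cases give practically
the same number.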

...
> >> I can also always feed just single frames into your presentation queue
> >> and wait for present_feedback for those single frames, so i can be sure
> >> the "proper" frame was presented.
> >
> > Making that work so that you can actually hit every scanout cycle with
> > a new image is something the compositor should implement, yes, but the
> > protocol does not guarantee it. I suspect it would work in practise
> > though.
> >
> 
> It works well enough on the X-Server, so i'd expect it to work on 
> Wayland as well.

Yes, it is a reasonable expectation. We (I) just have a hard time
deciding, whether the presentation feedback is the appropriate trigger
for posting the next frame, or should it be the frame callback, which
has slightly different semantics. This will also tie intimately to the
compositor's repaint cycle, what it does at which point of the frame
period.

> This link...
> 
> <https://github.com/Psychtoolbox-3/Psychtoolbox-3/blob/master/Psychtoolbox/PsychDocumentation/ECVP2010Poster_VisualTimingPrecision.pdf?raw=true>
> 
> ...points to some results of tests i did a couple of years ago. Quite 
> outdated by now, but Linux + X11 came out as more reliable wrt. timing 
> precision than Windows and OSX, especially when realtime scheduling or 
> even a realtime kernel was used, with both proprietary graphics drivers 
> and the open-source drivers. I was very pleased with that :) - The other 

I'm curious, how did you arrange the realtime scheduling? Elevated both
the X server and your app to RT priority? Doesn't that have a huge risk
of hanging the whole machine, if there is a bug in either code base? Not
to mention you probably had Octave or Matlab involved?

> thing i learned there is how much dynamic power management on the gpu 
> can bite you if the rendering/presentation behavior of your app doesn't 
> match the expectations of the algorithms used to control up/downclocking 
> on the gpu. That would be another topic related to a presentation 
> extension btw., having some sort of hints to the compositor about what 
> scheduling priority to choose, or if gpu power management should be 
> somehow affected by the timing needs of clients...

That's true, but I also think controlling power management is not in
scope of the presentation extension.

Power control is about hardware and system control, which smells a bit
like a privileged action. It needs to be solved separately, just like
e.g. clients cannot go and just change the video mode on an output in
Wayland.

...
> >>> It seems like we could have several flags describing the aspects of
> >>> the presentation feedback or presentation itself:
> >>>
> >>> 1. vsync'd or not
> >>> 2. hardware or software clock, i.e. DRM/KMS ioctl reported time vs.
> >>>      compositor calls clock_gettime() as soon as it is notified the screen
> >>>      update is done (so maybe kernel vs. userspace clock rather?)
> >>> 3. did screen update completion event exist, or was it faked by an
> >>>      arbitrary timer
> >>> 4. flip vs. copy?
> >>>
> >>
> >> Yes, that would be sufficient for my purpose. I can always get the
> >> OpenGL renderer/vendor string to do some basic checks for which kms
> >> driver is in use, and your flags would give me all needed dynamic
> >> information to judge reliability.
> >
> > Except that the renderer info won't help you, if the machine has more
> > than one GPU. It is quite possible to render on one GPU and scan out
> > on another.
> >
> 
> Yes, but i already use libpciaccess to enumerate gpu's on the bus, and 
> other more scary things, so i guess there will be a few more scary and 
> shady low level things to add ;-)

And then you say you don't want to go poking DRM/KMS and dmabuf
directly? O_o
;-)

...

> >
> > From your feedback so far, I think you have only requested additional
> > features:
> > - ability to subscribe to a stream of vblank-like events
> > - do-not-skip flag for queued updates
> >
> 
> Yes, and the present_feedback flags.

Yes! I already forgot those.
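
For illustration, the four aspects listed in the quoted text could be
carried as a bitmask in the feedback event, and a client could decide
which combination it requires. A minimal sketch follows; the flag
names, the bit values and the choice of which bits count as
"trustworthy" are all hypothetical, nothing here is in the protocol
draft.

```python
from enum import IntFlag


class FeedbackKind(IntFlag):
    """Hypothetical presentation feedback flags (names and values assumed)."""
    VSYNC = 0x1          # presentation was synchronized to vblank
    HW_CLOCK = 0x2       # timestamp came from the kernel/hardware clock
    HW_COMPLETION = 0x4  # a real completion event existed, not a timer
    ZERO_COPY = 0x8      # buffer was flipped, not copied


def timing_is_trustworthy(kind):
    """Return True if all timing-related bits are set.

    ZERO_COPY is deliberately not required here: flip vs. copy affects
    performance more than timestamp quality (an assumption of this sketch).
    """
    required = (FeedbackKind.VSYNC
                | FeedbackKind.HW_CLOCK
                | FeedbackKind.HW_COMPLETION)
    return (kind & required) == required
```

A timing-sensitive client could then bail out or warn when
timing_is_trustworthy() returns False for a presented frame.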

> One more useful flag for me could be to know if the presented frame was 
> composited - together with some other content - or if my own buffer was 
> just flipped onscreen / no composition took place. More specifically - 

Does flipping your buffer directly into an overlay while still
compositing something else count as compositing? Or is it really only
just about "did *anything* else show on screen"?

> and maybe there's already some other event in the protocol for that - 
> i'd like to know if my presented surface was obscured by something else, 
> e.g., some kind of popup window like "system updates available", "you 
> have new mail", a user ALT+Tabbing away the window etc. On X11 i find 
> out indirectly about such unwanted visual disruptions because the 
> compositor would fall back to compositing instead of simple 
> page-flipping. On Wayland something similar would be cool if it doesn't 
> already exist.

I think that goes to the category "you really do not want to run on a
display server", sorry. :-)

On Wayland you don't really know if, how, or where your window might be
showing. Not on a normal desktop environment, anyway.

> Not sure if that still belongs into a present_feedback extension though.

No, it doesn't, IMHO.

Thanks,
pq