[RFC v2] Wayland presentation extension (video protocol)

Mario Kleiner mario.kleiner.de at gmail.com
Mon Feb 24 14:25:18 PST 2014


On 21/02/14 09:36, Pekka Paalanen wrote:
> On Fri, 21 Feb 2014 06:40:02 +0100
> Mario Kleiner <mario.kleiner.de at gmail.com> wrote:
>
>> On 20/02/14 12:07, Pekka Paalanen wrote:
>>> Hi Mario,
>>>
>>
>> Ok, now i am magically subscribed. Thanks to the moderator!
>
> Cool, I can start trimming out parts of the email. :-)
>
>>> I have replies to your comments below, but while reading what you said,
>>> I started wondering whether Wayland would be good for you after all.
>>>
>>> It seems that your timing sensitive experiment programs and you and
>>> your developers use a great deal of effort into
>>> - detecting the hardware and drivers,
>>> - determining how the display server works, so that you can
>>> - try to make it do exactly what you want, and
>>> - detect if it still does not do exactly like you want it and bail,
>>>     while also
>>> - trying to make sure you get the right timing feedback from the kernel
>>>     unmangled.
>>>
>>
>> Yes. It's "trust buf verify". If i know that the api / protocol is well
>> defined and suitable for my purpose and have verified that at least the
>> reference compositor implements the protocol correctly then i can at
>> least hope that all other compositors are also implemented correctly, so
>> stuff should work as expected. And i can verify that at least some
>> subset of compositors really works, and try to submit bug reports or
>> patches if they don't.
>
> I don't think we can make the Wayland protocol definition strict enough
> that you could just rely on other compositors implementing it the same
> way as Weston. We don't really want to restrict the implementations too
> much with generic protocol interfaces. Therefore I think you will need
> to test and validate not only every compositor, but possibly also their
> different releases.
>

That depends on what you mean by "strict enough"; well defined is good 
enough. E.g., the level in your presentation extension RFC is sufficient, 
because it defines precisely enough how the compositor should treat the 
target presentation timestamps. As a client implementer I then know how 
to specify the timestamps to get well-defined behavior on a compositor 
that implements the protocol correctly, on a system that is configured 
properly and not overloaded. If a compositor doesn't conform, I can 
always try to bug the developers or submit patches myself to fix it. 
If, on the other hand, the protocol were too vague, any compositor 
behavior would be consistent with the spec, so I'd have no grounds for 
a bug report, and I wouldn't know in the first place how to code my 
client.

Or: if your protocol specifies timestamps with nanosecond precision, 
recommends that they should have an accuracy of <= 1 msec, and defines 
the moment when pixels "turn to light", that's fine. If it didn't define 
what the timestamps actually mean, or didn't recommend a good minimum 
accuracy, that would be troublesome.
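
Just to illustrate what I mean on the client side, here is a minimal C 
sketch of packing such a nanosecond-precision target time into a 
sec_hi/sec_lo/nsec triplet before handing it to a queue-style request. 
The struct, helper name and the exact wire split are my assumptions for 
illustration, not the protocol API:

/* Hypothetical illustration only: splitting an absolute CLOCK_MONOTONIC
 * target time into a sec_hi/sec_lo/nsec triplet, as one would before
 * queueing a buffer for a given presentation time. Names are made up. */
#include <stdint.h>
#include <time.h>

struct target_time {
    uint32_t tv_sec_hi;  /* upper 32 bits of seconds */
    uint32_t tv_sec_lo;  /* lower 32 bits of seconds */
    uint32_t tv_nsec;    /* nanoseconds, 0..999999999 */
};

static struct target_time make_target(const struct timespec *ts)
{
    struct target_time t;

    t.tv_sec_hi = (uint32_t)((uint64_t)ts->tv_sec >> 32);
    t.tv_sec_lo = (uint32_t)((uint64_t)ts->tv_sec & 0xffffffffu);
    t.tv_nsec   = (uint32_t)ts->tv_nsec;
    return t;
}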

At the moment I have to verify against specific X-Server / ddx / Linux 
kernel versions, at least for the most common setups I care about, 
because there's always potential for bugs, so doing the same for a few 
compositors would be nothing new. The toolkit itself has a lot of 
paranoid startup and runtime checks, so it can detect various bugs and 
alert the user, who then hopefully alerts me. Users with high timing 
precision demands also perform independent verification with external 
hardware, e.g., photo-diodes attached to monitors, as another layer of 
testing. That equipment is often not practical for production use, so 
they may test after each hardware/OS/driver/toolkit upgrade, but then 
trust the system between configuration changes.

> Adding something special and optional for compositors to implement,
> with very strict implementation requirements would be possible, but
> with the caveat of not everyone implementing it.

Yes, that would be a problem.

My hope is that, as different compositors get implemented, and as long 
as the mandatory protocol is well defined and reasonable in its 
precision requirements, people will implement it that way. I'd also 
hope that if the reference compositor has a well-working and accurate 
implementation of this, other compositors would mostly stay close to 
that implementation where feasible, if only to save their developers 
some time, headaches and maintenance overhead.

>>> Sounds like the display server is a huge source of problems to you, but
>>> I am not quite sure how running on top a display server benefits you.
>>> Your experiment programs want to be in precise control, get accurate
>>> timings, and they are always fullscreen. Your users / test subjects
>>> never switch away from the program while it's running, you don't need
>>> windowing or multi-tasking, AFAIU, nor any of the application
>>> interoperability features that are the primary features of a display
>>> server.
>>>
>>
>> They are fullscreen and timing sensitive in probably 95% of all typical
>> application cases during actual "production use" while experiments are
>> run. But some applications need the toolkit to present in regular
>> windows and GUI thingys, a few even need compositing to combine my
>> windows with windows of other apps. Some setups run multi-display, where
>> some displays are used for fullscreen stimulus presentation to the
>> tested person, but another display may be used for control/feedback or
>> during debugging by the experimenter, in which case the regular desktop
>> GUI and UI of the scripting environment is needed on that display. One
>> popular case during debugging is having a half-transparent fullscreen
>> window for stimulus presentation, but behind that window the whole
>> regular GUI with the code editor and debugger in the background, so one
>> can set breakpoints etc. - The window is made transparent for mouse and
>> keyboard input, so users can interact with the editor.
>>
>> So in most cases i need a display server running, because i sometimes
>> need compositing and i often need a fully functional GUI during at the
>> at least 50% of the work time where users are debugging and testing
>> their code and also don't want to be separated from their e-mail clients
>> and web browsers etc. during that time.
>
> You could have your desktop session on a different VT than the
> experiment program, and switch between. Or have multiple outputs, some
> dedicated for the experiment, others for the desktop. The same for
> input devices.
>
> Or if your infrastructure allows, have X11, Wayland, and direct DRM/KMS
> backends choosable at runtime.
>
> But yes, it starts getting complicated.
>

It always depends on how easy that is for the user. E.g., if there's 
one dual-head gpu installed in the machine, I don't know whether it 
would be easily possible to have one display output controlled by a 
compositor while the other output is controlled directly by a 
gbm/drm/kms client. In the past, only one client could be DRM master 
on a DRM device file.
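
As a concrete illustration of that restriction, here is a small sketch 
(not Psychtoolbox code) of probing DRM master ownership with libdrm; if 
a display server already holds master on the device node, drmSetMaster() 
simply fails:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <xf86drm.h>

int main(void)
{
    int fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);
    if (fd < 0) {
        perror("open /dev/dri/card0");
        return 1;
    }

    if (drmSetMaster(fd) != 0) {
        /* Another client (e.g. the display server) already holds
         * DRM master on this device. */
        perror("drmSetMaster");
    } else {
        printf("acquired DRM master\n");
        drmDropMaster(fd);
    }

    close(fd);
    return 0;
}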

Anyway, as soon as my average user has to start touching configuration 
files, it gets complicated. Especially if different distros use 
different ways of doing it.

>>> Why not take the display server completely out of the equation?
>>>
>>> I understand that some years ago, it would probably not have been
>>> feasible and X11 was the de facto interface to do any graphics.
>>>
>>> However, it seems you are already married to DRM/KMS so that you get
>>> accurate timing feedback, so why not port your experiment programs
>>> (the framework) directly on top of DRM/KMS instead of Wayland?
>>>
>>
>> Yes and no. DRM/KMS will be the most often used one and is the best bet
>> if i need timing control and it's the one i'm most familiar with. I also
>> want to keep the option of running on other backends if timing is not of
>> much importance, or if it can be improved on them, should the need arise.
>>
>>> With Mesa EGL and GBM, you can still use hardware accelerated openGL if
>>> you want to, but you will also be in explicit control of when you
>>> push that rendered buffer into KMS for display. Software rendering by
>>> direct pixel poking is also possible and at the end you just push that
>>> buffer to KMS as usual too. You do not need any graphics card specific
>>> code, it is all abstracted in the public APIs offered by Mesa and
>>> libdrm, e.g. GBM. The new libinput should make hooking into input
>>> devices much less painful, etc. All this thanks to Wayland, because on
>>> Wayland, there is no single "the server" like the X.org X server is.
>>> There will be lots of servers and each one needs the same
>>> infrastructure you would need to run without a display server.
>>>
>>> No display server obfuscating your view to the hardware, no
>>> compositing manager fiddling with your presentation, and most likely no
>>> random programs hogging the GPU at random times. Would the trade-off
>>> not be worth it?
>>>
>>
>> I thought about EGL/GBM etc. as a last resort for especially demanding
>> cases, timing-wise. But given that the good old X-Server was good enough
>> for almost everything so far, i'd expect Wayland to perform as good as
>> or better timing-wise. If that turns out to be true, it would be good
>> enough for hopefully almost all use cases, with all the benefits of
>> compositing and GUI support when needed, and not having to reimplement
>> my own display server. E.g., i'm also using GStreamer as media backend
>> (still 0.10 though) and while there is Wayland integration, i doubt
>> there will be Psychtoolbox integration. Or things like Optimus-style
>> hybrid graphics laptops with one gpu rendering, the other gpu
>> displaying. I assume Wayland will tackle this stuff at some point,
>> whereas i'm not too keen to potentially learn and tackle all the ins and
>> outs of rendernodes or dmabuf juggling.
>
> I never meant you would need to implement a display server, just each
> of the sensitive programs presenting directly to the DRM/KMS, one at a
> time. The only IPC you might need is to ad hoc communicate with a
> controlling application that runs on a normal display server. :-)
>

That would be difficult in my case.

> Of course, making Wayland at least as "good" as the X.org X server for
> X11 is preferable, but at the same time we specifically drop some
> features like clients' ability to arbitrarily hold the server as a
> hostage with locks and grabs. With presentation and timings, I am trying
> to avoid "urgent protocol messages" that refer to something happening
> "right now", or at least to not rely on them too much.
>
> Gstreamer probably won't have Psychtoolbox integration indeed, unless
> you make it happen yourself or pay someone to do it. But I could see
> something like a GBM integration, which would benefit more than just
> Psychtoolbox. Or EGL! EGL is a pretty common interface, and producing
> video frames as EGLImages is something I think Gstreamer might already
> do, at least for some pipelines.
>
> Optimus support is much more about the kernel drivers, DRM
> infrastructure, and the user space drivers i.e. Mesa, than any display
> server or window system protocol. Again, you could just take advantage
> of the infrastructure built to enable Wayland (and X11!), as that is
> window system agnostic.
>
> Optimus is not for Wayland to tackle per se, Wayland only needs to be a
> simple messenger. After all, Wayland is only a protocol, not an
> implementation of anything.
>

I know it's only the protocol, but when I say "Wayland" I also mean 
typical implementations. These were just examples. Generally it's about 
trying not to reinvent the wheel too often just because my code is too 
low-level to tap into existing higher-level implementations. Also, I'm 
only one person, so there's no code review, nobody to share the load of 
maintaining the code, and no mailing list of experienced developers I 
can ask for help if I get stuck deep down in my own implementation. So 
I try to use common implementations with a big developer/user community 
when possible, without compromising the accuracy or robustness of the 
toolkit, and rather contribute patches to fix or improve those projects 
as necessary.

>>> Of course your GUI tools and apps could continue using a display server
>>> and would probably like to be ported to be Wayland compliant, I'm just
>>> suggesting this for the sensitive experiment programs. Would this be
>>> possible for your infrastructure?
>>>
>>
>> Not impossible when absolutely needed, just inconvenient for the user +
>> yet another display backend to maintain and test for me. Psychtoolbox is
>> a set of extensions for both GNU/Octave and Mathworks Matlab - both use
>> the same scripting language, so users can choose which to use and have
>> portable code between them. Matlab uses a GUI based on Java/Awt/Swing on
>> top of X11, whereas Octave just gained a QT-based GUI beginning this
>> year. You can run both apps in a terminal or console, but most of my
>> users are mostly psychologists/neuro-biologists/physicians, and most of
>> them only have basic programming skills and are somewhat frightened by
>> command line environments. Most would touch a non-GUI environment only
>> in moments of highest despair, let alone learn how to switch between a
>> GUI environment and a console.
>
> To be quite honest, I am surprised you have managed to get the toolbox
> working so well in such an environment. It must have been really hard,
> given your timing requirements. I have worked on Matlab myself a few
> years, but it was always "if you need to be fast and predictable, use
> something else". (Yeah, I know you can write MEX etc. stuff.)
>

It's not that bad. Octave/Matlab as an interpreted language is slow 
compared to C code, and I can't fully control the process in which the 
toolkit lives, but all my timing- and performance-sensitive code lives 
in mex files as compiled C code. In the mex files I can use 
multi-threading, realtime scheduling and other optimizations and OS 
services to get what I want. The rather slow user code, written by 
people with usually no background at all in realtime programming, then 
submits timing-sensitive work to the mex files. It's essentially the 
same approach as your presentation protocol: don't rely on the user's 
script to be precise on time, but queue up work ahead of time with 
target timestamps attached, and leave it to the C code to get it onto 
the screen, or out of the speakers or I/O ports, at the proper time 
asynchronously. Don't poll for input, but leave it to the toolkit to 
timestamp input properly and just dequeue it.
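
In rough form the pattern looks something like the sketch below. This 
is a simplified illustration of the idea, not actual Psychtoolbox code; 
the job structure, priority value and names are made up:

/* Interpreted user code enqueues work with an absolute CLOCK_MONOTONIC
 * deadline; a compiled worker thread under realtime scheduling sleeps
 * until that deadline and then performs the presentation. */
#include <pthread.h>
#include <sched.h>
#include <time.h>

struct job {
    struct timespec target;      /* absolute CLOCK_MONOTONIC deadline */
    void (*execute)(void *data); /* e.g. trigger a swap or audio output */
    void *data;
};

static void *rt_worker(void *arg)
{
    struct job *job = arg;

    /* Elevate this thread to realtime priority (needs privileges). */
    struct sched_param sp = { .sched_priority = 50 };
    pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);

    /* Sleep until the absolute target time, then execute the work. */
    clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &job->target, NULL);
    job->execute(job->data);
    return NULL;
}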

> I think I understand. The trade-off would not be a net gain for you at
> the moment: you prefer ease of installation and use over the more
> rarely needed guaranteed performance.
>

I need to eat the cake and keep it ;) - The demands and skills of my 
users vary a great deal. Maybe 10% of them require sub-millisecond 
precision and essentially no dropped frames ever, and they would touch 
a command line, edit config files or type sudo commands to shut down 
the display server, change runlevels or install patched realtime 
kernels, or whatever, if that's the only way to get stuff working. 
Maybe another 50% need millisecond timing and not too many dropped 
frames, and would probably rather sacrifice precision than convenience. 
And some percentage doesn't care for their specific use case and would 
never leave a regular desktop environment. Those who switch to Linux as 
the host OS, away from Windows or OSX, are usually the ones who are not 
satisfied with the performance or flexibility of those platforms, so 
they arrive expecting higher performance or flexibility on Linux, but 
with the experience and expectation of convenience and functionality 
from those systems.

It's all a trade-off. If Wayland proved unworkable I'd implement a 
gbm/drm/kms backend for the top 20% of demanding use cases, but I'd 
rather not if I don't have to, because the other 80% would still need 
some desktop GUI environment, so a Wayland backend, or use of XWayland 
to share the X11 backend, which also needs to be maintained for many 
more years anyway. And then there may be hardware we want to use which 
doesn't have drm/kms but does have a working Wayland implementation. 
And so far X-Windows has been good enough.

>>>
>>> On Thu, 20 Feb 2014 04:56:02 +0100
>>> Mario Kleiner <mario.kleiner.de at gmail.com> wrote:
>>>
>>>> On 17/02/14 14:12, Pekka Paalanen wrote:
>>>>> On Mon, 17 Feb 2014 01:25:07 +0100
>>>>> Mario Kleiner <mario.kleiner.de at gmail.com> wrote:
>>>>>
> ...
>>>>>> 1. Wrt. an additional "preroll_feedback" request
>>>>>> <http://lists.freedesktop.org/archives/wayland-devel/2014-January/013014.html>,
>>>>>> essentially the equivalent of glXGetSyncValuesOML(), that would be very
>>>>>> valuable to us.
>>>>>>
>>>> ...
>>>>>
>>>>> Indeed, the "preroll_feedback" request was modeled to match
>>>>> glXGetSyncValuesOML.
>>>>>
>>>>> Do you need to be able to call GetSyncValues at any time and have it
>>>>> return ASAP? Do you call it continuously, and even between frames?
>>>>>
>>>>
>>>> Yes, at any time, even between frames, with a return asap. This is
>>>> driven by the specific needs of user code, e.g., to poll for a vblank,
>>>> or to establish a baseline of current (msc, ust) for timing stuff
>>>> relative. Psychtoolbox is an extension to a scripting language, so
>>>> usercode often decides how this is used.
>>>>
>>>> Internally to the toolkit these calls are used on X11/GLX to translate
>>>> target timestamps into target vblank counts for glXSwapBufferMscOML(),
>>>> because OML_sync_control is based on vblank counts, not absolute system
>>>> time, as you know, but ptb exposes an api where usercode specifies
>>>> target timestamps, like in your protocol proposal. The query is also
>>>> needed to work around some problems with the blocking nature of the X11
>>>> protocol when one tries to swap multiple independent windows with
>>>> different rates and uses glXWaitForSbcOML to wait for swap completion.
>>>> E.g., what doesn't work on X11 is using different x-display connections
>>>> - one per windows - and create GLX contexts which share resources across
>>>> those connections, so if you need multi-window operation you have to
>>>> create all GLX contexts on the same x-display connection and run all glx
>>>> calls over that connection. If you run multiple independent animations
>>>> in different windows you have to avoid blocking that x-connection, so i
>>>> use glXGetSyncValuesOML to poll current msc and ust to find out when it
>>>> is safe to do a blocking call.
>>>>
>>>> I hope that the way Waylands protocol works will make some of these
>>>> hacks unneccessary, but user script code can call glXGetSyncValues at
>>>> any time.
>>>
>>> Wayland won't allow you to use different connections even harder,
>>> because there simply are no sharable references to protocol objects. But
>>> OTOH, Wayland only provides you a low-level direct async interface to
>>> the protocol as a library, so you will be doing all the blocking in your
>>> app or GUI-toolkit.
>>>
>>
>> Yes. If i understood correctly, with Wayland, rendering is separated
>> from the presentation, all rendering and buffer management client-side,
>> only buffer presentation server-side. So i hope i'll be able to untangle
>> everything rendering related (like how many OpenGL contexts to have and
>> what resources they should or should not share) from everything
>> presentation related. If the interface is async and has thread-safety in
>> mind, that should hopefully help a lot on the presentation side.
>
> Correct. However, you may hit some inconvenient specifications in EGL
> functionality, which you may then need some EGL extensions to work
> around. For instance, EGL still completely hides all buffer management
> from the application.
>

I can imagine. I have an X11/EGL backend in my code, mostly just for a 
bit of testing on embedded graphics, and EGL is already rather limited 
compared to GLX.

>> The X11 protocol and at least XLib is not async and not thread-safe by
>> default, and at least under DRI2 the x-server controlled and owned the
>> back- and front-buffers, so you have lots of roundtrips and blocking
>> behavior on the single connection to make sure no backbuffer is touched
>> too early (while a swap is still pending), and many hacks to get around
>> that, and some race conditions in DRI2 around drawable invalidation.
>>
>> So i'd be really surprised if Wayland wouldn't be an improvement for me
>> for multi-threaded or multi-window operations.
>
> Wayland most likely is a huge improvement, but then again, you rarely
> use Wayland directly in applications. You are still on the mercy of the
> GUI-toolkits you use, and libraries like EGL, unless you take the
> plunge into raw Wayland/DRM/GBM.
>
> In Wayland, there is no equivalent of Xlib. Either you work with the
> protocol directly (via libwayland-client which is only a very thin
> wrapper to the wire format, really), or you use a real toolkit. I guess
> libxcb would be corresponding to libwayland-client.
>
>>> Sounds like we will need the "subscribe to streaming vblank events
>>> interface" then. The recommended usage pattern would be to subscribe
>>> only when needed, and unsubscribe ASAP.
>>>
>>> There is one detail, though. Presentation timestamp is defined as "turns
>>> to light", not vblank. If the compositor knows about monitor latency, it
>>> will add this time to the presentation timestamp. To keep things
>>> consistent, we'd need to define it as a stream of turned-to-light
>>> events.
>>>
>>
>> Yes, makes sense. The drm/kms timestamps are defined as "first pixel of
>> frame leaves the graphics cards output connector" aka start of active
>> scanout. In the time of CRT monitors, that was ~ "turns to light". In my
>> field CRT monitors are still very popular and actively hunted for
>> because of that very well defined timing behaviour. Or very expensive
>> displays which have custom made panel or dlp controllers with well
>> defined timing specifically for research use.
>
> Ah, I didn't know about that the DRM vblank timestamps were defined
> that way, very cool.
>

The definition is that of OML_sync_control, so that spec could be 
implemented in as conformant a way as possible. In practice, only the 
kms drivers with high-precision timestamping (i915, radeon, nouveau) 
will do precisely that. Other drivers just take a timestamp at vblank 
irq time, so it's somewhere after vblank onset and could be off in case 
of delayed irq execution, preemption etc.
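
For reference, this is roughly how the (msc, ust) pair is queried from 
the kernel via libdrm; whether tval_sec/tval_usec is a precise 
scanout-start timestamp or just an irq-time timestamp depends on the 
driver, as described above. A minimal sketch, assuming fd is an already 
opened DRM device:

#include <stdio.h>
#include <string.h>
#include <xf86drm.h>

static void query_vblank(int fd)
{
    drmVBlank vbl;

    memset(&vbl, 0, sizeof(vbl));
    vbl.request.type = DRM_VBLANK_RELATIVE; /* relative to now */
    vbl.request.sequence = 0;               /* 0 = just query, no wait */

    if (drmWaitVBlank(fd, &vbl) == 0) {
        /* sequence is the current vblank count (msc), tval_* the
         * kernel timestamp of the last vblank (ust). */
        printf("vblank %u at %ld.%06ld s\n",
               vbl.reply.sequence,
               (long)vbl.reply.tval_sec,
               (long)vbl.reply.tval_usec);
    }
}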

> I'm happy that you do not see the "turns to light" definition as
> problematic. It was one of the open questions, whether to use "turns to
> light" or the OML_sync_control "the first pixel going out of the gfx
> card connector".
>

"turns to light" is what we ideally want, but the OML_sync_control 
definition is the best approximation of that if the latency of the 
display itself is unknown - and spot on in a world of CRT monitors.

>> If the compositor knew the precise monitor latency it could add that as
>> a constant offset to those timestamps. Do you know of reliable ways to
>> get this info from any common commercial display equipment? Apple's OSX
>> has API in their CoreVideo framework for getting that number, and i
>> implement it in the OSX backend of my toolkit, but i haven't ever seen
>> that function returning anything else than "undefined" from any display?
>
> I believe the situation is like with any such information blob (EDID,
> BIOS tables, ...): hardware manufacturers just scribble something that
> usually works for the usual cases of Microsoft Windows, and otherwise
> it's garbage or not set. So, I think in theory there was some spec that
> allows to define it (with HDMI or EDID or something?), but to trust that
> for scientific work? I would not.
>

At least the good scientists never trust ;-). They measure the actual 
latency between software timestamps and the display once their setup is 
ready for production use, e.g., with attached photo-diodes. Then they 
use the calculated offset to correct the software-reported timestamps 
to get to the "turns to light" time they actually want.

Of course it's better to use a zero offset, so that your timestamps == 
OML_sync_control timestamps, and thereby avoid confusion on the user's 
side if there is doubt about the information reported by EDID etc. A 
Wayland implementation would probably need some consistency checks, 
quirks settings or a blacklist to make sure no really bogus values get 
added to the timestamps.
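
The correction itself is trivial arithmetic; a sketch, with a purely 
illustrative latency value that would have to be measured per setup 
with external equipment:

#include <stdint.h>

/* Measured once with a photo-diode for this specific monitor/setup;
 * 4.5 ms is only an example value. */
static const uint64_t measured_display_latency_ns = 4500000;

/* Approximate "turns to light" from the scanout/flip timestamp. */
static uint64_t turns_to_light_ns(uint64_t flip_timestamp_ns)
{
    return flip_timestamp_ns + measured_display_latency_ns;
}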

> ...
>>>> I can also always feed just single frames into your presentation queue
>>>> and wait for present_feedback for those single frames, so i can be sure
>>>> the "proper" frame was presented.
>>>
>>> Making that work so that you can actually hit every scanout cycle with
>>> a new image is something the compositor should implement, yes, but the
>>> protocol does not guarantee it. I suspect it would work in practise
>>> though.
>>>
>>
>> It works well enough on the X-Server, so i'd expect it to work on
>> Wayland as well.
>
> Yes, it is a reasonable expectation. We (I) just have hard time
> deciding, whether the presentation feedback is the appropriate trigger
> for posting the next frame, or should it be the frame callback, which
> has slightly different semantics. This will also tie intimately to the
> compositor's repaint cycle, what it does at which point of the frame
> period.
>
>> This link...
>>
>> <https://github.com/Psychtoolbox-3/Psychtoolbox-3/blob/master/Psychtoolbox/PsychDocumentation/ECVP2010Poster_VisualTimingPrecision.pdf?raw=true>
>>
>> ...points to some results of tests i did a couple of years ago. Quite
>> outdated by now, but Linux + X11 came out as more reliable wrt. timing
>> precision than Windows and OSX, especially when realtime scheduling or
>> even a realtime kernel was used, with both proprietary graphics drivers
>> and the open-source drivers. I was very pleased with that :) - The other
>
> I'm curious, how did you arrange the realtime scheduling? Elevated both
> the X server and your app to RT priority? Doesn't that have a huge risk
> of hanging the whole machine, if there is a bug in either code base? Not
> to mention you probably had Octave or Matlab involved?
>

My app ran at RT priority, in some tests on a kernel with realtime 
patches. The linked pdf has separate columns for RT vs. non-RT 
scheduling. Elevating the X-Server didn't make much difference; 
switching off dynamic gpu reclocking made a difference for some tasks. 
All bufferswaps were page flipped.

There are two cases. When testing with the proprietary NVidia display 
driver, performance was very good. As far as I can tell, their client 
direct rendering implementation calls directly into kernel ioctl()'s to 
trigger page flips, so it avoids roundtrips to the X-Server and any 
delays incurred by the protocol or server.

On the FOSS Mesa drivers with DRI2 + kms page-flipping, the server/ddx 
queues a vblank event in the kernel one video refresh ahead of the 
target vblank for a swap. That way the vblank event delivered by the 
kernel triggers the kms page flip ioctl() call either immediately or 
almost one refresh duration ahead of the target vblank. In practice the 
server could be delayed by almost 16 msecs on a 60 Hz display before 
skipped frames would happen.
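
Roughly, that DDX-side scheduling pattern looks like the sketch below 
(simplified; crtc/fb handles and the target msc are assumed to be 
known, and the real code also handles msc wraparound and errors):

#include <string.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

static void schedule_swap(int fd, unsigned int target_msc,
                          void *user_data)
{
    drmVBlank vbl;

    /* Queue a vblank event one refresh ahead of the target vblank. */
    memset(&vbl, 0, sizeof(vbl));
    vbl.request.type = DRM_VBLANK_ABSOLUTE | DRM_VBLANK_EVENT;
    vbl.request.sequence = target_msc - 1;
    vbl.request.signal = (unsigned long)user_data;
    drmWaitVBlank(fd, &vbl);
}

/* Called from the drmHandleEvent() vblank handler when that event
 * fires: submit the flip, which then completes at the target vblank. */
static void on_vblank(int fd, uint32_t crtc_id, uint32_t fb_id,
                      void *user_data)
{
    drmModePageFlip(fd, crtc_id, fb_id, DRM_MODE_PAGE_FLIP_EVENT,
                    user_data);
}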

Also, in practice, during a running experiment session, and during these 
tests, one tries to calm down the system and be reasonably nice to it: 
close down unneeded desktop apps, don't run file indexing or similar 
tasks, don't stress the parts of the desktop which are unrelated to the 
app. If I did things like furiously wiggle windows around, maximize and 
minimize them, or run all kinds of desktop effects and animations, I'd 
certainly see much worse timing and more skipped frames.

Wrt. RT priority, most machines are now multi-core, so it's not as easy 
to hang the whole machine as it was with a single core. And having 
realtime watchdogs etc. also helps.

>> thing i learned there is how much dynamic power management on the gpu
>> can bite you if the rendering/presentation behavior of your app doesn't
>> match the expectations of the algorithms used to control up/downclocking
>> on the gpu. That would be another topic related to a presentation
>> extension btw., having some sort of hints to the compositor about what
>> scheduling priority to choose, or if gpu power management should be
>> somehow affected by the timing needs of clients...
>
> That's true, but I also think controlling power management is not in
> scope of the presentation extension.
>
> Power control is about hardware and system control, which smells a bit
> like a privileged action. It needs to be solved separately, just like
> e.g. clients cannot go and just change the video mode on an output in
> Wayland.
>

Yes, but it's somewhat related: a way to specify the required quality 
of service for a graphics client. You'd probably need some way of 
hinting to the system what your timing requirements are. I think there 
is already some QoS infrastructure in the kernel for controlling cpu 
governors, sleep states or suspend states and how deep they may go 
while still guaranteeing responsiveness - or at least some work was 
done on that. One would need to extend that to other actors like the 
gpu, to hint to the gpu's dynamic power management how much it should 
prioritize performance/latency over power savings. But it is a complex 
field.
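
One existing example of that kernel QoS infrastructure on the CPU side 
is the PM QoS /dev/cpu_dma_latency interface; a small sketch of how a 
client can use it follows. Anything GPU-side along these lines is 
speculation on my part, there is no equivalent knob today:

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Request that the CPU not enter sleep states with wakeup latency
 * above max_latency_us. The request stays in effect for as long as
 * the returned fd is kept open; close(fd) releases it. */
static int request_low_cpu_latency(int32_t max_latency_us)
{
    int fd = open("/dev/cpu_dma_latency", O_WRONLY);
    if (fd < 0)
        return -1;

    if (write(fd, &max_latency_us, sizeof(max_latency_us)) !=
        sizeof(max_latency_us)) {
        close(fd);
        return -1;
    }
    return fd;
}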

Since Vista, Microsoft has tried with its MMCSS api to provide 
multi-media clients, and their compositor if needed, with some sort of 
realtime scheduling with child-protection, but parts of that api, e.g., 
gpu priority, are just stubs, not yet implemented as of Windows 8.1.

> ...
>>>>> It seems like we could have several flags describing the aspects of
>>>>> the presentation feedback or presentation itself:
>>>>>
>>>>> 1. vsync'd or not
>>>>> 2. hardware or software clock, i.e. DRM/KMS ioctl reported time vs.
>>>>>       compositor calls clock_gettime() as soon as it is notified the screen
>>>>>       update is done (so maybe kernel vs. userspace clock rather?)
>>>>> 3. did screen update completion event exist, or was it faked by an
>>>>>       arbitrary timer
>>>>> 4. flip vs. copy?
>>>>>
>>>>
>>>> Yes, that would be sufficient for my purpose. I can always get the
>>>> OpenGL renderer/vendor string to do some basic checks for which kms
>>>> driver is in use, and your flags would give me all needed dynamic
>>>> information to judge reliability.
>>>
>>> Except that the renderer info won't help you, if the machine has more
>>> than one GPU. It is quite possible to render on one GPU and scan out
>>> on another.
>>>
>>
>> Yes, but i already use libpciaccess to enumerate gpu's on the bus, and
>> other more scary things, so i guess there will be a few more scary and
>> shady low level things to add ;-)
>
> And then you say you don't want to go poking DRM/KMS and dmabuf
> directly? O_o
> ;-)
>

I realize the irony and contradictions in some of my statements ;-) - 
It's the result of trying to eat the cake and keep it.

> ...
>
>>>
>>>   From your feedback so far, I think you have only requested additional
>>> features:
>>> - ability to subscribe to a stream of vblank-like events
>>> - do-not-skip flag for queued updates
>>>
>>
>> Yes, and the present_feedback flags.
>
> Yes! I already forgot those.
>
>> One more useful flag for me could be to know if the presented frame was
>> composited - together with some other content - or if my own buffer was
>> just flipped onscreen / no composition took place. More specifically -
>
> Does flipping your buffer directly into an overlay while still
> compositing something else count as compositing? Or is it really only
> just about "did *anything* else show on screen"?
>

"Displayed in a overlay" would be worth another flag, unless the update 
of the overlay is guaranteed to be atomic with the page flip of the main 
plane, so the kms timestamp corresponds to both framebuffer + overlay 
update. kms pageflip timestamps only correspond to the main scanout 
buffer, so overlays by themselves would be a potential timestamping 
problem if their update is not synchronized.

The main purpose of the flags, for me, would be to let me find out 
whether the present_feedback and its timestamp are really trustworthy, 
which essentially means kms page-flipped on drm/kms.

That "did anything else show on the screen" is a bonus feature for me, 
but if an overlay was active in addition to the main scanout buffer, 
that would be one indication that something else was on the screen.
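
On the client side, checking such flags would just be a bitmask test; a 
sketch with made-up flag names and values, purely to illustrate how I'd 
consume the feedback flags proposed earlier in the thread:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags, not part of any protocol. */
enum feedback_flags {
    FEEDBACK_VSYNC         = 1 << 0, /* presentation was vsync'd */
    FEEDBACK_HW_CLOCK      = 1 << 1, /* kernel timestamp, not compositor
                                        clock_gettime() */
    FEEDBACK_HW_COMPLETION = 1 << 2, /* real completion event, no timer */
    FEEDBACK_ZERO_COPY     = 1 << 3  /* page flip, not a copy/blit */
};

/* Only trust the timestamp if all of the above hold. */
static bool timestamp_trustworthy(uint32_t flags)
{
    const uint32_t required = FEEDBACK_VSYNC | FEEDBACK_HW_CLOCK |
                              FEEDBACK_HW_COMPLETION | FEEDBACK_ZERO_COPY;
    return (flags & required) == required;
}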

>> and maybe there's already some other event in the protocol for that -
>> i'd like to know if my presented surface was obscured by something else,
>> e.g., some kind of popup window like "system updates available", "you
>> have new mail", a user ALT+Tabbing away the window etc. On X11 i find
>> out indirectly about such unwanted visual disruptions because the
>> compositor would fall back to compositing instead of simple
>> page-flipping. On Wayland something similar would be cool if it doesn't
>> already exist.
>
> I think that goes to the category "you really do not want to run on a
> display server", sorry. :-)
>
> On Wayland you don't really know if, how, or where your window might be
> showing. Not on a normal desktop environment, anyway.
>

I skimmed the current docs and I'd hope that, at least with the Wayland 
shell extension bits, there would be a way to position windows, or at 
least to find out where they are and whether they're showing? There 
seems to be a bit of API for that? And there seems to be a fullscreen 
API which looks as if it would mostly do what I need?
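
For example, the core protocol's wl_shell_surface.set_fullscreen 
request looks like it covers the basic case; a minimal sketch, assuming 
the shell, surface and output objects have already been bound, and 
ignoring error handling and the shell-surface ping listener:

#include <wayland-client.h>
#include <wayland-client-protocol.h>

static void make_fullscreen(struct wl_shell *shell,
                            struct wl_surface *surface,
                            struct wl_output *output)
{
    struct wl_shell_surface *ssurf =
        wl_shell_get_shell_surface(shell, surface);

    /* METHOD_DRIVER asks the compositor to switch the output mode if
     * needed, the closest match to an exclusive fullscreen path.
     * framerate 0 = don't care. */
    wl_shell_surface_set_fullscreen(
        ssurf, WL_SHELL_SURFACE_FULLSCREEN_METHOD_DRIVER, 0, output);
}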

At the moment I'm just not familiar enough with Wayland to judge how 
well it would work for my purposes, or what kind of workarounds I'd 
need to implement to make it usable. I'll need to play around with it 
quite a bit...

But anyway, we've gone quite far off-topic for this thread ;)

thanks,
-mario


