[RFC v2] Wayland presentation extension (video protocol)

Tue Feb 25 02:33:53 PST 2014

On Mon, 24 Feb 2014 23:25:18 +0100
Mario Kleiner <mario.kleiner.de at gmail.com> wrote:

> On 21/02/14 09:36, Pekka Paalanen wrote:
> >
...
> Atm. i have to verify on specific X-Server / ddx / Linux kernel 
> versions, at least for the most common setups i care about, because 
> there's always potential for bugs, so doing that for a few compositors 
> would be the same thing. The toolkit itself has a lot of paranoid 
> startup and runtime checks, so it can detect various bugs and alert the 
> user, which then hopefully alerts me. Users with high timing precision 
> demands also perform independent verification with external hardware 
> equipment, e.g., photo-diodes to attach to monitors etc., as another 
> layer of testing. That equipment is often just not practical for 
> production use, so they may test after each hardware/os/driver/toolkit 
> upgrade, but then trust the system between configuration changes.

Ok, sounds like you got it covered. :-)

...
> >
> > You could have your desktop session on a different VT than the
> > experiment program, and switch between. Or have multiple outputs, some
> > dedicated for the experiment, others for the desktop. The same for
> > input devices.
> >
> > Or if your infrastructure allows, have X11, Wayland, and direct DRM/KMS
> > backends choosable at runtime.
> >
> > But yes, it starts getting complicated.
> >
> 
> Always depends how easy that is for the user. E.g., if there's one 
> dual-head gpu installed in the machine, i don't know if it would be 
> easily possible to have one display output controlled by a compositor, 
> but the other output controlled directly by a gbm/drm/kms client. In the 
> past only one client could be a drm master on a drm device file.

In the glorious future it should be possible, but I believe there is
still lots of work to do before it's a reality. I think there is (was?)
work going on in splitting a card into several KMS nodes by heads in
the kernel. The primary use case of that is multi-seat, one machine
with several physical seats for users.

> Anyway, as soon as my average user has to start touching configuration 
> files, it gets complicated. Especially if different distros use 
> different ways of doing it.

Yeah, configuration is an issue.

<chop>

Thank you for explaining your use case and user base at length, it
really makes me understand where you come from. I think. :-)

> >>> On Thu, 20 Feb 2014 04:56:02 +0100
> >>> Mario Kleiner <mario.kleiner.de at gmail.com> wrote:
> >>>
> >>>> On 17/02/14 14:12, Pekka Paalanen wrote:
> >>>>> On Mon, 17 Feb 2014 01:25:07 +0100
> >>>>> Mario Kleiner <mario.kleiner.de at gmail.com> wrote:
> >>>>>
...
> >> Yes, makes sense. The drm/kms timestamps are defined as "first pixel of
> >> frame leaves the graphics cards output connector" aka start of active
> >> scanout. In the time of CRT monitors, that was ~ "turns to light". In my
> >> field CRT monitors are still very popular and actively hunted for
> >> because of that very well defined timing behaviour. Or very expensive
> >> displays which have custom made panel or dlp controllers with well
> >> defined timing specifically for research use.
> >
> > Ah, I didn't know that the DRM vblank timestamps were defined
> > that way, very cool.
> >
> 
> The definition is that of OML_sync_control, so that spec could be 
> enabled in an as conformant way as possible. In practice only the 
> kms-drivers with the high precision timestamping (i915, radeon, nouveau) 
> will do precisely that. Other drivers just take a timestamp at vblank 
> irq time, so it's somewhere after vblank onset and could be off in case 
> of delayed irq execution, preemption etc.
> 
> > I'm happy that you do not see the "turns to light" definition as
> > problematic. It was one of the open questions, whether to use "turns to
> > light" or the OML_sync_control "the first pixel going out of the gfx
> > card connector".
> >
> 
> "turns to light" is what we ideally want, but the OML_sync_control 
> definition is the best approximation of that if the latency of the 
> display itself is unknown - and spot on in a world of CRT monitors.

Cool. It still leaves us the problem that if a monitor lies about its
latency, and a compositor uses that information, it'll be off, and
no-one would know unless they actually measured it with some special
equipment. Good thing that your professional users know to measure it.

> >> If the compositor knew the precise monitor latency it could add that as
> >> a constant offset to those timestamps. Do you know of reliable ways to
> >> get this info from any common commercial display equipment? Apple's OSX
> >> has API in their CoreVideo framework for getting that number, and i
> >> implement it in the OSX backend of my toolkit, but i haven't ever seen
> >> that function returning anything else than "undefined" from any display?
> >
> > I believe the situation is like with any such information blob (EDID,
> > BIOS tables, ...): hardware manufacturers just scribble something that
> > usually works for the usual cases of Microsoft Windows, and otherwise
> > it's garbage or not set. So, I think in theory there was some spec that
> > allows to define it (with HDMI or EDID or something?), but to trust that
> > for scientific work? I would not.
> >
> 
> At least the good scientists never trust ;-). They measure the actual 
> latency between software timestamps and display, once their setup is 
> ready for production use, e.g., with some attached photo-diodes. Then 
> they use the calculated offset to correct the software reported 
> timestamps to get to the "turns to light" they actually want.
> 
> Of course it's better to use a zero offset, so your timestamps == 
> OML_sync_control timestamps, and thereby avoid confusion on the user's 
> side if there is doubt about the information reported by EDID etc. A 
> Wayland implementation would probably need some consistency checks or 
> quirks settings or blacklist to make sure no really bogus results get 
> added to the timestamps.

That may be an issue, yes. We'll see what the convention will be. It's
possible that because the monitor latency is unknown in most cases,
practically everyone will just make it zero always.
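
To make the correction step described above concrete, here is a minimal
sketch in Python. It assumes the user has already collected paired
samples: the software-reported presentation timestamps and the onset
times measured with a photo-diode, both in seconds on the same clock.
The function names are made up for illustration and are not part of any
toolkit or protocol.

```python
from statistics import median


def estimate_display_latency(software_ts, photodiode_ts):
    """Return the median offset in seconds between reported and measured onset.

    Both arguments are sequences of timestamps on the same clock, one
    pair per presented frame (hypothetical input format).
    """
    if not software_ts or len(software_ts) != len(photodiode_ts):
        raise ValueError("need a non-empty set of paired samples")
    # The median is less sensitive to an occasional outlier than the mean.
    return median(m - s for s, m in zip(software_ts, photodiode_ts))


def to_turns_to_light(software_ts, latency):
    """Shift a software-reported timestamp by the measured display latency."""
    return software_ts + latency
```

With a zero offset the corrected value equals the reported timestamp,
which matches the "just make it zero" convention for unknown latency.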

...
> >> This link...
> >>
> >> <https://github.com/Psychtoolbox-3/Psychtoolbox-3/blob/master/Psychtoolbox/PsychDocumentation/ECVP2010Poster_VisualTimingPrecision.pdf?raw=true>
> >>
> >> ...points to some results of tests i did a couple of years ago. Quite
> >> outdated by now, but Linux + X11 came out as more reliable wrt. timing
> >> precision than Windows and OSX, especially when realtime scheduling or
> >> even a realtime kernel was used, with both proprietary graphics drivers
> >> and the open-source drivers. I was very pleased with that :) - The other
> >
> > I'm curious, how did you arrange the realtime scheduling? Elevated both
> > the X server and your app to RT priority? Doesn't that have a huge risk
> > of hanging the whole machine, if there is a bug in either code base? Not
> > to mention you probably had Octave or Matlab involved?
> >
> 
> My app at RT priority, in some tests a kernel with realtime patches. 
> That linked pdf has separate columns for RT vs. non-RT scheduling. 
> Elevating the X-Server didn't make much difference, switching off 
> dynamic gpu reclocking made a difference for some tasks. All bufferswaps 
> were page flipped.

Interesting that the X-server realtimeness didn't make a difference.
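
For readers wondering what "app at RT priority" can look like in
practice, below is a minimal sketch in Python using the standard os
module's Linux scheduler calls. It is only an illustration, not how the
toolkit discussed here does it. It assumes Linux, and the call normally
fails unless the process has the needed privilege (root, CAP_SYS_NICE or
a suitable rtprio limit), so the helper reports failure instead of
raising. The function names are made up.

```python
import os


def try_realtime_priority(priority=1):
    """Try to move the calling process to SCHED_FIFO; return True on success.

    A low priority is used by default to limit the risk of a runaway
    loop starving the rest of the system.
    """
    if not hasattr(os, "sched_setscheduler"):
        return False  # scheduler API not available on this platform
    try:
        # pid 0 means "the calling process".
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        return False  # typically missing privileges
    return True


def restore_normal_priority():
    """Switch the calling process back to the default SCHED_OTHER policy."""
    if hasattr(os, "sched_setscheduler"):
        os.sched_setscheduler(0, os.SCHED_OTHER, os.sched_param(0))
```

Checking the return value and continuing at normal priority when it is
False fits the "paranoid startup checks" approach mentioned earlier.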

> There are two cases. When testing with the proprietary NVidia display 
> driver, performance was very good. As far as i can tell, their client 
> direct rendering implementation directly calls into kernel ioctl()'s to 
> trigger page flips, so it avoids roundtrips to the X-Server or any 
> delays incurred by the protocol or server.

I would be a little surprised by flipping from the client directly, but
then again... in the old days, my experience about the nvidia drivers
was that they happily waste CPU on busyloop waits to get a little more
performance by default.

> >> thing i learned there is how much dynamic power management on the gpu
> >> can bite you if the rendering/presentation behavior of your app doesn't
> >> match the expectations of the algorithms used to control up/downclocking
> >> on the gpu. That would be another topic related to a presentation
> >> extension btw., having some sort of hints to the compositor about what
> >> scheduling priority to choose, or if gpu power management should be
> >> somehow affected by the timing needs of clients...
> >
> > That's true, but I also think controlling power management is not in
> > scope of the presentation extension.
> >
> > Power control is about hardware and system control, which smells a bit
> > like a privileged action. It needs to be solved separately, just like
> > e.g. clients cannot go and just change the video mode on an output in
> > Wayland.
> >
> 
> Yes, but it's somewhat related. A way to specify required quality of 
> service for a graphics client. You'd probably need some way of hinting 
> the system what your timing requirements are. I think there is some QOS 
> infrastructure in the kernel already for controlling cpu governors, 
> sleep states or suspend states and how deep they should go to guarantee 
> responsiveness etc. - or at least some work was done on that. One would 
> need to extend that to other actors like the gpu, to hint the gpu 
> dynamic power management how much it should prioritize 
> performance/latency over power savings. But it is a complex field.

Right. I'd leave that for a later time. We can add new things to
existing extensions, too.

...
> >>>   From your feedback so far, I think you have only requested additional
> >>> features:
> >>> - ability to subscribe to a stream of vblank-like events
> >>> - do-not-skip flag for queued updates
> >>>
> >>
> >> Yes, and the present_feedback flags.
> >
> > Yes! I already forgot those.
> >
> >> One more useful flag for me could be to know if the presented frame was
> >> composited - together with some other content - or if my own buffer was
> >> just flipped onscreen / no composition took place. More specifically -
> >
> > Does flipping your buffer directly into an overlay while still
> > compositing something else count as compositing? Or is it really only
> > just about "did *anything* else show on screen"?
> >
> 
> "Displayed in an overlay" would be worth another flag, unless the update 
> of the overlay is guaranteed to be atomic with the page flip of the main 
> plane, so the kms timestamp corresponds to both framebuffer + overlay 
> update. kms pageflip timestamps only correspond to the main scanout 
> buffer, so overlays by themselves would be a potential timestamping 
> problem if their update is not synchronized.

I've understood that before Weston can actually really use hw planes
with DRM, it will need the atomic pageflip feature completed in the
kernel. I would assume that implies the framebuffer and overlay updates
are sync'd.

That might be another flag to add, yes; "this timestamp is less precise
than usual" maybe? But I'm not sure it makes sense, we would need to
define what it means and when to use it. What if we just used the "this
timestamp is manufactured by the compositor" flag instead? The one I
described as the timestamp coming from the kernel driver vs. compositor
reading a clock.

> The main purpose of the flags for me would be to allow to find out if 
> the present_feedback and its timestamp is really trustworthy, which 
> essentially means kms page-flipped on drm/kms.
> 
> That "did anything else show on the screen" is a bonus feature for me, 
> but if an overlay was active in addition to the main scanout buffer, 
> that would be one indication that something else was on the screen.

Right.
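
As a rough illustration of how a client might use such flags, here is a
small sketch in Python. The flag names and bit values are purely
hypothetical, chosen only to mirror the ideas discussed above; nothing
here is taken from the protocol draft.

```python
from enum import IntFlag


class FeedbackFlag(IntFlag):
    # Hypothetical flags, not actual protocol values.
    KERNEL_TIMESTAMP = 0x1  # timestamp from the kernel driver, not a compositor clock read
    PAGE_FLIPPED = 0x2      # client buffer was flipped to scanout, no composition
    OVERLAY_ACTIVE = 0x4    # an overlay plane was active besides the main scanout buffer


def timestamp_is_trustworthy(flags):
    """True only if the timestamp is kernel-provided and the buffer was page-flipped."""
    required = FeedbackFlag.KERNEL_TIMESTAMP | FeedbackFlag.PAGE_FLIPPED
    return (flags & required) == required


def possible_visual_disruption(flags):
    """True if something else may have been on screen (composited or overlay active)."""
    composited = not (flags & FeedbackFlag.PAGE_FLIPPED)
    return composited or bool(flags & FeedbackFlag.OVERLAY_ACTIVE)
```

A client with strict timing needs could then discard or mark any sample
for which timestamp_is_trustworthy() returns False.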

> >> and maybe there's already some other event in the protocol for that -
> >> i'd like to know if my presented surface was obscured by something else,
> >> e.g., some kind of popup window like "system updates available", "you
> >> have new mail", a user ALT+Tabbing away the window etc. On X11 i find
> >> out indirectly about such unwanted visual disruptions because the
> >> compositor would fall back to compositing instead of simple
> >> page-flipping. On Wayland something similar would be cool if it doesn't
> >> already exist.
> >
> > I think that goes to the category "you really do not want to run on a
> > display server", sorry. :-)
> >
> > On Wayland you don't really know if, how, or where your window might be
> > showing. Not on a normal desktop environment, anyway.
> >
> 
> I skimmed the current docs and I'd hope that at least with the Wayland 
> shell extension bits there would be a way to position windows or at 
> least find out where they are or if they're showing? There seems to be a 
> bit of api to that? And there seems to be some fullscreen api which 
> looks as if it would mostly do what i need?

You can know which outputs your window is on, but no position more
accurate than that. The fullscreen API does indeed allow a client to
specify which output it should be fullscreen on. I'm not sure where
xdg_shell is on other window parameters. xdg_shell will replace
wl_shell eventually.

> At the moment i'm just not familiar enough with Wayland to judge how 
> well it would work for my purposes or what kind of workarounds i'd need 
> to implement to make it useable. I'll need to play around with it quite 
> a bit...
> 
> But anyway we went quite far off-topic for this thread ;)

Partly, yeah. :-)

Thanks,
pq