RFC for a render API to support adaptive sync and VRR
Aric.Cyr at amd.com
Wed Apr 11 23:30:30 UTC 2018
> From: Michel Dänzer [mailto:michel at daenzer.net]
> Sent: Wednesday, April 11, 2018 05:50
> On 2018-04-11 08:57 AM, Nicolai Hähnle wrote:
> > On 10.04.2018 23:45, Cyr, Aric wrote:
> >> How does it work fine today given that all kernel seems to know is
> >> 'current' or 'current+1' vsyncs.
> >> Presumably the applications somehow schedule all this just fine.
> >> If this works without variable refresh for 60Hz, will it not work for
> >> a fixed-rate "48Hz" monitor (assuming a 24Hz video)?
> > You're right. I guess a better way to state the point is that it
> > *doesn't* really work today with fixed refresh, but if we're going to
> > introduce a new API, then why not do so in a way that can fix these
> > additional problems as well?
> Exactly. With a fixed frame duration, we'll still have fundamentally the
> same issues as we currently do without variable refresh, not making use
> of the full potential of variable refresh.
I see. Well then, that makes this sort of orthogonal to the discussion.
If you say that there are no media players on Linux today that can maintain audio/video sync with a 60Hz display, then that problem is much larger than the one we're trying to solve here.
By the way, I don't believe that is a true statement :)
> > Say you have a multi-GPU system, and each GPU has multiple displays
> > attached, and a single application is driving them all. The application
> > queues flips for all displays with the same target_present_time_ns
> > attribute. Starting at some time T, the application simply asks for the
> > same present time T + i * 16666667 (or whatever) for frame i from all
> > displays.
> > Why would that not work to sync up all displays almost perfectly?
It doesn't work that way, unfortunately. In theory it sounds great, but ask anyone who has worked with framelock/genlock: it is a complicated problem.
The easiest explanation is that you need to be able to atomically program registers across multiple GPUs at the same time, which is not possible without hardware assist (see AMD's S400 module, for example).
We have enough to discuss without this, so let's leave mGPU for another day since we can't solve it here anyway.
> > Okay, that's interesting. Does this mean that the display driver still
> > programs a refresh rate to some hardware register?
Yes, the driver can, in some cases, update the minimum and maximum vertical total on each flip.
In the fixed-rate example, you would set them equal to achieve the desired refresh rate.
We don't program a refresh rate, just the vertical total min/max (which determines the refresh rate, given a constant pixel clock and horizontal total).
Thus, our refresh rate granularity is one line-time, i.e. on the order of microsecond accuracy.
> > How about what I wrote in an earlier mail of having attributes:
> > - target_present_time_ns
> > - hint_frame_time_ns (optional)
> > ... and if a video player set both, the driver could still do the
> > optimizations you've explained?
> FWIW, I don't think a property would be a good mechanism for the target
> presentation time.
> At least with VDPAU, video players are already explicitly specifying the
> target presentation time, so no changes should be required at that
> level. Don't know about other video APIs.
> The X11 Present extension protocol is also prepared for specifying the
> target presentation time already, the support for it just needs to be
I'm perfectly OK with a presentation time-based *API*. I get it from a user mode/app perspective, and that's fine. We need that feedback and would like help defining that portion of the stack.
However, I think it doesn't make as much sense as a *DDI*, because it doesn't correspond to any hardware, real or logical (i.e. no one would implement it in HW this way), and the industry specs aren't defined that way.
You can have libdrm or some other usermode component translate your presentation time into a frame duration and schedule it. What's the advantage of having this in the kernel, besides the fact that we lose the intent of the application and could prevent features and optimizations? Once it gets to the kernel, I think it is much more elegant for the flip structure to contain a simple duration that says "hey, show this frame on the screen for this long". Then we don't need any clocks or timers, just some simple math to program the hardware.
1) We can simplify media players' lives by helping them get really, really close to their content rate, so they wouldn't need any frame-rate conversion.
They'll still need A/V syncing, though; variable refresh cannot solve that, so it is well out of scope of what we're proposing.
2) For gaming, don't even try to guess a frame duration; the driver/hardware will do a better job every time. Just specify duration=0 and flip as fast as you can.
P.S. Thanks for the Croteam link. Interesting, but basically nullified by variable refresh rate displays. You won't have stuttering/microstuttering/judder/tearing if your display's refresh rate matches the render/present rate of the game. Maybe I should grab The Talos Principle to see how well it works with a FreeSync display :)
PMTS Software Engineer | SW – Display Technologies