RFC for a render API to support adaptive sync and VRR
Nicolai Hähnle
nicolai.haehnle at amd.com
Wed Apr 11 06:57:08 UTC 2018
On 10.04.2018 23:45, Cyr, Aric wrote:
>>>>>>>> For video games we have a similar situation where a frame is rendered
>>>>>>>> for a certain world time and in the ideal case we would actually
>>>>>>>> display the frame at this world time.
>>>>>>>
>>>>>>> That seems like it would be a poorly written game that flips like
>>>>>>> that, unless they are explicitly trying to throttle the framerate for
>>>>>>> some reason. When a game presents a completed frame, they’d like
>>>>>>> that to happen as soon as possible.
>>>>>>
>>>>>> What you're describing is what most games have been doing traditionally.
>>>>>> Croteam's research shows that this results in micro-stuttering, because
>>>>>> frames may be presented too early. To avoid that, they want to
>>>>>> explicitly time each presentation as described by Christian.
>>>>>
>>>>> Yes, I agree completely. However that's only truly relevant for fixed
>>>>> refresh rate displays.
>>>>
>>>> No, it also affects variable refresh; possibly even more in some cases,
>>>> because the presentation time is less predictable.
>>>
>>> Yes, and that's why you don't want to do it when you have variable refresh. The hardware in the monitor and GPU will do it for you,
>>> so why bother?
>>
>> I think Michel's point is that the monitor and GPU hardware *cannot*
>> really do this, because there's synchronization with audio to take into
>> account, which the GPU or monitor don't know about.
>
> How does it work fine today given that all the kernel seems to know is 'current' or 'current+1' vsyncs?
> Presumably the applications somehow schedule all this just fine.
> If this works without variable refresh for 60Hz, will it not work for a fixed-rate "48Hz" monitor (assuming a 24Hz video)?
You're right. I guess a better way to state the point is that it
*doesn't* really work today with fixed refresh, but if we're going to
introduce a new API, then why not do so in a way that can fix these
additional problems as well?
>> Also, as I wrote separately, there's the case of synchronizing multiple
>> monitors.
>
> For multimonitor to work with VRR, they'll have to be timing and flip synchronized.
> This is impossible for an application to manage, it needs driver/HW control or you end up with one display flipping before the other and it looks terrible.
> And definitely forget about multiGPU without professional workstation-type support needed to sync the displays across adapters.
I'm not a display expert, but I find it hard to believe that it's that
difficult. Perhaps you can help us understand?
Say you have a multi-GPU system, and each GPU has multiple displays
attached, and a single application is driving them all. The application
queues flips for all displays with the same target_present_time_ns
attribute. Starting at some time T, the application simply asks for the
same present time T + i * 16666667 (or whatever) for frame i from all
displays.
Of course it's to be expected that some (or all) of the displays will
not be able to hit the target time on the first bunch of flips due to
hardware limitations, but as long as the range of supported frame times
is wide enough, I'd expect all of them to drift towards presenting at
the correct time eventually, even across multiple GPUs, with this simple
scheme.
Why would that not work to sync up all displays almost perfectly?
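To make the scheme above concrete, here is a minimal sketch in Python. The display model is an assumption of mine and not a description of real hardware: each display may start a new frame anywhere between min_frame_ns and max_frame_ns after the previous one, and it picks the allowed start closest to the requested target. The function names are hypothetical; only target_present_time_ns and the 16666667 ns frame time come from the text.

```python
FRAME_NS = 16_666_667  # nominal frame time from the example above (about 60 Hz)


def target_present_time_ns(t0_ns, i, frame_ns=FRAME_NS):
    """Target present time for frame i, identical for every display."""
    return t0_ns + i * frame_ns


def simulate_display(first_scanout_ns, t0_ns, n_frames,
                     min_frame_ns, max_frame_ns):
    """Return the presentation error in ns for frames 1..n_frames.

    Assumed model: the next frame can start no earlier than
    prev + min_frame_ns and no later than prev + max_frame_ns.
    """
    errors = []
    prev = first_scanout_ns
    for i in range(1, n_frames + 1):
        target = target_present_time_ns(t0_ns, i)
        # Clamp the requested time into the window the display allows.
        actual = min(max(target, prev + min_frame_ns), prev + max_frame_ns)
        errors.append(actual - target)
        prev = actual
    return errors


# Two displays with different initial phase, both with an assumed
# 48-144 Hz range (frame times between 6944444 ns and 20833333 ns).
late = simulate_display(15_000_000, 0, 20, 6_944_444, 20_833_333)
early = simulate_display(-30_000_000, 0, 20, 6_944_444, 20_833_333)
```

In this model both displays miss the first targets and then lock onto the shared schedule. That only works if the target frame time lies strictly inside the supported range, which is the "wide enough" condition mentioned above.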
[snip]
>> Are there any real problems with exposing an absolute target present time?
>
> Realistically, how far into the future are you requesting a presentation time? Won't it almost always be something like current_time+1000/video_frame_rate?
> If so, why not just tell the driver to set 1000/video_frame_rate and have the GPU/monitor create nicely spaced VSYNCs for you that match the source content?
>
> In fact, you probably wouldn't even need to change your video player at all, other than having it pass the target_frame_duration_ns. You could consider this a 'hint' as you suggested, since it cannot be guaranteed in cases where your driver or HW doesn't support variable refresh. If the target_frame_duration_ns hint is supported/applied, then the video app should have nothing extra to do that it wouldn't already do for any arbitrary fixed-refresh rate display. If not supported (say the drm_atomic_check fails with -EINVAL or something), the video app can stop requesting a fixed target_frame_duration_ns.
>
> A fundamental problem I have with a target present time though is how to accommodate present times that are larger than one VSYNC time? If my monitor has a 40Hz-60Hz variable refresh, it's easy to translate "my content is 24Hz, repeat this next frame an integer multiple number of times so that it lands within the monitor range". Driver fixes display to an even 48Hz and everything good (no worse than a 30Hz clip on a traditional 60Hz display anyways). This frame-doubling is all hardware based and doesn't require any polling.
>
> Now if you change that to "show my content in at least X nanoseconds" it can work on all displays, but the intent of the app is gone and driver/GPU/display cannot optimize. For example, the HDMI VRR spec defines a "CinemaVRR" mode where target refresh rate error is accounted for based on 0.1% deviation from requested and the v_total lines are incremented/decremented to compensate. If we don't know the target rate, we will not be able to comply with this industry standard specification.
Okay, that's interesting. Does this mean that the display driver still
programs a refresh rate to some hardware register?
What if you want to initiate some CPU-controlled drift, i.e. you know
you're targeting 2*24Hz, but you'd like to shift all flip times to be X
ms later? Can you program hardware for that, and how does it work? Do
you have to twiddle the refresh rate, or can the hardware do it natively?
How about what I wrote in an earlier mail of having attributes:
- target_present_time_ns
- hint_frame_time_ns (optional)
... and if a video player set both, the driver could still do the
optimizations you've explained?
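As a rough illustration of how the optional hint could preserve the frame-doubling optimization, here is a sketch of picking the repeat count from hint_frame_time_ns. The function name and the way the panel range is passed in are hypothetical; this is not an existing DRM interface.

```python
def pick_repeat_count(hint_frame_time_ns, min_frame_ns, max_frame_ns):
    """Return (repeats, refresh_period_ns) or None if nothing fits.

    min_frame_ns / max_frame_ns are the shortest and longest frame
    times the panel supports (assumed to be known to the driver).
    """
    # Smallest n with hint / n <= max_frame_ns (ceiling division).
    n = max(1, -(-hint_frame_time_ns // max_frame_ns))
    # A larger n only shortens the period further, so if this n falls
    # below the minimum frame time, no integer multiple fits.
    if hint_frame_time_ns < n * min_frame_ns:
        return None
    return n, hint_frame_time_ns // n


# 24 Hz content on a 40-60 Hz panel: each frame is shown twice at 48 Hz.
result = pick_repeat_count(41_666_667, 16_666_667, 25_000_000)
```

With both attributes set, target_present_time_ns would still say when the frame should appear, while the hint tells the driver which steady refresh rate the content is aiming for.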
> Also, how would you manage an absolute target present time in kernel? I guess app and driver need to use a common system clock or tick count, but when would you know to 'wake up' and execute the flip? If you wait for VSYNC then you'll always time out on v_total_max (i.e. minimum refresh rate), check your time and see "yup, need to present now" and then flip. Now your monitor just jumped from lowest refresh rate to something else which can cause other problems. If you use some timer, then you're burning needless power polling some counter and still wouldn't have the same accuracy you could achieve with a fixed duration.
For the clock, we just have to specify which one to take. I believe
CLOCK_MONOTONIC makes the most sense for this kind of thing.
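From the application side, computing a target on that clock could look like the following sketch. The helper names are hypothetical; time.clock_gettime_ns and time.CLOCK_MONOTONIC are standard Python on Unix-like systems, with time.monotonic_ns as a fallback elsewhere.

```python
import time


def now_monotonic_ns():
    """Current CLOCK_MONOTONIC time in nanoseconds."""
    if hasattr(time, "CLOCK_MONOTONIC"):
        return time.clock_gettime_ns(time.CLOCK_MONOTONIC)
    # Fallback for platforms that do not expose CLOCK_MONOTONIC.
    return time.monotonic_ns()


def next_target_present_time_ns(frame_time_ns, lead_frames=1):
    """Absolute target lead_frames frame times from now (hypothetical helper)."""
    return now_monotonic_ns() + lead_frames * frame_time_ns
```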
For your other questions, I'm afraid I just don't know enough about
modern display hardware to give a really good answer, but with my naive
understanding I would imagine something like the following:
1. When the atomic commit happens, the driver twiddles with the display
timings to get the start of scanout for the next frame as close as
possible to the specified target present time (I assume this is what
v_total_max is about?)
2. The kernel then schedules a timer for the time when the display
hardware is finished scanning out the previous frame and starts vblank.
3. In the handler for that timer, the kernel checks whether any fence
associated with the new frame's surface has signaled. If yes, it changes
the display hardware's framebuffer pointer to the new frame. Otherwise,
it atomically registers for the handler to be run again when the fence
is signaled.
3b. The handler should check if vblank has already ended (either due to
extreme CPU overload or because the fence was signaled too late).
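The decision made in steps 3 and 3b can be sketched as a small pure function. This models only the logic and is not kernel code. The names are hypothetical, and what to do once vblank has already ended is not specified above, so the MISSED outcome is my assumption.

```python
from enum import Enum


class Action(Enum):
    FLIP = "flip"              # point the display at the new framebuffer
    WAIT_FENCE = "wait_fence"  # re-run the handler when the fence signals
    MISSED = "missed"          # vblank is over (assumed: retry next vblank)


def vblank_handler(now_ns, vblank_end_ns, fence_signaled):
    """Decide what the timer/fence handler should do at time now_ns."""
    # Step 3b: the handler may run too late, either because the CPU was
    # overloaded or because the fence signaled after vblank ended.
    if now_ns >= vblank_end_ns:
        return Action.MISSED
    # Step 3: flip only once rendering to the new surface has finished.
    if fence_signaled:
        return Action.FLIP
    return Action.WAIT_FENCE
```

The same function would be called from both the timer and the fence callback, so the late-fence case falls out of the vblank check without a separate code path.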
Actually, that last point makes me wonder how the case of "present ASAP"
is actually implemented in hardware.
But again, all this is just from my naive understanding of the display
hardware.
Cheers,
Nicolai
>
> Regards,
> Aric
>