Improvements to VRR below-the-range/low framerate compensation.

Thu Apr 18 03:51:18 UTC 2019

Hi

This patch-series tries to improve amdgpu's below-the-range
behaviour with Freesync, hopefully not only for my use case,
but also for games etc.

Patch 1/4 adds a bit of debug output i found very useful, so maybe
worth adding?

Patch 2/4 fixes a bug i found when reading over the freesync code.

Patches 3/4 and 4/4 are optimizations to improve stability
in BTR.

My desired application of VRR for neuroscience/vision research
is to control the timing of when frames show up onscreen, e.g.,
to show animations at different "unconventional" framerates,
so i'm mostly interested in how well one can control the timing
between successive OpenGL bufferswaps. This is a bit different
from what a game wants to get out of VRR, probably closer to
what a movie player might want to do.

I spent quite a bit of time testing how FreeSync behaves when
flipping at a rate below the displays VRR minimum refresh rate.

For that, my own application was submitting glXSwapBuffers() flip
requests at different fps rates / time delays between successive
flips. The test script is for GNU/Octave + psychtoolbox-3, but
in principle the C equivalent would be this pseudo-code:

for (i = 0; i < n; i++) {
  // Wait for pending flip to complete, get pageflip timestamp t_last
  // of flip completion:
  glXWaitForSbcOML(...., &t_last[i],...);

  // Fetch some delay value until next flip tdelay[i]
  tdelay[i] = Some function of varying frame delay over samples.

  // Try to flip tdelay[i] secs after previous flip t_last[i]:
  t_next = t_last[i] + tdelay[i];
  clock_nanosleep(t_next);

  // Flip
  glXSwapBuffers(...);
}

For tdelay[i] i used different test profiles, e.g., on a display with
a VRR range from 30 Hz to 144 Hz ~ 7 msecs - 33 msecs:

tdelay[i] = 0.050 // One flip each 50 msecs, want constant 20 fps.
tdelay[i] = rand() // Some randomly chosen delay for each flip.
tdelay[i] = 0.007 + some sin() sine profile of changing delays/fps.
tdelay[i] = 0.007 + i * 0.001; linear increase in delay by 1 msec/flip,
            starting at 7 msecs.
tdelay[i] = ... linear decrease by 1 msec/flip.. starting at 120 msecs.

etc. Then i plotted requested flip delays tdelay[] against actual
flip delays (~ t_last[i+1] - t_last[i]) to see how well VRR can follow
requested fps.

Inside the VRR range ~ 7 msecs - 33 msecs, Freesync behaved basically
perfect with average errors of less than 0.1 msecs and jitter of less
than 1 msec.

When going for tdelay's > 33 msecs, ie. when low framerate compensation/
BTR kicked in, my DCN-1 Raven Ridge APU behaved almost as well as within
the VRR range (for reasonably smooth changes in fps). When doing the
same on a DCE-8 and DCE-11 gpu, BTR made much bigger errors between what
was requested and what was measured.

Patch 3/4 helps avoiding glitches on DCN when transitioning from VRR range
to below min VRR range, and helps even more in avoiding glitches on DCE.

Patch 4/4 tries to improve behaviour on pre-DCE12. It helps quite a lot
when testing on DCE-8 and DCE-11 as described in the commit message. This
makes sense, as the code has some TODO comment about the different hw
behaviour of pre-DCE12, mentioning that pre-DCE12 hw only responds to 
programming different VTOTAL_MIN/MAX values with a lag of 1 frame. The
patch tries to work around this hardware limitation with some success.
DCE8/11 behaviour is still not as good as DCN-1 behaviour, but at least
it is not totally useless for my type of application with this patchset.

I don't have a Vega discrete gpu with DCE-12, but assume it would behave
like DCN-1 if the comments in the code are correct? Therefore the patch
switches BTR handling for < AMDGPU_FAMILY_AI vs. >= AMDGPU_FAMILY_AI.

Thanks,
-mario