DRM Vblank timestamping patches. 1st Try.

Sun Oct 24 12:56:44 PDT 2010

Hi all,

after obsessing over these for another month after xds 2010,
this is the first version of my vblank timestamping patches that
i'm not totally ashamed of.

1st patch modifies the drm vblank handling to maintain precise
timestamps of when vblanks happen. It timestamps the end of
each vblank interval (= start of active scanout of the following
frame), as required by the OpenML OML_sync_control extension spec,
so the DRI2 swap & sync bits should return proper timestamps to
gl clients. It also takes care of keeping timestamps and vblcounts
consistent across vblank irq on/off and hopefully fixes a few
off-by-one vblank counts due to race conditions in those on/off bits.

There is still one small race condition between the vblank on/off
bits and the gpu (updating its hardware vblank counter at the
wrong moment) which can cause miscounting of vblanks across
enable/disable. I have an idea how to fix that, based on these
patches, but will need to tinker around with it a bit more. After
that we could reduce the vblankoffdelay from 5000 msecs to something
smaller like 100 msecs to save more power on desktops with
desktop composition enabled, where already a blinking cursor
in a terminal window or a ticking clock in the menu bar
can keep vblanks turned on all the time with a 5000 msecs timeout.

All the timestamping inside the drmWaitVblank ioctl() and in
vblank event handling now uses the timestamps calculated in this
patch.

2nd and 3rd patches add the new timestamping and scanout position
functions to the i915 and radeon drivers.

Wrt. what i showed you at xds, i made the following changes:

I've moved the actual timestamping routine as a helper function
into the core. kms drivers now optionally export the crtc scanout
position query function in addition to the vblank timestamping function
to the drm core. Whenever the core has to recalculate timestamps it
goes like:

core -> kmsdriver-timestamping() -> core-timestampinghelper() ->
kmsdriver-getscanoutposition()

The kms driver can implement its own version of the timestamping function,
e.g., using special hardware timestamping registers in some more recent
gpu's, or it can simply export a getscanoutposition() function with
standardized behaviour across drivers and call the timestamping routine
implemented in the core. The routine in the core works in non-interlaced,
non-doublescan video modes, with or without panel scaling. It can
compensate for any random delay up to 1 video refresh duration. For
more robustness one would need to use a gpu specific hardware method.
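
A minimal sketch of the call chain above and of the kind of calculation
the core helper does. All names here are hypothetical, not the functions
from the patches. It assumes a getscanoutposition() convention where vpos
counts scanlines since the start of active scanout and goes negative
inside vblank (lines left until vblank ends); that convention is an
assumption of this sketch. "now" is passed in to keep the example
testable; a real implementation would sample the time right around the
position query.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scanout position: vpos in scanlines, hpos in pixels.
 * Assumed convention: vpos >= 0 in active scanout, vpos < 0 in vblank. */
struct scanout_pos {
    int vpos;
    int hpos;
};

typedef struct scanout_pos (*get_scanout_pos_fn)(int crtc);

/* "core-timestampinghelper()": turn a scanout position into the time at
 * which the current (or upcoming) active scanout started. */
static int64_t
core_calc_vblank_end_ns(int crtc, int64_t now_ns, get_scanout_pos_fn get_pos,
                        int64_t linedur_ns, int64_t pixeldur_ns)
{
    struct scanout_pos pos;
    int64_t delta_ns;

    assert(get_pos != NULL && linedur_ns > 0 && pixeldur_ns > 0);

    pos = get_pos(crtc);

    /* Time elapsed since end of vblank; negative while still in vblank,
     * which yields a timestamp slightly in the future. */
    delta_ns = (int64_t)pos.vpos * linedur_ns +
               (int64_t)pos.hpos * pixeldur_ns;

    return now_ns - delta_ns;
}

/* Stand-in for "kmsdriver-getscanoutposition()": reads a fake register. */
static struct scanout_pos fake_hw_pos;

static struct scanout_pos
fake_get_scanout_pos(int crtc)
{
    (void)crtc;
    return fake_hw_pos;
}

/* Stand-in for "kmsdriver-timestamping()": a driver without special
 * timestamp registers just forwards to the core helper. The mode
 * constants (10000 ns per line, 10 ns per pixel) are made-up values. */
static int64_t
fake_driver_get_vblank_timestamp(int crtc, int64_t now_ns)
{
    return core_calc_vblank_end_ns(crtc, now_ns, fake_get_scanout_pos,
                                   10000, 10);
}
```

With the made-up constants above, a query at now = 1000000 ns with the
beam at line 10, pixel 50 gives a timestamp of 899500 ns, i.e. scanout
started 100500 ns ago.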

I also moved the calculation of the constants needed for timestamping
into the drm_crtc_helper routine, so they only get recalculated after
a mode switch, not at each vblank. At modeset time i cache the
adjusted_mode after validation/adjustment by the crtc and encoders
inside the drm_crtc struct. Needed to account for panel scaling, and
i have a few other ideas that would require that cached value.
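
For illustration, the per-mode constants could be derived roughly like
this, once per modeset. The struct and function names are hypothetical
and the inputs (pixel clock in kHz, total pixels per line, total lines
per frame) are assumed to come from the cached adjusted mode; this is
only a sketch, not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the cached timing constants. */
struct vbl_timing_consts {
    int64_t pixeldur_ns;  /* duration of one pixel, truncated to ns */
    int64_t linedur_ns;   /* duration of one scanline */
    int64_t framedur_ns;  /* duration of one full video refresh */
};

static struct vbl_timing_consts
calc_timing_consts(int dotclock_khz, int htotal, int vtotal)
{
    struct vbl_timing_consts c;

    assert(dotclock_khz > 0 && htotal > 0 && vtotal > 0);

    /* 1 kHz pixel clock == 1000000 ns per pixel. */
    c.pixeldur_ns = 1000000 / dotclock_khz;

    /* Computed from htotal directly instead of htotal * pixeldur_ns,
     * so the truncation of pixeldur_ns doesn't accumulate per line. */
    c.linedur_ns = (int64_t)htotal * 1000000 / dotclock_khz;

    c.framedur_ns = (int64_t)vtotal * c.linedur_ns;

    return c;
}
```

Integer math is used on purpose here, on the assumption that this would
run in kernel context where floating point is not available.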

More details are in the different commit messages and the code, here
are just some more remarks.

@Jesse: I replaced the n extra dynamically allocated spin-locks for n
crtc's by one statically allocated lock shared by all crtc's and renamed
it. I think i also worked all your other suggestions into the patches.
Thanks a lot for your first review of the earlier version.

The intel driver uses the timestamps inside the finish pageflip function
for timestamps of bufferswap completion. I've tested with an Atom
Mini netbook with Intel 945 GME gpu and there it works perfectly. I don't
have any other intel test machines, so the finish pageflip and scanout
query functions for ironlake, gen6 and g4x aren't tested, only coded
based on some of the intel manuals at xorg. In the pageflip irq
handlers there is this uncertainty if pageflip irq's get delivered
before or after the vblank irq's for the vblank interval of swap
completion. The i945 delivers pageflip irq's before handling the
vblank irq's so the completion handler needs to account for that.
I guessed from the code which ones deliver before/after vblank,
but those guesses may be wrong, so this needs testing.
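
One possible way a completion handler could account for that ordering,
as a sketch only. The names are hypothetical, and the heuristic (treat
the last vblank timestamp as stale if it is more than 0.9 refresh
durations old) is an assumption of this example, not something stated
above.

```c
#include <stdint.h>

/* Hypothetical vblank count + timestamp pair. */
struct flip_stamp {
    uint32_t count;
    int64_t  time_ns;
};

/* If the pageflip irq is handled before the vblank irq of the same
 * vblank interval, the last recorded count/timestamp still belong to
 * the previous refresh. In that case advance both by one refresh. */
static struct flip_stamp
fixup_flip_stamp(struct flip_stamp last, int64_t now_ns,
                 int64_t framedur_ns, int flip_irq_before_vblank_irq)
{
    /* Assumed staleness test: more than 0.9 frame durations have
     * passed since the last recorded vblank timestamp. */
    if (flip_irq_before_vblank_irq &&
        10 * (now_ns - last.time_ns) > 9 * framedur_ns) {
        last.count++;
        last.time_ns += framedur_ns;
    }

    return last;
}
```

For gpu's that deliver the pageflip irq after the vblank irq the flag
would be 0 and the recorded values are used unchanged.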

@Alex: Thanks for your feedback at xds. I think the radeon part
should be as we discussed. The radeon patch builds onto what you
committed already into drm-radeon-testing for the reclocking fixes.
It slightly changes the interface between the scanout query function and
radeon_pm_in_vbl(), but doesn't change the implementation.

Testing of the timestamps on the Intel 945GME, a Radeon R500 and a R600
against external measurement equipment shows that the timestamps are
essentially perfect. Error wrt. the external measurement is less
than 20 microseconds (for the radeon's) and jitter is +/- 1 usec. The
intel error is less than 0.5 msecs, but i can't test how much better,
because my photo-diode can't do better and my usec precise equipment only
works with DVI, while the intel test machine only has a vga output.

Now this swapbuffers timestamping is only trustworthy for pageflipped
swaps, ie. at the moment only on intel. Alex is working on pageflipping
support for radeon and i can test those bits and add timestamping.
For the swap by blitting case, more work needs to be done.

My current test scripts are written as matlab/octave scripts against my
toolkit (http://www.psychtoolbox.org). I could prepare a little zip
file with the minimum bits needed to run those under gnu/octave.
I can translate the most essential tests into c in the next couple of days,
if there is some kind of test suite for this?

Thanks,
-mario