[RFC] DRI2 synchronization and swap bits
jbarnes at virtuousgeek.org
Wed Nov 18 15:57:47 PST 2009
On Sun, 8 Nov 2009 08:16:51 +0100
Mario Kleiner <mario.kleiner at tuebingen.mpg.de> wrote:
> My proposal to use a spinlock was probably rather stupid. Because
> of glXGetSyncValuesOML() -> I830DRI2GetMSC -> drmWaitVBlank ->
> drm_wait_vblank -> drm_vblank_count(), if multiple clients call
> glXGetSyncValuesOML() frequently, e.g., in a polling loop, I assume
> this could cause quite a bit of contention on a spinlock that must
> be acquired with minimal delay from the vblank irq handler. According
> to <http://rt.wiki.kernel.org/index.php/RT_PREEMPT_HOWTO>, critical
> sections protected by spinlock_t are preemptible if one uses a
> realtime kernel with the preempt-rt patches applied, something I'd
> expect my users to do frequently. Maybe I'm overlooking something,
> but this sounds unhealthy if it happens while drm_vblank_count()
> holds the lock?
It's not ideal, but I don't think it would cause many problems in
practice. The lock hold times will be very small, so contention
shouldn't be a big problem, but as you say if we find issues we can use
a lockless method.
> Btw., when looking at the code in drm_irq.c in the current
> linux-next tree, I saw that drm_handle_vblank_events() does an
> e->event.sequence = seq; assignment with the current seq vblank
> number when retiring an event, but the special shortcut path in
> drm_queue_vblank_event(), which retires events immediately without
> queuing them if the requested vblank number has been reached or
> exceeded already, does not update e->event.sequence = seq; with
> the most recent seq vblank number that triggered this early
> retirement. This looks inconsistent to me, could this be a bug?
It uses vblwait->request.sequence which should be the right number.
The timestamp is definitely off though...
> The simple seqlock implementation might be too simple though, and a
> ringbuffer that holds several hundred recent vblank timestamp
> samples might be better.
I wonder if it would be sufficient to just track the last timestamp.
Any callers interested in the last event would get a precise
timestamp. Callers looking at past events could be returned a
calculated value based on the time difference between the last two
events.
> The problem is the accuracy of glXGetMscRateOML(). This value -
> basically the duration of a video refresh interval - gets calculated
> from the current video mode timing, i.e., dotclock, HTotal and
> VTotal. This value is only useful for userspace applications like my
> toolkit under the assumption that both the dotclock of the GPU and
> the current system clock (TSC / HPET / APIC timer / ...) are
> perfectly accurate and drift-free. In reality, both clocks are
> imperfect and drift against each other, therefore the returned
> nominal value of glXGetMscRateOML() is basically always a bit wrong/
> inaccurate wrt. system time as used by userspace applications. Our
> app therefore determines the "real" refresh duration by a
> calibration loop of multiple seconds duration at startup. This works
> ok, but it increases startup time, can't take slow clock drift over
> the course of a session into account, because I can't recalibrate
> during a session, and the calibration is also not perfect due to the
> timing noise (preemption, scheduling jitter, wakeup latency after
> swapbuffers etc.) that affects a measurement loop in userspace.
> A better approach would be for Linux to measure the current video
> refresh interval over a certain time window, e.g., computing a
> moving average over a few seconds. This could be done if the vblank
> timestamps are logged into a ringbuffer. The ringbuffer would allow
> for lock-free readout of the most recent vblank timestamp from
> drm_vblank_count(). At the same time the system could look at all
> samples in the ringbuffer to compute the real duration of a video
> refresh interval as an average over the deltas between samples in
> the ringbuffer, and provide an accurate and current estimate of
> glXGetMscRateOML() that would be better than anything we can do in
> userspace.
That would be even better than just using the last difference, and
shouldn't add too much more code. On some configurations the refresh
rate will change too (for power saving reasons it may be reduced and
then increased again when activity occurs) so that would have to be
accounted for. Presumably we wouldn't care about the reduced rate
since it implies clients aren't actively using the display.
> The second problem is how to reinitialize the current vblank
> timestamp in drm_update_vblank_count() when vblank interrupts get
> reenabled after they've been disabled for a long period of time?
> One generic way to reinitialize would be to calculate elapsed time
> since last known vblank timestamp from the computed vblank count
> "diff" by multiplying the count with the known duration of the video
> refresh interval. In that case, an accurate estimate of
> glXGetMscRateOML would be important, so a ringbuffer with samples
> would probably help.
Right, if we have the averaged difference we could just calculate it.
> Here is another proposal that i would love to see in Linux:
> A good method to find the accurate time of the last vblank
> immediately, irrespective of possible delays in irq dispatching, is
> to use the current scanout position of the crtc as a high resolution
> clock that counts time since start of the vertical blank interval.
> Afaik basically all GPUs have a register that allows reading out
> the currently scanned-out scanline. If the exact duration of a video
> refresh interval 'refreshduration' is known by measurement, the
> current 'scanline' is known by a register read, the total height of
> the display vtotal is known, and one has a timestamp of current
> system time tsystem from do_gettimeofday(), one can conceptually
> implement this pseudo-code:
> scanline = dev->driver->getscanline(dev, crtc);
> tsystem = do_gettimeofday(...);
> tvblank = tsystem - refreshduration * (scanline / vtotal);
> (One has to measure scanline relative to the first line inside the
> vblank area, though, and the math might be a bit more complex due
> to the lack of floating point arithmetic in the kernel?)
> This would allow quickly reinitializing the vblank timestamp in
> drm_update_vblank_count() and providing vblank timestamps from
> inside drm_handle_vblank() which are robust against interrupt
> dispatch latency. It would require a new callback function into the
> GPU specific driver, e.g., dev->driver->getscanline(dev, crtc). One
> could have a simple do_gettimeofday(&tvblank) as fallback for
> drivers that don't implement the new callback. Maybe one could even
> implement a dev->driver->getvblanktime(dev, crtc); callback that
> executes the above lines inside one function and optionally allows
> use of more clever GPU specific strategies like GPU internal clocks
> and snapshot registers?
Seems worth prototyping at least.
Jesse Barnes, Intel Open Source Technology Center