[RFC] Reduce idle vblank wakeups

Wed Nov 16 18:19:08 PST 2011

On Thu, Nov 17, 2011 at 01:26:37AM +0100, Mario Kleiner wrote:
> On Nov 16, 2011, at 7:48 PM, Matthew Garrett wrote:
> >I'll admit that I'm struggling to understand the issue here. If the
> >vblank counter is incremented at the time of vblank (which isn't the
> >case for radeon, it seems, but as far as I can tell is the case for
> >Intel) then how does ping-ponging the IRQ matter?
> >vblank_disable_and_save() appears to handle this case.
> >
> 
> The drm software vblank counter which is used for scheduling swaps
> etc. gets incremented in the gpu's vblank irq handler. The gpu's
> hardware counter gets incremented somewhere inside the vblank
> interval. The increments don't happen at the same point in time. The
> race is between the vblank on/off code, which gets scheduled either
> by the timeout timer for the "off" case, or by usercode for the "on"
> case and the gpu's hardware vblank counter.

Yes, that makes sense given the current design.

> The existing code uses a lock (vblank_time_lock) to prevent some
> races between itself and the vblank irq handler, but it can't "lock"
> the gpu, so if the enable/disable code executes while the gpu is
> scanning out the vblank interval, it is undefined if the final
> vblank count will be correct or off by one. Vblank duration is
> somewhere up to 4.5% of refresh interval duration, so there's your
> up to 4% chance of getting it wrong.

Well presumably by "undefined" we really mean "hardware-specific" - for 
any given well-functioning hardware I'd expect the answer to be 
well-defined. Like I said, with Radeon we have the option of triggering 
an interrupt at the point where the hardware counter is actually 
incremented. If that then shuts down the vblank interrupt then we should 
have a well-defined answer.

> >Does the timeout serve any purpose other than letting software
> >effectively prevent vblanks from being disabled?
> 
> With perfect drivers and gpu's in a perfect world, no. In reality
> there's the race i described above, and nouveau and all other
> drivers except intel and radeon. The vblank irq also drives
> timestamping of vblanks, one update per vblank. The timestamps are
> cached if a client needs them inbetween updates. Turning off vblank
> irq invalidates the timestamps. radeon and intel can recreate the
> timestamp anytime as needed, but nouveau lacks this atm., so
> timestamps remain invalid for a whole video refresh cycle after
> vblank irq on. We have patches for nouveau kms almost ready, so only
> the race mentioned above would remain.

Sure. We'd certainly need to improve things for other GPUs before 
enabling the same functionality there.

> >For Radeon, I'd have thought you could handle this by scheduling
> >an irq
> >for the beginning of scanout (avivo has a register for that) and
> >delaying the vblank disable until you hit it?
> 
> For Radeon there is such an irq, but iirc we had some discussions on
> this, also with Alex Deucher, a while ago and some irq's weren't
> considered very reliable, or already used for other stuff. The idea
> i had goes like this:
> 
> Use the crtc scanout position queries together with the vblank
> counter queries inside some calibration loop, maybe executed after
> each modeset, to find out the scanline range in which the hardware
> vblank counter increments -- basically a forbidden range of scanline
> positions where the race would happen. Then at each vblank off/on,
> query scanout position before and after the hw vblank counter query.
> If according to the scanout positions the vblank counter query
> happened within the forbidden time window, retry the query. With a
> well working calibration that should add no delay in most cases and
> a delay to the on/off code of a few dozen microseconds (=duration of
> a few scanlines) worst case.

Assuming we're sleeping rather than busy-looping, that's certainly ok. 
My previous experiments with radeon indicated that the scanout irq was 
certainly not entirely reliable - on the other hand, I was trying to use 
it for completing memory reclocking within the vblank interval. It was 
typically still within a few scanlines, so a sanity check there wouldn't 
pose too much of a problem.

> >I've no problem with all of this work being case by case.
> 
> Me neither. I just say if you'd go to the extreme and disable vblank
> irq's immediately with zero delay, without reliably fixing the
> problem i mentioned, you'd get those off by one counts, which would
> be very bad for apps that rely on precise timing. Doing so would
> basically make the whole oml_sync_control implementation mostly
> useless.

Right. My testing of sandybridge suggests that there wasn't a problem 
here - even with the ping-ponging I was reliably getting 60 interrupts 
in 60 seconds, with the counter incrementing by 1 each time. I certainly 
wouldn't push to enable it elsewhere without making sure that the 
results are reliable.

> >If vblanks are disabled and then re-enabled between 1 and 3,
> >what's the
> >negative outcome?
> 
> 1. glXGetSyncValuesOML() gives you the current vblank count and a
> timestamp of when exactly scanout of the corresponding video frame
> started. These values are the baseline for any vblank based swap
> scheduling or any other vblank synchronized actions, e.g., via
> glXWaitForMscOML().
> 
> 2. App calculates stuff based on 1. but gets preempted or
> experiences scheduling delay on return from 1. - or the x-server
> gets preempted or scheduled away. Whatever, time passes...
> 
> 3. glXSwapBuffersMscOML() or glXWaitForMscOML() is used to schedule
> something to occur at a target vblank count, based on the results of
> 1.
> 
> As long as vblanks don't get disabled between 1 and 3, everything is
> consistent. Scheduling delays/preemption of more than a few (dozen)
> milliseconds worst case are very rare, so a lowered vblank off delay
> of even one or two refresh cycles would solve the problem. I assume
> that most toolkits with needs for timing precision would follow a
> three step strategy like this one.
> 
> If vblank irq's get disabled immediately after step 1. and reenabled
> at step 3., and this triggers an off by one error due to a race,
> then the calculated target vblank count from steps 1/2 is invalid
> and useless for step 3 and many vision scientists which just started
> to make friendship with Linux lately will make very sad faces. Due
> to the nature of many of the stimulation paradigms used, there is
> also a high chance that many of the enables and disables would
> happen inside or very close to the problematic vblank interval when
> off by one error is most likely.

Ok, so as long as we believe that we're reliably reading the hardware 
counter and not getting off by one, we should be ok? I just want to be 
clear that this is dependent on hardware behaviour rather than being 
absolutely inherent :)

> I agree with your patches, i just don't want vblank irq's to be
> disabled immediately with a zero timeout before remaining races are
> fixed, that would sure break things at least for scientific
> applications. And i'd like to keep some optional parameter that
> allows desparate users to disable the vblank off mechanism in case
> they would encounter a bug.

Yeah, no problem. I understand that completely.

-- 
Matthew Garrett | mjg59 at srcf.ucam.org