Funky new vblank counter regressions in Linux 4.4-rc1

Mario Kleiner mario.kleiner.de at gmail.com
Fri Nov 27 06:09:51 PST 2015


On 11/25/2015 08:36 PM, Ville Syrjälä wrote:
> On Wed, Nov 25, 2015 at 08:04:26PM +0100, Mario Kleiner wrote:
>> On 11/25/2015 06:46 PM, Ville Syrjälä wrote:

...

>> Attached is my current patch i wanted to submit for the drm core's
>> drm_update_vblank_count(). I think it's good to make the core somewhat
>> robust against potential kms driver bugs or glitches. But if you
>> wouldn't like that patch, there wouldn't be much of a point sending it
>> out at all.
>>
>> thanks,
>> -mario
>>
>
>> >From 2d5d58a1c575ad002ce2cb643f395d0e4757d959 Mon Sep 17 00:00:00 2001
>> From: Mario Kleiner <mario.kleiner.de at gmail.com>
>> Date: Wed, 25 Nov 2015 18:48:31 +0100
>> Subject: [PATCH] drm/irq: Make drm_update_vblank_count() more robust.
>>
>> The changes to drm_update_vblank_count() for Linux 4.4-rc
>> made the function more fragile wrt. some hw quirks. E.g.,
>> at dev->driver->enable_vblank(), AMD gpu's fire a spurious
>> redundant vblank irq shortly after enabling vblank irqs, not
>> locked to vblank. This causes a redundant call which needs
>> to be suppressed to avoid miscounting.
>>
>> To increase robustness, shuffle things around a bit:
>>
>> On drivers with high precision vblank timestamping always
>> evaluate the timestamp difference between current timestamp
>> and previously recorded timestamp to detect such redundant
>> invocations and no-op in that case.
>>
>> Also detect and warn about timestamps going backwards to
>> catch potential kms driver bugs.
>>
>> This patch is meant for Linux 4.4-rc and later.
>>
>> Signed-off-by: Mario Kleiner <mario.kleiner.de at gmail.com>
>> ---
>>   drivers/gpu/drm/drm_irq.c | 53 ++++++++++++++++++++++++++++++++++++-----------
>>   1 file changed, 41 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>> index 819b8c1..8728c3c 100644
>> --- a/drivers/gpu/drm/drm_irq.c
>> +++ b/drivers/gpu/drm/drm_irq.c
>> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>   				    unsigned long flags)
>>   {
>>   	struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
>> -	u32 cur_vblank, diff;
>> +	u32 cur_vblank, diff = 0;
>>   	bool rc;
>>   	struct timeval t_vblank;
>> +	const struct timeval *t_old;
>> +	u64 diff_ns;
>>   	int count = DRM_TIMESTAMP_MAXRETRIES;
>>   	int framedur_ns = vblank->framedur_ns;
>>
>> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>   		rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags);
>>   	} while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0);
>>
>> -	if (dev->max_vblank_count != 0) {
>> -		/* trust the hw counter when it's around */
>> -		diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
>> -	} else if (rc && framedur_ns) {
>> -		const struct timeval *t_old;
>> -		u64 diff_ns;
>> -
>> +	/*
>> +	 * Always use vblank timestamping based method if supported to reject
>> +	 * redundant vblank irqs. E.g., AMD hardware needs this to not screw up
>> +	 * due to some irq handling quirk.
>> +	 *
>> +	 * This also sets the diff value for use as fallback below in case the
>> +	 * hw does not support a suitable hw vblank counter.
>> +	 */
>> +	if (rc && framedur_ns) {
>
> If you fudged everything properly why do you still need this? With
> working hw counter there should be no need to do this stuff.
>

As far as testing on one DCE4 card goes, i don't need it anymore with my 
fudged hw counters and timestamps. The fudging so far seems to work 
nicely. I just wanted to have a bit of extra robustness and a bit of 
extra available debug output against future or other broken drivers, or 
mistakes in fudging on the current driver, e.g., against things like 
timestamps going backwards. Especially since i can only test on two AMD 
cards atm., quite a limited sample. There are 3 display engine 
generations before and 5 generations after my test sample.

-mario


>>   		t_old = &vblanktimestamp(dev, pipe, vblank->count);
>>   		diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old);
>>
>> @@ -212,11 +216,36 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>>   		 */
>>   		diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns);
>>
>> -		if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ)
>> +		/* Catch driver timestamping bugs and prevent worse. */
>> +		if ((s32) diff < 0) {
>> +			DRM_DEBUG_VBL("crtc %u: Timestamp going backward!"
>> +			" diff_ns = %lld, framedur_ns = %d)\n",
>> +			pipe, (long long) diff_ns, framedur_ns);
>> +			return;
>> +		}
>> +
>> +		if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) {
>>   			DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
>> -				      " diff_ns = %lld, framedur_ns = %d)\n",
>> -				      pipe, (long long) diff_ns, framedur_ns);
>> -	} else {
>> +			" diff_ns = %lld, framedur_ns = %d)\n",
>> +			pipe, (long long) diff_ns, framedur_ns);
>> +
>> +			/* Treat this redundant invocation as no-op. */
>> +			WARN_ON_ONCE(cur_vblank != vblank->last);
>> +			return;
>> +		}
>> +	}
>> +
>> +	/*
>> +	 * hw counters, if marked as supported via max_vblank_count != 0,
>> +	 * *must* increment at leading edge of vblank or at least before
>> +	 * the firing of the hw vblank irq, otherwise we have a race here
>> +	 * between gpu and us, where small variations in our execution
>> +	 * timing can cause off-by-one miscounting of vblanks.
>> +	 */
>> +	if (dev->max_vblank_count != 0) {
>> +		/* trust the hw counter when it's around */
>> +		diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
>> +	} else if (!(rc && framedur_ns)) {
>>   		/* some kind of default for drivers w/o accurate vbl timestamping */
>>   		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
>>   	}
>> --
>> 1.9.1
>>
>
>


More information about the dri-devel mailing list