Funky new vblank counter regressions in Linux 4.4-rc1
Mario Kleiner
mario.kleiner.de at gmail.com
Fri Nov 27 06:09:51 PST 2015
On 11/25/2015 08:36 PM, Ville Syrjälä wrote:
> On Wed, Nov 25, 2015 at 08:04:26PM +0100, Mario Kleiner wrote:
>> On 11/25/2015 06:46 PM, Ville Syrjälä wrote:
...
>> Attached is my current patch i wanted to submit for the drm core's
>> drm_update_vblank_count(). I think it's good to make the core somewhat
>> robust against potential kms driver bugs or glitches. But if you
>> wouldn't like that patch, there wouldn't be much of a point sending it
>> out at all.
>>
>> thanks,
>> -mario
>>
>
>> >From 2d5d58a1c575ad002ce2cb643f395d0e4757d959 Mon Sep 17 00:00:00 2001
>> From: Mario Kleiner <mario.kleiner.de at gmail.com>
>> Date: Wed, 25 Nov 2015 18:48:31 +0100
>> Subject: [PATCH] drm/irq: Make drm_update_vblank_count() more robust.
>>
>> The changes to drm_update_vblank_count() for Linux 4.4-rc
>> made the function more fragile wrt. some hw quirks. E.g.,
>> at dev->driver->enable_vblank(), AMD gpu's fire a spurious
>> redundant vblank irq shortly after enabling vblank irqs, not
>> locked to vblank. This causes a redundant call which needs
>> to be suppressed to avoid miscounting.
>>
>> To increase robustness, shuffle things around a bit:
>>
>> On drivers with high precision vblank timestamping always
>> evaluate the timestamp difference between current timestamp
>> and previously recorded timestamp to detect such redundant
>> invocations and no-op in that case.
>>
>> Also detect and warn about timestamps going backwards to
>> catch potential kms driver bugs.
>>
>> This patch is meant for Linux 4.4-rc and later.
>>
>> Signed-off-by: Mario Kleiner <mario.kleiner.de at gmail.com>
>> ---
>> drivers/gpu/drm/drm_irq.c | 53 ++++++++++++++++++++++++++++++++++++-----------
>> 1 file changed, 41 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>> index 819b8c1..8728c3c 100644
>> --- a/drivers/gpu/drm/drm_irq.c
>> +++ b/drivers/gpu/drm/drm_irq.c
>> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>> unsigned long flags)
>> {
>> struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
>> - u32 cur_vblank, diff;
>> + u32 cur_vblank, diff = 0;
>> bool rc;
>> struct timeval t_vblank;
>> + const struct timeval *t_old;
>> + u64 diff_ns;
>> int count = DRM_TIMESTAMP_MAXRETRIES;
>> int framedur_ns = vblank->framedur_ns;
>>
>> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>> rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags);
>> } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0);
>>
>> - if (dev->max_vblank_count != 0) {
>> - /* trust the hw counter when it's around */
>> - diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
>> - } else if (rc && framedur_ns) {
>> - const struct timeval *t_old;
>> - u64 diff_ns;
>> -
>> + /*
>> + * Always use vblank timestamping based method if supported to reject
>> + * redundant vblank irqs. E.g., AMD hardware needs this to not screw up
>> + * due to some irq handling quirk.
>> + *
>> + * This also sets the diff value for use as fallback below in case the
>> + * hw does not support a suitable hw vblank counter.
>> + */
>> + if (rc && framedur_ns) {
>
> If you fudged everything properly why do you still need this? With
> working hw counter there should be no need to do this stuff.
>
As far as testing on one DCE4 card goes, i don't need it anymore with my
fudged hw counters and timestamps. The fudging so far seems to work
nicely. I just wanted to have a bit of extra robustness and a bit of
extra available debug output against future or other broken drivers, or
mistakes in fudging on the current driver, e.g., against things like
timestamps going backwards. Especially since i can only test on two AMD
cards atm., quite a limited sample. There are 3 display engine
generations before and 5 generations after my test sample.
-mario
>> t_old = &vblanktimestamp(dev, pipe, vblank->count);
>> diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old);
>>
>> @@ -212,11 +216,36 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
>> */
>> diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns);
>>
>> - if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ)
>> + /* Catch driver timestamping bugs and prevent worse. */
>> + if ((s32) diff < 0) {
>> + DRM_DEBUG_VBL("crtc %u: Timestamp going backward!"
>> + " diff_ns = %lld, framedur_ns = %d)\n",
>> + pipe, (long long) diff_ns, framedur_ns);
>> + return;
>> + }
>> +
>> + if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) {
>> DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
>> - " diff_ns = %lld, framedur_ns = %d)\n",
>> - pipe, (long long) diff_ns, framedur_ns);
>> - } else {
>> + " diff_ns = %lld, framedur_ns = %d)\n",
>> + pipe, (long long) diff_ns, framedur_ns);
>> +
>> + /* Treat this redundant invocation as no-op. */
>> + WARN_ON_ONCE(cur_vblank != vblank->last);
>> + return;
>> + }
>> + }
>> +
>> + /*
>> + * hw counters, if marked as supported via max_vblank_count != 0,
>> + * *must* increment at leading edge of vblank or at least before
>> + * the firing of the hw vblank irq, otherwise we have a race here
>> + * between gpu and us, where small variations in our execution
>> + * timing can cause off-by-one miscounting of vblanks.
>> + */
>> + if (dev->max_vblank_count != 0) {
>> + /* trust the hw counter when it's around */
>> + diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
>> + } else if (!(rc && framedur_ns)) {
>> /* some kind of default for drivers w/o accurate vbl timestamping */
>> diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
>> }
>> --
>> 1.9.1
>>
>
>
More information about the dri-devel
mailing list