Funky new vblank counter regressions in Linux 4.4-rc1

Mon Nov 23 07:51:09 PST 2015

On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
> On 11/20/2015 04:34 PM, Ville Syrjälä wrote:
> > On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
> 
> ...
> 
> >> What we do in radeon-kms is similar. If DRM_CALLED_FROM_VBLIRQ and we
> >> are no more than 1% of the display height away from start of vblank we
> >> fudge scanout position in a way so that the timestamp gets computed for
> >> the soon-to-begin vblank, not the old one.
> >
> > The problem with basing that fudging purely on DRM_CALLED_FROM_VBLIRQ is
> > that if you call .get_scanout_positon() from a non-irq context between
> > the irq firing and start of vblank, you'll think you're still in the
> > previous frame. Hence my suggestion to note down the frame counter when
> > called from the irq, and then keep doing the fudging until you've truly
> > crossed the start of vblank.
> >
> 
> Ok, but why would that be a bad thing? I think we want it to think it is 
> in the previous frame if it is called outside the vblank irq context. 
> The only reason we fudge it to the next frames vblank if i vblank irq is 
> because we know the vblank irq handler we are executing atm. was meant 
> to execute within the upcoming vblank for the next frame, so we fudge 
> the scanout positions and thereby timestamp to correspond to that new 
> frame. But if something called outside irq context it should get a 
> scanout position/timestamp that corresponds to "reality".

It would be a bad thing since it would cause the timestamp to jump
backwards, and that would also cause the frame count guesstimate to go
backwards.

> 
> It would maybe be a problem to get different answers for a query at the 
> same scanout position if we used .get_scanout_position() for something 
> else than for calculating vblank timestamps, but we don't do that atm.
> 
> Maybe i'm overlooking something here related to the somewhat rewritten 
> code from the last year or so? But in the original design this would be 
> exactly what i intended?
> 
> ...
> 
> >> So it's good enough for typical desktop
> >> applications/compositors/games/media players, and a nice improvement
> >> over the previous state, but not quite sufficient for applications that
> >> need long time consistent vblank counts for long waits of multiple
> >> seconds or even minutes. So it's still good to have hw counters if we
> >> can get some that are reliable enough.
> >
> > Ah, I didn't realize you care about small errors in the counter for
> > such long periods of vblank off.
> >
> 
> Actually, you are right, i was stupid - not enough sleep last friday. I 
> do care about such small errors, but the long vblank off periods don't 
> matter at all the way my software works. I query the current count and 
> timestamp (glXGetSyncValuesOML), calculate a target count based on those 
> and then schedule a swap for that target count via glXSwapBuffersMscOML. 
> That swapbuffers call will keep vblank irqs on until the kms pageflip is 
> queued. So i only care about vblank counts/timestamps being consistent 
> for short amounts of time, typically 1 video refresh from vblank query 
> to queueing a vblank event, and then from reception of that 
> event/queuing the pageflip to pageflip completion event. So even if the 
> system would be heavily loaded and my code would have big preemption 
> delays i think counts that are consistent over a few seconds would be 
> enough to keep things working. Otherwise it wouldn't work now either 
> with a vblank off after 5 seconds and nouveau not having vblank hw counters.
> 
> -mario
> 
> 
> >>
> >> -mario
> >>
> >>
> >>>>
> >>>> -mario
> >>>>
> >>>>>>
> >>>>>> It almost sort of works on the rs600 code path, but i need a bit of info
> >>>>>> from you:
> >>>>>>
> >>>>>> 1. There's this register from the old specs for m76.pdf, which is not
> >>>>>> part of the current register defines for radeon-kms:
> >>>>>>
> >>>>>> "D1CRTC_STATUS_VF_COUNT - RW - 32 bits - [GpuF0MMReg:0x60A8]"
> >>>>>>
> >>>>>> It contains the lower 16 bits of framecounter and the 13 bits of
> >>>>>> vertical scanout position. It seems to give the same readings as the 24
> >>>>>> bit R_0060A4_D1CRTC_STATUS_FRAME_COUNT we use for the hw counter. This
> >>>>>> would come handy.
> >>>>>>
> >>>>>> Does Evergreen and later have a same/similar register and where is it?
> >>>>>>
> >>>>>> 2. The hw framecounter seems to increment when the vertical scanout
> >>>>>> position wraps back from (VTOTAL - 1) to 0, at least on the one DCE-3
> >>>>>> gpu i tested so far. Is this so on all asics? And is the hw counter
> >>>>>> increment happening exactly at the moment that vertical scanout position
> >>>>>> jumps back to zero, ie. both events are driven by the same signal? Or is
> >>>>>> the framecounter increment just happening somewhere inside either
> >>>>>> scanline VTOTAL-1 or scanline 0?
> >>>>>>
> >>>>>>
> >>>>>> If we can fix this and get it into rc2 or rc3 then we could avoid a bad
> >>>>>> regression and with a bit of luck at the same time improve by being able
> >>>>>> to set dev->vblank_disable_immediate = true then and allow vblank irqs
> >>>>>> to get turned off more aggressively for a bit of extra power saving.
> >>>>>>
> >>>>>> thanks,
> >>>>>> -mario
> >>>>>
> >>>>>> -- a/drivers/gpu/drm/drm_irq.c
> >>>>>> +++ b/drivers/gpu/drm/drm_irq.c
> >>>>>> @@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> >>>>>>                                        unsigned long flags)
> >>>>>>     {
> >>>>>>            struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
> >>>>>> -       u32 cur_vblank, diff;
> >>>>>> +       u32 cur_vblank, diff = 0;
> >>>>>>            bool rc;
> >>>>>>            struct timeval t_vblank;
> >>>>>> +       const struct timeval *t_old;
> >>>>>> +       u64 diff_ns;
> >>>>>>            int count = DRM_TIMESTAMP_MAXRETRIES;
> >>>>>>            int framedur_ns = vblank->framedur_ns;
> >>>>>>
> >>>>>> @@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> >>>>>>                    rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags);
> >>>>>>            } while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0);
> >>>>>>
> >>>>>> -       if (dev->max_vblank_count != 0) {
> >>>>>> -               /* trust the hw counter when it's around */
> >>>>>> -               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
> >>>>>> -       } else if (rc && framedur_ns) {
> >>>>>> -               const struct timeval *t_old;
> >>>>>> -               u64 diff_ns;
> >>>>>> -
> >>>>>> +       /*
> >>>>>> +        * Always use vblank timestamping based method if supported to reject
> >>>>>> +        * redundant vblank irqs. E.g., AMD hardware needs this to not screw up
> >>>>>> +        * due to some irq handling quirk.
> >>>>>
> >>>>> Hmm. I'm thinking it would be better to simply not claim that there's a hw
> >>>>> counter if it isn't reliable.
> >>>>>
> >>>>>> +        *
> >>>>>> +        * This also sets the diff value for use as fallback below in case the
> >>>>>> +        * hw does not support a suitable hw vblank counter.
> >>>>>> +        */
> >>>>>> +       if (rc && framedur_ns) {
> >>>>>>                    t_old = &vblanktimestamp(dev, pipe, vblank->count);
> >>>>>>                    diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old);
> >>>>>>
> >>>>>> @@ -212,11 +216,28 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> >>>>>>                     */
> >>>>>>                    diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns);
> >>>>>>
> >>>>>> -               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ)
> >>>>>> -                       DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
> >>>>>> -                                     " diff_ns = %lld, framedur_ns = %d)\n",
> >>>>>> -                                     pipe, (long long) diff_ns, framedur_ns);
> >>>>>> -       } else {
> >>>>>> +               if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) {
> >>>>>> +                   DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
> >>>>>> +                   " diff_ns = %lld, framedur_ns = %d)\n",
> >>>>>> +                   pipe, (long long) diff_ns, framedur_ns);
> >>>>>> +
> >>>>>> +                   /* Treat this redundant invocation as no-op. */
> >>>>>> +                   WARN_ON_ONCE(cur_vblank != vblank->last);
> >>>>>> +                   return;
> >>>>>> +               }
> >>>>>> +       }
> >>>>>> +
> >>>>>> +       /*
> >>>>>> +        * hw counters, if marked as supported via max_vblank_count != 0,
> >>>>>> +        * *must* increment at leading edge of vblank and in sync with
> >>>>>> +        * the firing of the hw vblank irq, otherwise we have a race here
> >>>>>> +        * between gpu and us, where small variations in our execution
> >>>>>> +        * timing can cause off-by-one miscounting of vblanks.
> >>>>>> +        */
> >>>>>> +       if (dev->max_vblank_count != 0) {
> >>>>>> +               /* trust the hw counter when it's around */
> >>>>>> +               diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
> >>>>>> +       } else if (!(rc && framedur_ns)) {
> >>>>>>                    /* some kind of default for drivers w/o accurate vbl timestamping */
> >>>>>>                    diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
> >>>>>>            }
> >>>>>
> >>>>>
> >>>
> >

-- 
Ville Syrjälä
Intel OTC