Funky new vblank counter regressions in Linux 4.4-rc1

Fri Nov 27 06:55:56 PST 2015

On Wed, Nov 25, 2015 at 02:38:04PM -0500, Alex Deucher wrote:
> On Wed, Nov 25, 2015 at 1:21 PM, Mario Kleiner
> <mario.kleiner.de at gmail.com> wrote:
> > On 11/25/2015 06:58 PM, Ville Syrjälä wrote:
> >>
> >> On Wed, Nov 25, 2015 at 06:24:13PM +0100, Mario Kleiner wrote:
> >>>
> >>> On 11/23/2015 09:24 PM, Ville Syrjälä wrote:
> >>>>
> >>>> On Mon, Nov 23, 2015 at 06:58:34PM +0100, Mario Kleiner wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 11/23/2015 04:51 PM, Ville Syrjälä wrote:
> >>>>>>
> >>>>>> On Mon, Nov 23, 2015 at 04:23:21PM +0100, Mario Kleiner wrote:
> >>>>>>>
> >>>>>>> On 11/20/2015 04:34 PM, Ville Syrjälä wrote:
> >>>>>>>>
> >>>>>>>> On Fri, Nov 20, 2015 at 04:24:50PM +0100, Mario Kleiner wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> ...
> >>>>>>> Ok, but why would that be a bad thing? I think we want it to think it
> >>>>>>> is in the previous frame if it is called outside the vblank irq
> >>>>>>> context. The only reason we fudge it to the next frame's vblank if in
> >>>>>>> the vblank irq is because we know the vblank irq handler we are
> >>>>>>> executing atm. was meant to execute within the upcoming vblank for the
> >>>>>>> next frame, so we fudge the scanout positions and thereby timestamp to
> >>>>>>> correspond to that new frame. But if something is called outside irq
> >>>>>>> context, it should get a scanout position/timestamp that corresponds
> >>>>>>> to "reality".
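
A minimal sketch of the "fudging" described above, assuming a simple model in which the vblank irq fires a fixed number of scanlines (`lead_lines`) before the real start of vblank. The struct, its fields and `effective_vpos()` are hypothetical, not radeon or DRM core API, and the sign convention (negative positions count the lines left in vblank before active scanout) is an assumption for illustration. Outside the irq the real position is reported; from the irq, lines inside the lead window are treated as already belonging to the upcoming vblank.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical vertical timing of a CRTC, in scanlines. Line 0 is the
 * first active line; vblank runs from vdisplay to vtotal - 1. */
struct vtiming {
    int vdisplay;   /* first scanline of vblank */
    int vtotal;     /* total scanlines per frame */
    int lead_lines; /* assumed: how early the vblank irq fires */
};

/* Return the scanout position to report: >= 0 inside the active area,
 * negative (minus the lines left until active scanout) inside vblank.
 * From the vblank irq, the lead window before vdisplay is treated as
 * already being in the upcoming vblank; elsewhere the real position is
 * used. */
static int effective_vpos(const struct vtiming *t, int vpos,
                          bool from_vblank_irq)
{
    assert(t->vdisplay > 0 && t->vdisplay < t->vtotal);
    assert(vpos >= 0 && vpos < t->vtotal);

    int vbl_start = t->vdisplay;
    if (from_vblank_irq)
        vbl_start -= t->lead_lines; /* fudge: irq arrives a few lines early */

    if (vpos >= vbl_start)
        return vpos - t->vtotal;    /* counts up towards 0 at next frame */
    return vpos;
}
```

With vdisplay 1080, vtotal 1125 and a 3-line lead, scanline 1078 is reported as -47 from the irq but as 1078 from anywhere else, which is exactly the asymmetry the disagreement in this thread is about.
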
> >>>>>>
> >>>>>>
> >>>>>> It would be a bad thing since it would cause the timestamp to jump
> >>>>>> backwards, and that would also cause the frame count guesstimate to go
> >>>>>> backwards.
> >>>>>>
> >>>>>
> >>>>> But only if we don't use the dev->driver->get_vblank_counter() method,
> >>>>> which we try to use on AMD.
> >>>>
> >>>>
> >>>> Well, if you do it that way then you have the problem of the hw counter
> >>>> seeming to jump forward by one after crossing the start of vblank (when
> >>>> compared to the value you sampled when you processed the early vblank
> >>>> interrupt).
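
To illustrate the forward jump, here is a hypothetical helper, assuming the hw frame counter increments exactly at the real start of vblank (`vdisplay`) while the irq may arrive up to `lead_lines` earlier. A sample taken in the early irq then reads one less than a sample taken a few lines later; one possible way to reconcile the two is to add one when sampling inside the lead window from the irq. The name and parameters are illustrative only, not an existing driver function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: return the hw frame count as it will read once the real
 * vblank has started. Assumes the counter increments exactly at
 * vdisplay and the irq may fire up to lead_lines scanlines earlier. */
static uint32_t cooked_hw_count(uint32_t hw_count, int vpos, int vdisplay,
                                int lead_lines, bool from_vblank_irq)
{
    assert(lead_lines >= 0);
    bool in_lead_window = vpos >= vdisplay - lead_lines && vpos < vdisplay;

    /* The counter has not ticked yet, but will within a few lines. */
    if (from_vblank_irq && in_lead_window)
        return hw_count + 1;
    return hw_count;
}
```

For example, an irq sample of 41 at scanline 1078 and a later sample of 42 at scanline 1081 (vdisplay 1080, 3-line lead) both come out as 42.
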
> >>>>
> >>>
> >>> Ok, finally i see the bad scenario that wouldn't get prevented by our
> >>> current locking with the new vblank counting in the core. The vblank
> >>> enable path is safe due to locking and discounting of redundant
> >>> timestamps etc. But the disable path could go wrong:
> >>>
> >>> 1. Vblank irq fires, drm_handle_vblank() -> drm_update_vblank_count(),
> >>> updates timestamps and counts "as if" in vblank -> incremented vblank
> >>> count and timestamp now set in the future.
> >>>
> >>> 2. After the vblank irq finishes, but just before the leading edge of
> >>> vblank, vblank_disable_and_save() executes and doesn't get a bumped
> >>> timestamp or count, because it runs before vblank and not in the
> >>> vblank irq. Now drm_update_vblank_count() would process a "new"
> >>> timestamp and count from the past, and we'd have time and counts
> >>> going backwards, and bad things would happen.
> >>>
> >>> I haven't observed such a thing happening during testing so far,
> >>> probably because the time window in which it could happen is tiny, but
> >>> given how awfully bad it would be, it needs to be prevented.
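
One way to rule out the backwards step in scenario 2 is a monotonic guard that refuses any sample older than the stored state. This is a sketch under assumed names (`struct vblank_state`, `accept_vblank_sample()`), not the DRM core's drm_update_vblank_count(); it only demonstrates the invariant that the cooked count and timestamp never move backwards, with 32-bit counter wraparound handled through a signed difference.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cooked vblank state: a count and the timestamp of the
 * vblank that count refers to. */
struct vblank_state {
    uint32_t count;
    int64_t time_ns;
};

/* Accept a new (count, timestamp) sample only if it moves forward.
 * Returns the number of vblanks added, or 0 if the sample is rejected. */
static uint32_t accept_vblank_sample(struct vblank_state *s,
                                     uint32_t new_count, int64_t new_time_ns)
{
    assert(s);

    /* Signed difference copes with 32-bit counter wraparound. */
    int32_t diff = (int32_t)(new_count - s->count);

    /* A sample taken just before the real vblank start, after the irq
     * already bumped the state, looks older: never roll back. */
    if (diff <= 0 || new_time_ns <= s->time_ns)
        return 0;

    s->count = new_count;
    s->time_ns = new_time_ns;
    return (uint32_t)diff;
}
```

The design choice here is to drop a stale sample rather than "correct" it; whether that is acceptable depends on the disable path not needing the stale value for anything else.
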
> >>>
> >>> I had a look at the description of the Vblank irq in the "M76 Register
> >>> Reference Guide" for older asics and the description suggests that the
> >>> vblank irq fires when the crtc's line buffer is finished reading pixel
> >>> data from the scanout buffer in memory for a frame, i.e., when the line
> >>> buffer read "enters" vblank.
> >>
> >>
> >> Hmm. Does that mean there's always at least one fullscreen plane enabled
> >> in the hw? As in you can't turn off the primary plane or make it smaller
> >> than the active video area? Otherwise it sounds like you could either
> >> not get it at all, or get it somewhere in the middle of the screen.
> >>
> >
> > It says "Interrupt that can be programmed to be generated by the
> > primary display controller's line buffer logic either when the
> > source image line counter is not requesting any active
> > display data (i.e. in the vertical blank) or the output CRTC
> > timing generator is within the vertical blanking region."
> >
> > My statements were my interpretation of this quote, to make some sense
> > of the vblank irq behaviour. I guess Alex or Harry would know? The M76
> > reference covers some older asics; i just assume it is the same for the
> > current ones, given that the observed behaviour would be consistent with
> > the line buffer causing this lead of a couple of scanlines. I see about
> > 2 scanlines on DCE4 and about 3 scanlines on DCE3. I don't know how big
> > the line buffer is or how quickly it refills, but it sounds reasonable.
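
For scale, a lead measured in scanlines converts to time via the line duration, htotal divided by the pixel clock; the DRM core derives its line duration in a similar way from the mode's htotal and clock in kHz in drm_calc_timestamping_constants(). The sketch below uses hypothetical helper names, and the example mode (standard 1920x1080@60 timing, htotal 2200, 148.5 MHz) is an assumption, not something measured in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Duration of one scanline in ns: htotal pixels at clock_khz kHz. */
static int64_t line_duration_ns(int htotal, int clock_khz)
{
    assert(htotal > 0 && clock_khz > 0);
    return (int64_t)htotal * 1000000 / clock_khz;
}

/* How long before the real vblank start an irq that arrives lead_lines
 * scanlines early fires, in ns. */
static int64_t irq_lead_ns(int lead_lines, int htotal, int clock_khz)
{
    assert(lead_lines >= 0);
    return (int64_t)lead_lines * line_duration_ns(htotal, clock_khz);
}
```

At that assumed mode, line_duration_ns(2200, 148500) returns 14814 ns, so a 2-line lead is about 29.6 µs and a 3-line lead about 44.4 µs: small, but far larger than the time a short irq handler needs to finish before vblank actually starts.
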
> 
> The size of the line buffer varies by generation, but the LB logic is
> still responsible for generating the vblank interrupt even on newer
> hw.

So there's no actual vblank interrupt available from the timing
generator?

-- 
Ville Syrjälä
Intel OTC