[Intel-gfx] [PATCH] drm/vblank: Avoid storing a timestamp for the same frame twice
Daniel Vetter
daniel at ffwll.ch
Mon Feb 8 17:43:53 UTC 2021
On Mon, Feb 8, 2021 at 5:58 PM Ville Syrjälä
<ville.syrjala at linux.intel.com> wrote:
>
> On Mon, Feb 08, 2021 at 10:56:36AM +0100, Daniel Vetter wrote:
> > On Fri, Feb 05, 2021 at 11:19:19PM +0200, Ville Syrjälä wrote:
> > > On Fri, Feb 05, 2021 at 06:24:08PM +0200, Ville Syrjälä wrote:
> > > > On Fri, Feb 05, 2021 at 04:46:27PM +0100, Daniel Vetter wrote:
> > > > > On Thu, Feb 04, 2021 at 05:55:28PM +0200, Ville Syrjälä wrote:
> > > > > > On Thu, Feb 04, 2021 at 04:32:16PM +0100, Daniel Vetter wrote:
> > > > > > > On Thu, Feb 04, 2021 at 04:04:00AM +0200, Ville Syrjala wrote:
> > > > > > > > From: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > > > > > > >
> > > > > > > > drm_vblank_restore() exists because certain power saving states
> > > > > > > > can clobber the hardware frame counter. The way it does this is
> > > > > > > > by guesstimating how many frames were missed purely based on
> > > > > > > > the difference between the last stored timestamp vs. a newly
> > > > > > > > sampled timestamp.
> > > > > > > >
> > > > > > > > If we should call this function before a full frame has
> > > > > > > > elapsed since we sampled the last timestamp we would end up
> > > > > > > > with a possibly slightly different timestamp value for the
> > > > > > > > same frame. Currently we will happily overwrite the already
> > > > > > > > stored timestamp for the frame with the new value. This
> > > > > > > > could cause userspace to observe two different timestamps
> > > > > > > > for the same frame (and the timestamp could even go
> > > > > > > > backwards depending on how much error we introduce when
> > > > > > > > correcting the timestamp based on the scanout position).
> > > > > > > >
> > > > > > > > To avoid that let's not update the stored timestamp unless we're
> > > > > > > > also incrementing the sequence counter. We do still want to update
> > > > > > > > vblank->last with the freshly sampled hw frame counter value so
> > > > > > > > that subsequent vblank irqs/queries can actually use the hw frame
> > > > > > > > counter to determine how many frames have elapsed.
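The behaviour described in the commit message can be sketched as below. This is only an illustration under stated assumptions, not the actual patch: `struct vblank_sketch` and `store_vblank_sketch()` are hypothetical stand-ins for the per-CRTC vblank state and the core's store helper, and locking is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-CRTC vblank state. */
struct vblank_sketch {
	uint32_t last;    /* last sampled hw frame counter value */
	uint64_t count;   /* software sequence counter seen by userspace */
	int64_t  time_ns; /* timestamp stored for the frame 'count' */
};

/*
 * Always resynchronise 'last' with the freshly sampled hw counter, but
 * only store a new timestamp when the sequence counter advances.
 */
static void store_vblank_sketch(struct vblank_sketch *vb, uint32_t inc,
				int64_t t_vblank_ns, uint32_t cur_hw)
{
	assert(vb);

	/* Later irqs/queries compute their diff against this value. */
	vb->last = cur_hw;

	/* Same frame: keep the timestamp userspace may already have seen. */
	if (inc == 0)
		return;

	vb->time_ns = t_vblank_ns;
	vb->count += inc;
}
```

With `inc == 0` only `last` changes, so a second sample within the same frame cannot replace the timestamp already reported for that frame.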
> > > > > > >
> > > > > > > Hm I'm not getting the reason for why we store the updated hw vblank
> > > > > > > counter?
> > > > > >
> > > > > > Because next time a vblank irq happens the code will do:
> > > > > > diff = current_hw_counter - vblank->last
> > > > > >
> > > > > > which won't work very well if vblank->last is garbage.
> > > > > >
> > > > > > Updating vblank->last is pretty much why drm_vblank_restore()
> > > > > > exists at all.
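To make the failure mode concrete, here is a small sketch of the diff computation mentioned above. `frames_elapsed()` is a hypothetical helper, and the 24-bit counter width in the usage note is only an assumption for the example; real hardware widths vary.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Frames elapsed between two samples of a wrapping hw frame counter.
 * 'max_count' is the largest value the counter can hold and is assumed
 * to be of the form 2^n - 1.
 */
static uint32_t frames_elapsed(uint32_t cur_hw, uint32_t last,
			       uint32_t max_count)
{
	assert((max_count & (max_count + 1)) == 0);

	/* Unsigned subtraction plus the mask handles normal wraparound. */
	return (cur_hw - last) & max_count;
}
```

With a 24-bit counter, `frames_elapsed(1, 0xfffffe, 0xffffff)` yields 3 across a wrap, as expected. If the counter was reset while `last` still holds 1000, then `frames_elapsed(2, 1000, 0xffffff)` yields 16776218, a huge bogus jump, which is why `last` has to be refreshed after the counter is clobbered.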
> > > > >
> > > > > Oh sure, _restore has to update this, together with the timestamp.
> > > > >
> > > > > But your code adds such an update where we update the hw vblank counter,
> > > > > but not the timestamp, and that feels buggy. Either we're still in the
> > > > > same frame, and then we should store nothing. Or we advanced, and then we
> > > > > probably want a new timestamp for that frame too.
> > > >
> > > > Even if we're still in the same frame the hw frame counter may already
> > > > have been reset due to the power well having been turned off. That is
> > > > what I'm trying to fix here.
> > > >
> > > > Now I suppose that's fairly unlikely, at least with PSR which probably
> > > > does impose some extra delays before the power gets yanked. But at least
> > > > theoretically possible.
> > >
> > > Pondering about this a bit further. I think the fact that the current
> > > code takes the round-to-closest approach I used for the vblank handler
> > > is perhaps a bit bad. It could push the seq counter forward if we're
> > > past the halfway point of a frame. I think that rounding behaviour
> > > makes sense for the irq since those tick steadily and so allowing a bit
> > > of error either way seems correct to me. Perhaps round-down might be
> > > the better option for _restore(). Not quite sure, need more thinking
> > > probably.
> >
> > Yes this is the rounding I'm worried about.
>
> Actually I don't think this is really an issue since we are working
> with the corrected timestamps here. Those always line up with
> frames, so unless the correction is really buggy or the hw somehow
> skips a partial frame it should work rather well. At least when
> operating with small timescales. For large gaps the error might
> creep up, but I don't think a small error in the predicted seq
> number over a long timespan is really a problem.

That corrected timestamp is what can go wrong, I think: there's no
guarantee that drm_crtc_vblank_helper_get_vblank_timestamp_internal()
flips to top-of-frame at exactly the same time as the hw vblank
counter flips. Or at least I'm not seeing where we correct them both
together.

> > But your point above that the hw might reset the counter again is also
> > valid. I'm assuming what you're worried about is that we first do a
> > _restore (and the hw vblank counter hasn't been trashed yet), and then in
> > the same frame we do another restore, but now the hw frame counter has
> > been trashed, and we need to update it?
>
> Yeah, although the pre-trashing _restore could also just be
> a vblank irq I think.
>
> >
> > > Another idea that came to me now is that maybe we should actually just
> > > check if the current hw frame counter value looks sane, as in something
> > > like:
> > >
> > >   diff_hw_counter = current_hw_counter - stored_hw_counter
> > >   diff_ts = (current_ts - stored_ts) / framedur
> > >
> > >   if (diff_hw_counter ~= diff_ts)
> > >           diff = diff_hw_counter;
> > >   else
> > >           diff = diff_ts;
> > >
> > > and if they seem to match then just keep trusting the hw counter.
> > > So only if there's a significant difference would we disregard
> > > the diff of the hw counter and instead use the diff based on the
> > > timestamps. Not sure what "significant" is though; One frame, two
> > > frames?
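The sanity check proposed above could look roughly like this in C. It is a sketch under assumptions, not code from the DRM core: `pick_frame_diff()` and its `tolerance` parameter are hypothetical, the hw counter is assumed to be a full 32 bits wide, timestamps are assumed to be in nanoseconds with `cur_ts_ns >= stored_ts_ns`, and the timestamp-based diff is rounded to the closest frame, which is a choice made here rather than something the pseudocode specifies.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Trust the hw counter diff when it agrees with the timestamp-based
 * estimate to within 'tolerance' frames; otherwise fall back to the
 * timestamp-based estimate.
 */
static uint32_t pick_frame_diff(uint32_t cur_hw, uint32_t stored_hw,
				int64_t cur_ts_ns, int64_t stored_ts_ns,
				int64_t framedur_ns, int32_t tolerance)
{
	uint32_t diff_hw, diff_ts;
	int32_t delta;

	assert(framedur_ns > 0);
	assert(tolerance >= 0);
	assert(cur_ts_ns >= stored_ts_ns);

	/* Unsigned subtraction wraps modulo 2^32 (assumed counter width). */
	diff_hw = cur_hw - stored_hw;

	/* Elapsed frames according to the timestamps, rounded to closest. */
	diff_ts = (uint32_t)((cur_ts_ns - stored_ts_ns + framedur_ns / 2) /
			     framedur_ns);

	/* Signed disagreement between the two estimates, in frames. */
	delta = (int32_t)(diff_hw - diff_ts);

	if (delta >= -tolerance && delta <= tolerance)
		return diff_hw;

	return diff_ts;
}
```

The open question from the text, what counts as a "significant" difference, maps directly onto `tolerance`: a value of 1 accepts a one-frame disagreement, 2 accepts two frames.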
> >
> > Hm, another idea: The only point where we can trust the entire hw counter
> > + timestamp sampling is when the irq happens. Because then we know the
> > driver will have properly corrected for any hw oddities (like hw counter
> > flipping not at top-of-frame, like the core expects).
>
> i915 at least gives out correct data regardless of when you sample
> it. Well, except for the cases where the hw counter gets trashed,
> in which case the hw counter is garbage (when compared with .last)
> but the timestamp is still correct.

Hm, where/how do we handle this? Maybe I'm just out of date with how it
all works nowadays.

> > So what if _restore always goes back to the last such trusted hw counter
> > for computing the frame counter diff and all that stuff? That way if we
> > have a bunch of _restore with inconsistent hw vblank counter, we will a)
> > only take the last one (fixes the bug you're trying to fix) b) still use
> > the same last trusted baseline for computations (addresses the race I'm
> > seeing).
> >
> > Or does this not work?
>
> I don't think I really understand what you're suggesting here.
> _restore is already using the last trusted data (the stored
> timestamp + .last).
>
> So the one thing _restore will have to update is .last.
> I think it can either do what it does now and set .last
> to the current hw counter value + update the timestamp
> to match, or it could perhaps adjust the stored .last
> such that the already stored timestamp and the updated
> .last match up. But I think both of those options have
> the same level of inaccuracy since both would still do
> the same ts_diff->hw_counter_diff prediction.
>
> >
> > It does complicate the code a bit, because we'd need to store the
> > count/timestamp information from _restore outside of the usual vblank ts
> > array. But I think that addresses everything.
>
> Hmm. So restore would store this extra information
> somewhere else, and not update the normal stuff at all?
> What exactly would we do with that extra data?

Hm, I guess I didn't think this through. But the idea I had was:
- _restore always recomputes back from the last
  drm_crtc_handle_vblank-stored timestamp.
- the first drm_crtc_handle_vblank bakes in any corrections that
  _restore has prepared meanwhile
- the same applies to all the sampling functions that might look at the
  latest timestamps/counter values.

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch