[PATCH weston v2 10/11] [RFC] Account for very large repaint window misses

Wed Mar 29 10:30:49 UTC 2017

On Tue, 28 Mar 2017 20:53:53 +0200
Mario Kleiner <mario.kleiner.de at gmail.com> wrote:

> On 03/28/2017 01:02 PM, Pekka Paalanen wrote:
> > Hi Mario,
> >
> > I'm glad to hear from you, it's this kind of details we really
> > appreciate you for. :-)
> >
> > On Tue, 28 Mar 2017 00:59:41 +0200
> > Mario Kleiner <mario.kleiner.de at gmail.com> wrote:
> >  
> >> Hi Daniel & Pekka,
> >>

> > The question was: when would the postponing adjustment be more than one
> > refresh period? Why need the loop, why is not a one-shot adjustment
> > enough?
> >
> > start_repaint_loop() is specifically written to check that the
> > timestamp from the immediate vblank query is not several periods in the
> > past. So the only time I see adjustment loop spinning more than once is
> > if we fell back to a flip in start_repaint_loop() and for some reason
> > are extremely late in processing the resulting page flip event, or the
> > page flip event carries a bad timestamp.
> >  
> 
> Agreed. The adjustment can never be more than 1 refresh period, ergo 
> doesn't need a loop, unless we'd get the very unlikely case of Weston 
> getting preempted for a long time exactly during the few instructions 
> leading from output->start_repaint_loop to weston_output_finish_frame 
> reading the compositor clock. In which case we'd glitch badly in any 
> case. I can't think of a way, apart from kms driver bug, that the 
> pageflip event could carry a bad timestamp, and for the pageflip driven 
> path we don't hit that one-shot adjustment (or potential loop) anyway, 
> as the flags aren't WP_PRESENTATION_FEEDBACK_INVALID.

Oh, that is true about the pageflip fallback path of
start_repaint_loop. Makes me wonder if I should "fix" that, since
start_repaint_loop should never result in sending out presentation
feedback events, and can't assert on that if the flags are not
FEEDBACK_INVALID. Oh well, I've been meaning to get rid of the INVALID
flag by introducing a different function to call in that case.

> > Unfortunately, I also conflated another topic in the same discussion:
> > if the delay sanity check fails, should we postpone also then.  
> 
> I guess if it failed due to msecs_rel < -1000 we should treat it as a 
> repaint loop restart, setting
> 
> if (msecs_rel < -1000)
>          presented_flags = WP_PRESENTATION_FEEDBACK_INVALID;
> 
> I can't think of a way msecs_rel > 1000 unless refresh_nsecs would be 
> literally >= 1 second, in which case next_repaint = now would be the 
> only safe choice, ie.,
> 
> if (msecs_rel < -1000)
>          presented_flags = WP_PRESENTATION_FEEDBACK_INVALID;
> else
>          output->next_repaint = now;

I should keep that in mind, thanks.

> >  
> >>> Sure!
> >>>  
> >>>>> Did you see a case where it happened?  
> >>>>
> >>>> No, it's just something that had been bugging me ever since I started
> >>>> looking into repaint. Found purely by inspection.
> >>>>  
> >>>>> I think it would only happen with a) broken driver reporting bad
> >>>>> timestamps with pageflip events (we already verify timestamps from the
> >>>>> vblank query and fall back to pageflip otherwise), or b) something
> >>>>> stalled weston for too long.
> >>>>>
> >>>>> a) would be a bug we want to log. b) might be a bug we want to log, or
> >>>>> just high system load in general which we wouldn't want to log, really.
> >>>>> Unfortunately there is no way to tell.
> >>>>>
> >>>>> There is the original reason we want to ensure the target time is in
> >>>>> the future, but there are no reasons when doing so would be bad.
> >>>>> Therefore I think this is a good idea.
> >>>>>  
> >>
> >> I'm not sure if generally ensuring that the target time is in the future
> >> helps or hurts or doesn't matter in practice. That code wasn't meant to
> >> deal with overload/scheduling delay. I assume you can only get larger
> >> delays during repaint loop restart or during already running repaint
> >> loop on system overload, and in that case all predictability is probably
> >> out of the window and the whole desktop will glitch anyway. One could
> >> argue that it would be better to just accept defeat and try to get an
> >> output repaint out as fast as possible, so not too many client
> >> schedule_repaint requests can back up during the extra time given to
> >> them by shifting the repaint deadline further into the future, possibly
> >> making the glitch longer. Not sure if that would matter in practice though.  
> >
> > That, indeed, I agree with. It should not matter, things have already
> > fallen apart if the while-loop would spin more than once.
> >
> > That still leaves the question: should this patch be merged? :-)  
> 
> I tend to yes for the merge. I think it can't hurt in the "restart 
> repaint loop" case. It won't help avoid the big glitch which already 
> happened, but it gets us back to well defined behavior after the glitch, 
> and it may help some debugging at some time if we know what next_repaint 
> to expect wrt. display timing, especially if one would need to compare 
> Weston debug output with drm/kms timestamping debug output. In that 
> sense the while loop is easy to understand and even if it would 
> inefficiently spin many times it would be likely nothing compared to the 
> time lost during the big glitch/preemption.

Very good!

Thanks,
pq
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/wayland-devel/attachments/20170329/2e7b72c5/attachment.sig>