[PATCH 2/6] compositor-drm: Allow instant start of repaint loop.

Tue Apr 7 00:34:35 PDT 2015

On Sat, 04 Apr 2015 19:45:10 +0200
Mario Kleiner <mario.kleiner.de at gmail.com> wrote:

> On 04/02/2015 01:37 PM, Pekka Paalanen wrote:
> > On Thu,  2 Apr 2015 07:10:50 +0200
> > Mario Kleiner <mario.kleiner.de at gmail.com> wrote:
> >
> >> drm_output_start_repaint_loop() incurred a delay of
> >> one refresh cycle by using a no-op page-flip to get
> >> an accurate vblank timestamp as reference. This causes
> >> unwanted lag whenever Weston exited its repaint loop, e.g.,
> >> whenever an application wants to repaint with less than
> >> full video refresh rate but still minimum lag.
> >>
> >> Try to use the drmWaitVblank ioctl to get a proper
> >> timestamp instantaneously without lag. If that does
> >> not work, fall back to the old method of idle page-flip.
> >>
> >> This optimization should work on any drm/kms driver
> >> which supports high precision vblank timestamping.
> >> As of Linux 4.0 these would be intel, radeon and
> >> nouveau on all supported gpu's.
> >>
> >> Signed-off-by: Mario Kleiner <mario.kleiner.de at gmail.com>
> >> ---
> >>   src/compositor-drm.c | 41 ++++++++++++++++++++++++++++++++++++++++-
> >>   1 file changed, 40 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/src/compositor-drm.c b/src/compositor-drm.c
> >> index fe59811..4a7baa1 100644
> >> --- a/src/compositor-drm.c
> >> +++ b/src/compositor-drm.c
> >> @@ -225,6 +225,9 @@ static const char default_seat[] = "seat0";
> >>   static void
> >>   drm_output_set_cursor(struct drm_output *output);
> >>
> >> +static void
> >> +drm_output_update_msc(struct drm_output *output, unsigned int seq);
> >> +
> >>   static int
> >>   drm_sprite_crtc_supported(struct drm_output *output, uint32_t supported)
> >>   {
> >> @@ -704,6 +707,12 @@ err_pageflip:
> >>   	return -1;
> >>   }
> >>
> >> +static int64_t
> >> +timespec_to_nsec(const struct timespec *a)
> >> +{
> >> +	return (int64_t)a->tv_sec * 1000000000000LL + a->tv_nsec;
> >
> > Hi,
> >
> > are you sure this cannot overflow? I think tv_sec could be a int64_t.
> >
> > That no-one gets the idea of initializing the clock in the kernel to
> > some huge value just to fish out this kind of overflows?
> >
> 
> I almost literally copied it from compositor.c, your repaint delay 
> handling. I think it can't overflow the way it is used here to only 
> remap from kernel timestamps. The kernel delivers its timestamps in 
> "struct timeval", so tv_sec (== long int) from kernel could be a 64-bit 
> int on 64-Bit kernels, but the values put into it are always derived 
> from ktime_t which is "nanoseconds stored in 64 bit signed integers". So 
> at least with the current kernel implementation it can't overflow. And 
> 32-Bit kernels would hit such overflows before user space as their 
> struct timeval seems to have 32 bit int for tv_sec. So i think with the 
> current kernel ioctl interfaces it should be safe.

Hi,

that's nice to know about the kernel. Isn't tv_sec actually a time_t? I
once or twice tried to find out what was guaranteed about time_t and I
couldn't really find much, except that it may be even 64-bit or it
maybe be less.

It's hard for me to draw the line between trusting only specs and
trusting the implementation I have at hand.

Btw. I always try to do a timespec_to_nsec() on a timespec that already
is a difference, never an absolute time. Below you are subtracting
nsecs instead of timespecs, so that got me wondering.

> 
> >> +}
> >> +
> >>   static void
> >>   drm_output_start_repaint_loop(struct weston_output *output_base)
> >>   {
> >> @@ -711,7 +720,13 @@ drm_output_start_repaint_loop(struct weston_output *output_base)
> >>   	struct drm_compositor *compositor = (struct drm_compositor *)
> >>   		output_base->compositor;
> >>   	uint32_t fb_id;
> >> -	struct timespec ts;
> >> +	struct timespec ts, tnow;
> >> +	int ret;
> >> +	drmVBlank vbl = {
> >> +		.request.type = DRM_VBLANK_RELATIVE,
> >> +		.request.sequence = 0,
> >> +		.request.signal = 0,
> >> +	};
> >>
> >>   	if (output->destroy_pending)
> >>   		return;
> >> @@ -721,6 +736,30 @@ drm_output_start_repaint_loop(struct weston_output *output_base)
> >>   		goto finish_frame;
> >>   	}
> >>
> >> +	/* Try to get current msc and timestamp via instant query */
> >> +	vbl.request.type |= drm_waitvblank_pipe(output);
> >> +	ret = drmWaitVBlank(compositor->drm.fd, &vbl);

Just to be sure I understood right: this call attempts to get the exact
time the latest vblank ended?

> >> +
> >> +	/* Error return or zero timestamp means failure to get valid timestamp */
> >> +	if ((ret == 0) && (vbl.reply.tval_sec > 0)) {
> >
> > No need to check tval_usec?
> 
> I don't think it is necessary for this quick crude early-out check and 
> would complicate the check to be almost as expensive as the check it is 
> trying to skip.

I don't think checking one more value is going to be noticeable...

> On kernels <= Linux 3.16 a zero time value tval_sec=tval_usec=0 meant 
> "invalid timestamp, please retry again later". This would happen on kms 
> drivers which don't support high precision instant vblank timestamping 
> if vblank interrupts were disabled (currently all but intel, radeon and 
> nouveau, so essentially all SoC's), because the only way to get a valid 
> timestamp there is to wait for a vblank irq and collect the timestamp 
> there, and we didn't want the "non-blocking" bits of drmWaitVblank ioctl 
> to block for up to a whole video refresh cycle.
> 
> The tval_sec > 0 check detects these invalid timestamps and immediately 
> skips to the pageflip fallback below. It would also skip to fallback if 
> weston would somehow manage to run within the first second of kernel 
> boot, but that's very unlikely and all it would cause is 1 extra frame 
> of lag during that second.

Makes perfect sense, yeah, the corner case being essentially irrelevant
indeed.

> The reason we need the check for stale timestamps at all is because we 
> accidentally removed that "timestamp 0 == invalid ts" signalling during 
> some improvements to the vblank handling in kernel 3.17 which were 
> needed for the atomic modesetting stuff. At the moment timestamps will 
> be always correct and instantaneous on intel/radeon/nouveau-kms, but the 
> kernel would deliver stale timestamps for <= 1 video refresh duration 
> after vblank irqs were turned off and on again on the various SoC 
> display drivers.

Oh, I didn't know that. Interesting.

> I'll try to prepare a kernel drm patch to fix that signalling, but now 
> we have to deal with at least some broken kernels anyway.

Agreed, need to deal with them for a long time, I believe.

> I think it would be a good idea to find ways to implement the 
> instantaneous high precision vblank timestamping also on more kms 
> drivers for SoC's, as it would be not only good for timing precision, 
> but also for lag reduction and power saving. For any SoC that allows to 
> query the current crtc scanout position via some register this could be 
> easily done using the same shared drm helper routines we already use for 
> intel/radeon/nouveau. Or if they had some hardware vblank timestamp 
> register that could be mapped to CLOCK_MONOTONIC time.
> 
> >> +		ts.tv_sec = vbl.reply.tval_sec;
> >> +		ts.tv_nsec = vbl.reply.tval_usec * 1000;
> >> +
> >> +		/* Valid timestamp for most recent vblank - not stale? Stale ts could
> >> +		 * happen on Linux 3.17+, so make sure it is not older than 1 refresh
> >> +		 * duration since now.
> >> +		 */
> >> +		weston_compositor_read_presentation_clock(&compositor->base, &tnow);
> >> +		if (timespec_to_nsec(&tnow) - timespec_to_nsec(&ts) <
> >
> > I'd write this difference in terms of timespec first and convert to
> > nsec afterwards, but I can do that also as a follow-up when I happen to
> > visit this part the next time. I have quite some timespec helpers lying
> > around that I should collect into one place at some point.
> >
> 
> If you wanted to introduce some proper timespec helpers anyway then i'd 
> leave it to you when you get around doing that. I went for the easy 
> route here because i didn't want to duplicate too much helper code from 
> other places like compositor.c or open-code this stuff just for one 
> single check.

Cool, sure.

> >> +			(1000000000000LL / output_base->current_mode->refresh)) {
> >> +			drm_output_update_msc(output, vbl.reply.sequence);
> >> +			weston_output_finish_frame(output_base, &ts,
> >> +									   PRESENTATION_FEEDBACK_INVALID);

(*)

Hmm... could this cause weston_output_finish_frame() to be called twice
with the same timestamp? Up to now I have considered that to be a bug.

I think it could, so we need to check that doesn't cause problems
elsewhere. I don't immediately recall any cases it would be a problem,
except if we did sanity checking on the timestamps explicitly.

> >> +			return;
> >> +		}
> >> +	}
> >> +
> >> +	/* Immediate query didn't provide valid timestamp. Use pageflip fallback */
> >>   	fb_id = output->current->fb_id;
> >>
> >>   	if (drmModePageFlip(compositor->drm.fd, output->crtc_id, fb_id,
> >
> > At least
> > Acked-by: Pekka Paalanen <pekka.paalanen at collabora.co.uk>
> > for now.
> >

I'm happy with the explanations though I didn't check the DRM API usage
carefully.

Would be nice to record those explanations at least in the commit
message too. If you're going to respin this patch, you could take care
of that whitespace issue also(*).

Thanks,
pq