[Nouveau] [PATCH 3/3] dri2: Fixes to swap scheduling, especially for copy-swaps.

Thu Sep 8 11:51:30 PDT 2011

On Sep 8, 2011, at 1:00 AM, Francisco Jerez wrote:

Thanks for your review. See comments below.

> Mario Kleiner <mario.kleiner at tuebingen.mpg.de> writes:
>
>> Treats vblank event scheduling for the non-pageflip swap
>> case correctly. Allows vblank controlled swaps for redirected
>> windows. Fixes some corner-cases in OML_sync_control scheduling
>> when divisor and remainder parameters are used.
>>
>> Signed-off-by: Mario Kleiner <mario.kleiner at tuebingen.mpg.de>
>> ---
>>  src/nouveau_dri2.c |   71 ++++++++++++++++++++++++++++++++++++++++ 
>> ++++-------
>>  1 files changed, 61 insertions(+), 10 deletions(-)
>>
>> diff --git a/src/nouveau_dri2.c b/src/nouveau_dri2.c
>> index 9f0ee97..f14dea4 100644
>> --- a/src/nouveau_dri2.c
>> +++ b/src/nouveau_dri2.c
>> @@ -196,10 +196,8 @@ can_sync_to_vblank(DrawablePtr draw)
>>  {
>>  	ScrnInfoPtr scrn = xf86Screens[draw->pScreen->myNum];
>>  	NVPtr pNv = NVPTR(scrn);
>> -	PixmapPtr pix = NVGetDrawablePixmap(draw);
>>
>>  	return pNv->glx_vblank &&
>> -		nouveau_exa_pixmap_is_onscreen(pix) &&
>
> I'm not sure that this is the right way to fix this problem, depending
> on the kind of transformations that the compositing manager is doing
> you're likely to end up picking a CRTC at random... I just have the
> impression that for redirected windows syncing to vblank is no longer
> our responsibility, but rather the compositing manager's.
>

I don't think it is the perfect way either, but to me it seems to be  
some improvement for the time being. Ideally there would be protocol  
probably between the compositor and mesa to get this perfectly right  
- translate the oml_sync_control and classic glXSwapBuffers requests  
into some requests for the compositor and x-server.

But the current implementation under a compositor is not great. You  
get glxgears reporting that "vsync is on and the redraw rate should  
equal the refresh rate" but see 2900 fps reported on a 60 Hz display,  
with apparently 60 Hz animation. And, in my use case, toolkits that  
care about timing and do some consistency checks on their swapbuffers  
execution bail out immediately, telling you to fix your "totally  
broken graphics driver setup".

It's a pure "better than nothing" change for the redirected case,  
which seems to behave less confusing, at least as no fancy  
transformations are used, e.g., during desktop transitions.

Some consistent crtc selection is another thing that imho would need  
a bit of work for nouveau and the other drivers. Currently nouveau  
always selects the 1st crtc for vblank events based swap scheduling  
and vsync if a window spans multiple displays on xinerama setups,  
regardless if a non-fullscreen window is displaying on the first or  
second monitor. Same goes for fullscreen drawables, regardless of how  
much area is covered by the viewport of a crtc. The other drivers  
select the crtc to sync by the intersection area of drawable and  
crtc, so tearing fuer multi-display spanning windows is limited to  
the "smaller display". It also makes sense to give users some control  
over this for clone mode, e.g., when they give a presentation and  
don't care if their laptops flat panel tears, but don't want tearing  
on the presentation screen for the audience. E.g., one could prefer  
the --primary output as defined by xrandr.

>>  		nv_window_belongs_to_crtc(scrn, draw->x, draw->y,
>>  					  draw->width, draw->height);
>>  }
>> @@ -344,9 +342,40 @@ nouveau_dri2_schedule_swap(ClientPtr client,  
>> DrawablePtr draw,
>>  			   CARD64 *target_msc, CARD64 divisor, CARD64 remainder,
>>  			   DRI2SwapEventPtr func, void *data)
>>  {
>> +	PixmapPtr dst_pix;
>> +	PixmapPtr src_pix = nouveau_dri2_buffer(src)->ppix;
>>  	struct nouveau_dri2_vblank_state *s;
>>  	CARD64 current_msc, expect_msc;
>> -	int ret;
>> +	int ret, flip = 0;
>> +	Bool front_updated;
>> +
>> +	/* Truncate to match kernel interfaces; means occasional overflow
>> +	 * misses, but that's generally not a big deal.
>> +	 */
>> +	*target_msc &= 0xffffffff;
>> +	divisor &= 0xffffffff;
>> +	remainder &= 0xffffffff;
>> +
> This doesn't really fix the problem with our 32bit counters wrapping
> around, but, as the comment says, it's not a big deal...
>

Yes. Another item, get 64-bit counters and api for the kernel drm.  
Mostly to make explicit that there is a 32-bit limit in place.

>> +	/* Update frontbuffer pixmap and name: Could have changed due to
>> +	 * window (un)redirection as part of compositing.
>> +	 */
>> +	front_updated = update_front(draw, dst);
>> +
>> +	/* Assign frontbuffer pixmap, after update in update_front() */
>> +	dst_pix = nouveau_dri2_buffer(dst)->ppix;
>> +
>> +	/* Flips need to be submitted one frame before */
>> +	if (DRI2CanFlip(draw) && front_updated &&
>> +	    can_exchange(draw, dst_pix, src_pix)) {
>> +		flip = 1;
>> +	}
>> +
>> +	/* Correct target_msc by 'flip' if this is a page-flipped swap.
>> +	 * Do it early, so handling of different timing constraints
>> +	 * for divisor, remainder and msc vs. target_msc works.
>> +	 */
>> +	if (*target_msc > 0)
>> +		*target_msc -= (CARD64) flip;
>
> We don't need the code above, blits are submitted one frame before as
> well - the wait for the final vblank is done in hardware by the GPU
> itself, that way we don't have to worry about tearing caused by the  
> GPU
> being too busy to finish all its previous work in time for the  
> specified
> vblank interval.
>

Oh ok, thanks. I have special measurement equipment here to test  
pageflipped swaps for timing, but can't test the copyswap case  
easily. My toolkit was complaining loudly about inconsistencies, so i  
was just assuming it is due to the same logic as on other gpu's +  
missing code in the ddx. If copy-swaps are intentionally scheduled  
one frame ahead, then DRI2SwapComplete timestamps would need to be  
corrected for that. Currently you get the puzzling result from the  
timestamps that swaps complete before they were even scheduled by the  
client, typically a clear indication of broken vsync support in the  
driver.

Do you know how this is done at the hardware level? Exactly as with  
pageflips? The gpu programming seems to be the same in the ddx. Is  
the blip operation started at leading edge of the vblank interval? Or  
anywhere inside the vblank interval (level triggered)? Are such blits  
submitted on a separate fifo (or even dma engine?) in the gpu to  
avoid stalling the rest of the command stream until vblank?

>>
>>  	/* Initialize a swap structure */
>>  	s = malloc(sizeof(*s));
>> @@ -363,19 +392,34 @@ nouveau_dri2_schedule_swap(ClientPtr client,  
>> DrawablePtr draw,
>>  		if (ret)
>>  			goto fail;
>>
>> -		/* Calculate a swap target if we don't have one */
>> -		if (current_msc >= *target_msc && divisor)
>> +		/* Calculate a swap target if we don't have one or if
>> +		 * divisor/remainder relationship must be satisfied.
>> +		 */
> Hm, the divisor/remainder relationship having to be satisfied is just
> what one has to do when "we don't have one".

Technically, if the target_msc is already reached/exceeded and we  
have divisor/remainder. "we don't have one" == "target_msc == 0" is a  
very common special case of this. Just wanted to clarify, as i  
understood "we don't have one" as target_msc == 0. Not sure if my new  
comment is much better, just that i was a bit confused on first read.

>> +		if (current_msc >= *target_msc && divisor) {
>>  			*target_msc = current_msc + divisor
>>  				- (current_msc - remainder) % divisor;
>>
>> -		/* Request a vblank event one frame before the target */
>> +			/* Account for extra pageflip delay if flip > 0 */
>> +			*target_msc -= (CARD64) flip;
>> +		}
>> +
>> +		/* Request a vblank event one frame before the target, unless
>> +		 * this is a copy-swap, in which case we need to make sure
>> +		 * it is only dispatched at the target frame or later.
>> +		 */
>>  		ret = nouveau_wait_vblank(draw, DRM_VBLANK_ABSOLUTE |
>> -					  DRM_VBLANK_EVENT,
>> -					  max(current_msc, *target_msc - 1),
>> +					  DRM_VBLANK_EVENT |
>> +					  ((flip) ? 0 : DRM_VBLANK_NEXTONMISS),
>> +					  max(current_msc, *target_msc),
>>  					  &expect_msc, NULL, s);
>>  		if (ret)
>>  			goto fail;
>> -		s->frame = (unsigned int) expect_msc & 0xffffffff;
>> +
>> +		/* Store expected target_msc for later consistency check and
>> +		 * return it to server for proper swap_interval implementation.
>> +		 */
>> +		s->frame = ((unsigned int) (expect_msc & 0xffffffff)) + flip ;
>> +		*target_msc = s->frame;
>
> No need for this, same comment as above.
>

Ok.

>>  	} else {
>>  		/* We can't/don't want to sync to vblank, just swap. */
>>  		nouveau_dri2_finish_swap(draw, 0, 0, 0, s);
>> @@ -396,6 +440,13 @@ nouveau_dri2_schedule_wait(ClientPtr client,  
>> DrawablePtr draw,
>>  	CARD64 current_msc;
>>  	int ret;
>>
>> +	/* Truncate to match kernel interfaces; means occasional overflow
>> +	 * misses, but that's generally not a big deal.
>> +	 */
>> +	target_msc &= 0xffffffff;
>> +	divisor &= 0xffffffff;
>> +	remainder &= 0xffffffff;
>> +
>>  	if (!can_sync_to_vblank(draw)) {
>>  		DRI2WaitMSCComplete(client, draw, target_msc, 0, 0);
>>  		return TRUE;
>> @@ -415,7 +466,7 @@ nouveau_dri2_schedule_wait(ClientPtr client,  
>> DrawablePtr draw,
>>  		goto fail;
>>
>>  	/* Calculate a wait target if we don't have one */
>> -	if (current_msc > target_msc && divisor)
>> +	if (current_msc >= target_msc && divisor)
>>  		target_msc = current_msc + divisor
>>  			- (current_msc - remainder) % divisor;

*********************************************************************
Mario Kleiner
Max Planck Institute for Biological Cybernetics
Spemannstr. 38
72076 Tuebingen
Germany

e-mail: mario.kleiner at tuebingen.mpg.de
office: +49 (0)7071/601-1623
fax:    +49 (0)7071/601-616
www:    http://www.kyb.tuebingen.mpg.de/~kleinerm
*********************************************************************
"For a successful technology, reality must take precedence
over public relations, for Nature cannot be fooled."
(Richard Feynman)