[Nouveau] [PATCH 3/3] dri2: Fixes to swap scheduling, especially for copy-swaps.
Mario Kleiner
mario.kleiner at tuebingen.mpg.de
Thu Sep 8 11:51:30 PDT 2011
On Sep 8, 2011, at 1:00 AM, Francisco Jerez wrote:
Thanks for your review. See comments below.
> Mario Kleiner <mario.kleiner at tuebingen.mpg.de> writes:
>
>> Treats vblank event scheduling for the non-pageflip swap
>> case correctly. Allows vblank controlled swaps for redirected
>> windows. Fixes some corner-cases in OML_sync_control scheduling
>> when divisor and remainder parameters are used.
>>
>> Signed-off-by: Mario Kleiner <mario.kleiner at tuebingen.mpg.de>
>> ---
>> src/nouveau_dri2.c | 71 ++++++++++++++++++++++++++++++++++++++++
>> ++++-------
>> 1 files changed, 61 insertions(+), 10 deletions(-)
>>
>> diff --git a/src/nouveau_dri2.c b/src/nouveau_dri2.c
>> index 9f0ee97..f14dea4 100644
>> --- a/src/nouveau_dri2.c
>> +++ b/src/nouveau_dri2.c
>> @@ -196,10 +196,8 @@ can_sync_to_vblank(DrawablePtr draw)
>> {
>> ScrnInfoPtr scrn = xf86Screens[draw->pScreen->myNum];
>> NVPtr pNv = NVPTR(scrn);
>> - PixmapPtr pix = NVGetDrawablePixmap(draw);
>>
>> return pNv->glx_vblank &&
>> - nouveau_exa_pixmap_is_onscreen(pix) &&
>
> I'm not sure that this is the right way to fix this problem, depending
> on the kind of transformations that the compositing manager is doing
> you're likely to end up picking a CRTC at random... I just have the
> impression that for redirected windows syncing to vblank is no longer
> our responsibility, but rather the compositing manager's.
>
I don't think it is the perfect way either, but to me it seems to be
some improvement for the time being. Ideally there would be protocol
probably between the compositor and mesa to get this perfectly right
- translate the oml_sync_control and classic glXSwapBuffers requests
into some requests for the compositor and x-server.
But the current implementation under a compositor is not great. You
get glxgears reporting that "vsync is on and the redraw rate should
equal the refresh rate" but see 2900 fps reported on a 60 Hz display,
with apparently 60 Hz animation. And, in my use case, toolkits that
care about timing and do some consistency checks on their swapbuffers
execution bail out immediately, telling you to fix your "totally
broken graphics driver setup".
It's a pure "better than nothing" change for the redirected case,
which seems to behave less confusing, at least as no fancy
transformations are used, e.g., during desktop transitions.
Some consistent crtc selection is another thing that imho would need
a bit of work for nouveau and the other drivers. Currently nouveau
always selects the 1st crtc for vblank events based swap scheduling
and vsync if a window spans multiple displays on xinerama setups,
regardless if a non-fullscreen window is displaying on the first or
second monitor. Same goes for fullscreen drawables, regardless of how
much area is covered by the viewport of a crtc. The other drivers
select the crtc to sync by the intersection area of drawable and
crtc, so tearing fuer multi-display spanning windows is limited to
the "smaller display". It also makes sense to give users some control
over this for clone mode, e.g., when they give a presentation and
don't care if their laptops flat panel tears, but don't want tearing
on the presentation screen for the audience. E.g., one could prefer
the --primary output as defined by xrandr.
>> nv_window_belongs_to_crtc(scrn, draw->x, draw->y,
>> draw->width, draw->height);
>> }
>> @@ -344,9 +342,40 @@ nouveau_dri2_schedule_swap(ClientPtr client,
>> DrawablePtr draw,
>> CARD64 *target_msc, CARD64 divisor, CARD64 remainder,
>> DRI2SwapEventPtr func, void *data)
>> {
>> + PixmapPtr dst_pix;
>> + PixmapPtr src_pix = nouveau_dri2_buffer(src)->ppix;
>> struct nouveau_dri2_vblank_state *s;
>> CARD64 current_msc, expect_msc;
>> - int ret;
>> + int ret, flip = 0;
>> + Bool front_updated;
>> +
>> + /* Truncate to match kernel interfaces; means occasional overflow
>> + * misses, but that's generally not a big deal.
>> + */
>> + *target_msc &= 0xffffffff;
>> + divisor &= 0xffffffff;
>> + remainder &= 0xffffffff;
>> +
> This doesn't really fix the problem with our 32bit counters wrapping
> around, but, as the comment says, it's not a big deal...
>
Yes. Another item, get 64-bit counters and api for the kernel drm.
Mostly to make explicit that there is a 32-bit limit in place.
>> + /* Update frontbuffer pixmap and name: Could have changed due to
>> + * window (un)redirection as part of compositing.
>> + */
>> + front_updated = update_front(draw, dst);
>> +
>> + /* Assign frontbuffer pixmap, after update in update_front() */
>> + dst_pix = nouveau_dri2_buffer(dst)->ppix;
>> +
>> + /* Flips need to be submitted one frame before */
>> + if (DRI2CanFlip(draw) && front_updated &&
>> + can_exchange(draw, dst_pix, src_pix)) {
>> + flip = 1;
>> + }
>> +
>> + /* Correct target_msc by 'flip' if this is a page-flipped swap.
>> + * Do it early, so handling of different timing constraints
>> + * for divisor, remainder and msc vs. target_msc works.
>> + */
>> + if (*target_msc > 0)
>> + *target_msc -= (CARD64) flip;
>
> We don't need the code above, blits are submitted one frame before as
> well - the wait for the final vblank is done in hardware by the GPU
> itself, that way we don't have to worry about tearing caused by the
> GPU
> being too busy to finish all its previous work in time for the
> specified
> vblank interval.
>
Oh ok, thanks. I have special measurement equipment here to test
pageflipped swaps for timing, but can't test the copyswap case
easily. My toolkit was complaining loudly about inconsistencies, so i
was just assuming it is due to the same logic as on other gpu's +
missing code in the ddx. If copy-swaps are intentionally scheduled
one frame ahead, then DRI2SwapComplete timestamps would need to be
corrected for that. Currently you get the puzzling result from the
timestamps that swaps complete before they were even scheduled by the
client, typically a clear indication of broken vsync support in the
driver.
Do you know how this is done at the hardware level? Exactly as with
pageflips? The gpu programming seems to be the same in the ddx. Is
the blip operation started at leading edge of the vblank interval? Or
anywhere inside the vblank interval (level triggered)? Are such blits
submitted on a separate fifo (or even dma engine?) in the gpu to
avoid stalling the rest of the command stream until vblank?
>>
>> /* Initialize a swap structure */
>> s = malloc(sizeof(*s));
>> @@ -363,19 +392,34 @@ nouveau_dri2_schedule_swap(ClientPtr client,
>> DrawablePtr draw,
>> if (ret)
>> goto fail;
>>
>> - /* Calculate a swap target if we don't have one */
>> - if (current_msc >= *target_msc && divisor)
>> + /* Calculate a swap target if we don't have one or if
>> + * divisor/remainder relationship must be satisfied.
>> + */
> Hm, the divisor/remainder relationship having to be satisfied is just
> what one has to do when "we don't have one".
Technically, if the target_msc is already reached/exceeded and we
have divisor/remainder. "we don't have one" == "target_msc == 0" is a
very common special case of this. Just wanted to clarify, as i
understood "we don't have one" as target_msc == 0. Not sure if my new
comment is much better, just that i was a bit confused on first read.
>> + if (current_msc >= *target_msc && divisor) {
>> *target_msc = current_msc + divisor
>> - (current_msc - remainder) % divisor;
>>
>> - /* Request a vblank event one frame before the target */
>> + /* Account for extra pageflip delay if flip > 0 */
>> + *target_msc -= (CARD64) flip;
>> + }
>> +
>> + /* Request a vblank event one frame before the target, unless
>> + * this is a copy-swap, in which case we need to make sure
>> + * it is only dispatched at the target frame or later.
>> + */
>> ret = nouveau_wait_vblank(draw, DRM_VBLANK_ABSOLUTE |
>> - DRM_VBLANK_EVENT,
>> - max(current_msc, *target_msc - 1),
>> + DRM_VBLANK_EVENT |
>> + ((flip) ? 0 : DRM_VBLANK_NEXTONMISS),
>> + max(current_msc, *target_msc),
>> &expect_msc, NULL, s);
>> if (ret)
>> goto fail;
>> - s->frame = (unsigned int) expect_msc & 0xffffffff;
>> +
>> + /* Store expected target_msc for later consistency check and
>> + * return it to server for proper swap_interval implementation.
>> + */
>> + s->frame = ((unsigned int) (expect_msc & 0xffffffff)) + flip ;
>> + *target_msc = s->frame;
>
> No need for this, same comment as above.
>
Ok.
>> } else {
>> /* We can't/don't want to sync to vblank, just swap. */
>> nouveau_dri2_finish_swap(draw, 0, 0, 0, s);
>> @@ -396,6 +440,13 @@ nouveau_dri2_schedule_wait(ClientPtr client,
>> DrawablePtr draw,
>> CARD64 current_msc;
>> int ret;
>>
>> + /* Truncate to match kernel interfaces; means occasional overflow
>> + * misses, but that's generally not a big deal.
>> + */
>> + target_msc &= 0xffffffff;
>> + divisor &= 0xffffffff;
>> + remainder &= 0xffffffff;
>> +
>> if (!can_sync_to_vblank(draw)) {
>> DRI2WaitMSCComplete(client, draw, target_msc, 0, 0);
>> return TRUE;
>> @@ -415,7 +466,7 @@ nouveau_dri2_schedule_wait(ClientPtr client,
>> DrawablePtr draw,
>> goto fail;
>>
>> /* Calculate a wait target if we don't have one */
>> - if (current_msc > target_msc && divisor)
>> + if (current_msc >= target_msc && divisor)
>> target_msc = current_msc + divisor
>> - (current_msc - remainder) % divisor;
*********************************************************************
Mario Kleiner
Max Planck Institute for Biological Cybernetics
Spemannstr. 38
72076 Tuebingen
Germany
e-mail: mario.kleiner at tuebingen.mpg.de
office: +49 (0)7071/601-1623
fax: +49 (0)7071/601-616
www: http://www.kyb.tuebingen.mpg.de/~kleinerm
*********************************************************************
"For a successful technology, reality must take precedence
over public relations, for Nature cannot be fooled."
(Richard Feynman)
More information about the Nouveau
mailing list