[Bug 38800] glXSwapBuffersMscOML is slow on AMD Fusion but not on Intel 945 w/Atom

Wed Jul 6 13:54:17 PDT 2011

https://bugs.freedesktop.org/show_bug.cgi?id=38800

--- Comment #21 from Mario Kleiner <mario.kleiner at tuebingen.mpg.de> 2011-07-06 13:54:14 PDT ---
@Simon: Michel and your testcode are right: the pageflip is only programmed
about 2 scanlines after the end of vblank, so the crtc waits for another full
refresh cycle before flipping at the start of the next vblank.
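
To put a number on that, here is a rough sketch of the cost of missing the
deadline. It is a simplified model, not driver code: it assumes a flip
programmed inside vblank takes effect right away, and a flip programmed in the
active area is held until the start of the next vblank. The function name and
the mode timing (1080 active lines, 1111 total lines, 60 Hz, a typical
reduced-blanking 1080p mode) are assumptions for illustration, not values
taken from this bug.

```python
def flip_delay_usec(vpos, vdisplay, vtotal, refresh_hz):
    """Time from programming a flip until it takes effect (simplified model).

    vpos is the scanline at which the flip is programmed, counted from the
    end of vblank: 0 is the first active line, negative values are still
    inside vblank. All parameters are hypothetical illustration values.
    """
    vblank_lines = vtotal - vdisplay
    if not -vblank_lines <= vpos < vdisplay:
        raise ValueError("vpos outside of one refresh cycle")
    if vpos < 0:
        # Still inside vblank: the flip makes the current deadline.
        return 0.0
    # Missed the deadline: wait for the rest of the active area to scan out.
    line_usec = 1e6 / (refresh_hz * vtotal)
    return (vdisplay - vpos) * line_usec


if __name__ == "__main__":
    # ASSUMED mode timing, see the text above.
    for vpos in (-1, 2):
        delay = flip_delay_usec(vpos, vdisplay=1080, vtotal=1111, refresh_hz=60)
        print(f"flip programmed at vpos {vpos:3d}: takes effect after {delay:8.1f} us")
```

In this model, being 2 scanlines late costs almost a complete refresh
interval, while being 1 scanline early costs nothing.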

Looking at the drm.debug=0xf log, there is a huge delay between entering the
radeon irq handler and reaching the vblank handling code:

[68774.608689] [drm:evergreen_irq_process], r600_irq_process start: rptr 8528, wptr 8544
[68774.609052] [drm:drm_calc_vbltimestamp_from_scanoutpos], crtc 0 : v 5 p(1154,2)@ 1309862042.323318 -> 1309862042.323284 [e 3 us, 0 rep]
[68774.609069] [drm:evergreen_page_flip], Update pending now high. Unlocking vupdate_lock.
[68774.609077] [drm:evergreen_irq_process], IH: D1 vblank

-> 68774.609052 - 68774.608689 -> 363 microseconds! That's much larger than
anything I've ever seen during testing of that code path even on rather ancient
hardware. The whole vblank interrupt handler takes almost 400 usecs to execute.
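
The arithmetic can be reproduced from the log with a few lines of Python. This
is only a convenience sketch: the log lines are the ones quoted above (with
the mail line wrapping undone), and the helper names are made up for this
example.

```python
import re

# The four kernel log lines quoted above, with mail line wrapping undone.
LOG_LINES = [
    "[68774.608689] [drm:evergreen_irq_process], r600_irq_process start: rptr 8528, wptr 8544",
    "[68774.609052] [drm:drm_calc_vbltimestamp_from_scanoutpos], crtc 0 : v 5 "
    "p(1154,2)@ 1309862042.323318 -> 1309862042.323284 [e 3 us, 0 rep]",
    "[68774.609069] [drm:evergreen_page_flip], Update pending now high. Unlocking vupdate_lock.",
    "[68774.609077] [drm:evergreen_irq_process], IH: D1 vblank",
]

# Kernel timestamp prefix: seconds, a dot, then six digits of microseconds.
STAMP = re.compile(r"^\[\s*(\d+)\.(\d{6})\]")


def stamp_usec(line):
    """Return the kernel timestamp of one log line in integer microseconds."""
    match = STAMP.match(line)
    if match is None:
        raise ValueError(f"no kernel timestamp in: {line!r}")
    return int(match.group(1)) * 1_000_000 + int(match.group(2))


def deltas_usec(lines):
    """Microseconds elapsed since the first line, for every line."""
    stamps = [stamp_usec(line) for line in lines]
    return [stamp - stamps[0] for stamp in stamps]


if __name__ == "__main__":
    for line, delta in zip(LOG_LINES, deltas_usec(LOG_LINES)):
        # Drop the timestamp prefix and keep only the start of the message.
        message = line.split("] ", 1)[1]
        print(f"+{delta:4d} us  {message[:60]}")
```

Integer microseconds are used instead of floats so that the differences come
out exact.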

This probably explains why the flip is scheduled so late and misses the
deadline with reduced blanking. I think it would be good to find out where so
much time is spent. As a first test, set the drm.timestamp_precision_usec
module parameter to zero: the driver then skips the high precision
timestamping and does a plain do_gettimeofday() call instead, which shows how
much of the time is spent in the timestamping itself.
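
One way to flip that parameter at runtime, sketched in Python. The sysfs path
below is an assumption based on the usual /sys/module/<module>/parameters/<name>
layout; whether the file exists and is writable depends on the kernel, and
writing it needs root. The function name is made up for this example. Booting
with drm.timestamp_precision_usec=0 on the kernel command line should have the
same effect.

```python
from pathlib import Path

# ASSUMPTION: the usual sysfs location for a parameter of the drm module.
# Not confirmed anywhere in this bug report.
DEFAULT_PARAM = Path("/sys/module/drm/parameters/timestamp_precision_usec")


def set_timestamp_precision(usec, param=DEFAULT_PARAM):
    """Write a new value and return the previous one so it can be restored."""
    param = Path(param)
    previous = int(param.read_text().strip())
    param.write_text(f"{usec}\n")
    return previous


if __name__ == "__main__":
    # Needs root, and only works if the assumed path exists on this kernel.
    old = set_timestamp_precision(0)
    print(f"timestamp_precision_usec: {old} -> 0")
```

Keep the returned value around and write it back after the test run, so that
later measurements are taken with the high precision timestamping again.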

@Michel:

We defer programming the pageflip until the vblank irq, because that was the
simplest way to get scheduling, pageflip completion and timestamping for
pageflip events done reliably. We thought through various other methods, but
each turned out to have some races:

Alex told me that older Radeons (R100-R500) don't have pageflip completion
irqs, so we couldn't use those to detect and timestamp when the pageflip was
really done. We first thought about scheduling the pageflip via a packet in the
command stream and only checking for pageflip completion in the vblank irq
handler. But that has a couple of funny race conditions, especially if the
pageflip is scheduled close to or inside the target vblank interval. These can
make pageflip completion and timestamping unreliable. They can also add large
latency when flips are scheduled with the dri2 vblank event method
(glXSwapBuffersMscOML() etc.) while multiple clients are rendering, e.g., to
multiple displays. Not good.

The current implementation is relatively simple and robust, and common to all
Radeons - at least as long as the vblank irq doesn't consistently miss the
whole vblank interval due to large irq execution times like here ;-) My
desire is to get very reliable and sub-millisecond precise timestamps for my
applications (neuro-science stuff), but problems there would also cause screen
corruption and other problems for the average user.

Looking at the current code, R600 and later seem to have pageflip completion
irqs. The irq handler currently acknowledges them, but nothing uses them at the
moment.

For >= R600 maybe we could do the pageflip completion/timestamping in a
dedicated pageflip irq handler and do the programming of the pageflip in the
fence interrupt handler? That would be as early as possible - as soon as the
backbuffer is swap ready. One could move the calls to the "flip programming
part" of radeon_crtc_handle_flip() from the vblank irq handler to the fence irq
handler, basically unmodified and with minimal overhead, and then put the "flip
completion part" of radeon_crtc_handle_flip() in a separate function, called
from the pageflip irq handler.
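
The proposed split can be sketched as a toy state machine. This is Python, not
driver code, and every name in it (ToyCrtc, queue_flip, on_fence_irq,
on_pageflip_irq) is hypothetical; it only shows which half of the work would
run in which interrupt handler.

```python
class ToyCrtc:
    """Toy model of one crtc with the flip handling split in two halves."""

    def __init__(self):
        self.scanout = "front"  # buffer currently being scanned out
        self.pending = None     # flip requested, rendering not finished yet
        self.programmed = None  # flip written to the hardware, not latched yet
        self.events = []        # (buffer, timestamp_usec) completion events

    def queue_flip(self, buf):
        """Userspace asks for a flip to buf; only one may be in flight."""
        if self.pending is not None or self.programmed is not None:
            raise RuntimeError("a flip is already in flight")
        self.pending = buf

    def on_fence_irq(self):
        """'Flip programming part': runs once the backbuffer is swap ready."""
        if self.pending is not None:
            self.programmed, self.pending = self.pending, None

    def on_pageflip_irq(self, timestamp_usec):
        """'Flip completion part': runs when the hardware reports the flip."""
        if self.programmed is None:
            return  # nothing was programmed, so there is nothing to complete
        self.scanout, self.programmed = self.programmed, None
        self.events.append((self.scanout, timestamp_usec))


if __name__ == "__main__":
    crtc = ToyCrtc()
    crtc.queue_flip("back")
    crtc.on_pageflip_irq(100)    # ignored: the flip is not programmed yet
    crtc.on_fence_irq()          # rendering done, flip gets programmed
    crtc.on_pageflip_irq(16767)  # hardware latched the new buffer
    print(crtc.scanout, crtc.events)
```

The vblank irq handler does not appear in this model at all, which is the
point of the proposal: flip programming no longer depends on how quickly the
vblank handler runs.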

There's a little issue then with the ordering of vblank irqs wrt. pageflip
completion irqs, and we'd need to take care of this for correct timestamping.
But we can solve that in the same way as we do for the intel-kms driver, which
also uses pageflip completion interrupts and has to handle this ordering to get
correct timestamps.

For a future pageflip ioctl() v2 it would make sense to allow synchronous
flipping of multiple crtcs with one ioctl() invocation, to allow for tear-free
multi-display swaps and for implementation of things like swap group
extensions. This could be easier if pageflips are scheduled from within the
fence irq handler or even earlier. With the current implementation and six
display heads, like on evergreen/eyefinity hardware, it can get difficult to
implement reliable swaps across multiple displays.

-mario
