[Mesa-dev] [PATCH 3/3] glx/dri3: Request non-vsynced Present for swapinterval zero.

Mario Kleiner mario.kleiner.de at gmail.com
Wed Dec 17 10:42:00 PST 2014


On 12/17/2014 12:45 PM, Eero Tamminen wrote:
> Hi,
>
> On 12/16/2014 08:30 PM, Mario Kleiner wrote:
>> On 12/16/2014 09:23 AM, Keith Packard wrote:
>>> Mario Kleiner <mario.kleiner.de at gmail.com> writes:
>>>
>>>> The 0 case is good for benchmarking.
>>> Sure, but the current code does benchmarking just fine. In fact, because
>>> it doesn't copy queued frames that aren't the most recent before the
>>> vblank, benchmarks tend to run *faster* as a result, and people
>>> generally like that aspect of it...
>>>
>>
>> Hmm. For benchmarking i think i'd consider that a mild form of cheating.
>> You get higher fps because you skip processing like the whole gpu blit
>> overhead and host processing overhead for queuing / validating /
>> processing the copy command in the command stream, so the benchmark
>> numbers don't translate very well anymore in how the system would behave
>> in a non-benchmark situation?
>
> From performance numbers on Windows it's clear that Windows doesn't
> copy frames that happen faster than monitor update frequency.
>

Under desktop composition, yes. As far as i observed (and i think also 
read somewhere), their compositor wakes up once at the beginning of a 
refresh cycle and composites everything that needs composition and was 
submitted in the previous frame, then pageflips at the next vsync, so 
you'll always have at least 1 frame lag, while at the same time skipping 
frames if the client renders too fast. It's somewhat different for 
unredirected windows, but the rules of when a window gets unredirected 
are somewhat weird and also inconsistently (buggily) implemented on many 
drivers in my experience. OSX ditto. One reason why many of my 
timing-sensitive users are fleeing to Linux...
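
The compositor behaviour described above can be sketched as a small simulation. This is only a toy model under stated assumptions: the function name `composite`, the integer millisecond timestamps and the 16 ms refresh period are all made up for illustration, and real compositors are more involved.

```python
def composite(submit_times, refresh_period, num_cycles):
    """Toy model: return {frame_index: flip_time} for frames that get shown.

    submit_times must be sorted ascending. The compositor wakes once at the
    start of each refresh cycle, takes the newest frame submitted so far,
    and flips it at the following vsync. Older pending frames are skipped.
    """
    shown = {}
    last_shown = -1
    for k in range(1, num_cycles + 1):
        wake = k * refresh_period
        # Frames submitted before this wake-up that were not shown yet.
        pending = [i for i, t in enumerate(submit_times)
                   if t < wake and i > last_shown]
        if pending:
            newest = pending[-1]
            shown[newest] = wake + refresh_period  # pageflip at next vsync
            last_shown = newest
    return shown


# Hypothetical client rendering every 5 ms against a 16 ms refresh.
times = [0, 5, 10, 15, 20, 25, 30, 35]
result = composite(times, 16, 3)
for frame, flip in result.items():
    print(f"frame {frame}: submitted at {times[frame]} ms, shown at {flip} ms")
```

In this model only three of the eight frames reach the screen, and each of them is shown more than one refresh period after it was submitted.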

> This isn't cheating.  It makes numbers more relevant as it minimizes
> Windowing system impact/distortion on application rendering performance.
>
> The problem with Windowing system doing extra copies is that it doesn't
> affect all applications equally.  If application is memory bandwidth
> limited, the windowing system impact is directly proportional to FPS.
> If application isn't memory bandwidth limited, it has no impact. And
> whether application is memory bandwidth limited, is HW dependent, which
> distorts also HW comparisons.
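
To put rough numbers on the "proportional to FPS" point, here is a back-of-the-envelope calculation. It assumes a full-window blit that reads the source once and writes the destination once, with no compression or tiling effects; the function name and the 1920x1080, 4 bytes per pixel window are illustrative assumptions only.

```python
def copy_bandwidth_bytes_per_s(width, height, bytes_per_pixel, fps):
    """Memory traffic caused by one full-window copy per rendered frame."""
    # One blit reads the source buffer once and writes the destination once.
    bytes_per_copy = 2 * width * height * bytes_per_pixel
    return bytes_per_copy * fps


for fps in (60, 300, 1200):
    gb_per_s = copy_bandwidth_bytes_per_s(1920, 1080, 4, fps) / 1e9
    print(f"{fps:5d} fps -> {gb_per_s:.2f} GB/s of copy traffic")
```

The cost per copy is fixed, so the total grows linearly with the frame rate. Whether that matters depends on how much memory bandwidth the hardware has left over.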

I can agree that benchmarking is hard and it depends on what you want to 
measure ;-) - so many benchmarks to pick from...

But my main concern is not the benchmarking, but the other use cases i 
mentioned for non-vsynced operation, and some consistency between 
implementations. It's difficult to explain to "normal" users why they 
should follow totally different procedures depending on 
XOrg-Version/Mesa-Version/Type and version of ddx/Which kernel they 
use/What distro and distro version they use/If it's DRI2 or DRI3/what 
acceleration api is in use/whatever. Especially when i myself, as someone 
who can read source code and does a lot of testing, have huge trouble 
remembering all the special cases.

I just like APIs to behave somewhat predictably over time, and would rather 
have more API for fine-grained control and for introducing new 
functionality than less API, so that i can make decisions about what 
tradeoffs to choose automatically in my app, instead of exposing tons of 
configuration howtos to my users.

-mario


>
>
>     - Eero
>
>> ... but read on below ...
>>
>>>> In my specific case i always want vsync'ed swap for actual visual
>>>> stimulation in neuroscience/medical settings, with no frame skipped
>>>> ever. The bonus use for me, except for benchmarking how fast the system
>>>> can go, is if one has a multi-display setup, e.g., dual-display for
>>>> stereoscopic stimulation - one display per eye, or some CAVE like setup
>>>> for VR with more than 2 displays. You want display updates and scanout
>>>> on all of them synchronized, so the scene stays coherent. One simple way
>>>> for visually testing multi-display sync is to intentionally swap all of
>>>> them without vsync, e.g., timed to swap in the middle of the scanout. If
>>>> the tear-lines on all displays are roughly at the same vertical position
>>>> and stay there then that's a good visual test if stuff works. There are
>>>> other ways to do it, but this is the one method that seems to work
>>>> cross-platform, without lots of mental context switching depending on
>>>> what os/gpu/server/driver combo with what settings one uses, and much
>>>> more easy to grasp for scientists with no graphics background. You can
>>>> see at a glance if stuff is roughly correct or not.
>>> It seems like you want something that the GL API doesn't express
>>> precisely; my reading of the GL spec definitely lets Present work the
>>> way it does today, and as you avoid tearing *and* improve performance in
>>> the vblank_mode=0 case, I'm very reluctant to change it.
>>
>> From GLX_EXT_swap_control and MESA_swap_control:
>>
>> "If <interval> is set to a value of 0, buffer swaps are not
>>      synchronized to a video frame."
>>
>>
>> It depends on how you interpret the "not synchronized to a video frame"?
>> Can you explain your interpretation?
>>
>> I don't think the spec says anywhere that dropping old "not most recent
>> at vblank time" frames is allowed, like Present does atm.? And the
>> current Present implementation does synchronize the "buffer swap" of its
>> most recent received Pixmap to the [onset of] a video frame, so while
>> preventing tearing it goes a bit slower than it could go without vsync.
>>
>> Every past/other implementation than DRI3/Present that i have experience
>> with interpreted the spec the way that patch tries to restore, at least
>> everything i know of from OSX/Windows/Linux proprietary/DRI2, so my
>> interpretation is certainly a valid interpretation, and it is the one
>> that provides consistency and therefore the least surprise to
>> implementers of GL clients and end users.
>>
>>> Present could trivially offer a new bit to force tearing; I'm not sure
>>> how you'd get at that from GL though.
>>>
>>
>> It does already with PresentOptionAsync? It just needs to be used in
>> accordance with the mainstream interpretation of the _swap_control spec,
>> like this patch suggests.
>>
>> I'm not trying to claim here that the current behaviour of Mesa+Present
>> isn't useful for some types of applications like games. I'm just saying
>> it shouldn't be the default behaviour for swapinterval 0 or > 0. As far
>> as i understand the meaning, intention and origin of the
>> EXT_swap_control_tear extension, the current Present implementation
>> would implement a useful approximation of EXT_swap_control_tear for a
>> swapinterval of < 0. Not an exact implementation, but at least following
>> the spirit of that extension.
>>
>> So i'm arguing for restoring the default behaviour any other
>> implementation has with that patch, but providing the current behaviour
>> via swap_control_tear? Or maybe even some new swap_control_tear2 to
>> cover the difference between the current method and swap_control_tear.
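
The mapping argued for in the quoted paragraphs above can be summarised in a few lines. This is a sketch of the proposal only, not of what Mesa does: the function name and the returned labels are hypothetical, and only `PresentOptionAsync` is a name taken from the thread.

```python
def proposed_present_mode(swap_interval):
    """Hypothetical summary of the proposed swap interval semantics."""
    if swap_interval == 0:
        # Request PresentOptionAsync: show immediately, tearing allowed.
        return "async"
    if swap_interval > 0:
        # Classic vsynced swap: each swap waits for the vertical blank.
        return "vsync"
    # Negative interval: the current Present behaviour (newest frame shown
    # at vblank, older queued frames dropped) as an approximation of
    # EXT_swap_control_tear.
    return "latest-frame-at-vblank"


for interval in (0, 1, -1):
    print(interval, "->", proposed_present_mode(interval))
```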
>>
>> While we are on the topic, i can also send you my christmas wish list
>> with proposals for future mesa/server releases:
>>
>> 1 - Another thing i'd love to have, which would require a new option
>> "PresentOptionDontSkip" is the ability to not skip present requests
>> which are late. That would allow to take advantage of mesas triple/quad
>> buffering to queue frames for animations ahead of time for playing
>> animations or videos and be still certain that every queued frame was
>> shown at least for one video refresh cycle. I'd love to take advantage
>> of the new triple-buffering behaviour, or maybe even use Present
>> directly somehow for deeper n-buffering, but for some of my types of
>> application i'd need to be certain that frames are not ever skipped if
>> something gets late. As things are now, i'm forced to wait for swap
>> completion of each bufferswap before i can submit a new swapbuffers
>> request to make sure Present will never drop rendered frames, so i have
>> to enforce the constraints of double-buffering onto my application for
>> correctness although it could make use of n-buffering.
>>
>> This would also make sense for an improved OML_sync_control
>> implementation. That spec requires that no swap request is ever dropped,
>> citing:
>>
>> "If there are multiple outstanding swaps for the same window, at most
>> one such swap can be satisfied per increment of MSC. The order of
>> satisfying outstanding swaps of a window must be the order they were
>> issued. Each window that has an outstanding swap satisfied by the same
>> current MSC should have one swap done."
>>
>> Getting this behaviour is difficult or impossible without some
>> PresentOptionDontSkip.
>>
>> 2 - Some extension to INTEL_swap_events to be able to signal if a
>> present request was skipped, so i can find out for any specific "sbc" if
>> its rendering reached the eyes of my end users or was silently discarded.
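
The difference between the quoted OML_sync_control rule in item 1 (at most one swap per MSC increment, in issue order, none dropped) and skipping late requests can be illustrated with a toy queue. Everything here is an assumption made for illustration: `run_queue` and its parameters are made-up names, `dont_skip` stands in for the proposed (not existing) PresentOptionDontSkip, and the model ignores everything except target MSCs.

```python
from collections import deque


def run_queue(target_mscs, first_msc, last_msc, dont_skip):
    """Toy model: return {msc: swap_index} for swaps that were displayed.

    target_mscs lists the target MSC of each swap in issue order.
    """
    queue = deque(enumerate(target_mscs))  # (swap_index, target_msc)
    displayed = {}
    for msc in range(first_msc, last_msc + 1):
        # Collect every swap whose target has been reached, in issue order.
        due = []
        while queue and queue[0][1] <= msc:
            due.append(queue.popleft())
        if not due:
            continue
        if dont_skip:
            # Satisfy only the oldest swap; requeue the rest in order.
            displayed[msc] = due[0][0]
            queue.extendleft(reversed(due[1:]))
        else:
            # Show only the newest due swap; older ones are dropped.
            displayed[msc] = due[-1][0]
    return displayed


# Four swaps targeting MSC 100..103, but presentation only starts at MSC 103.
targets = [100, 101, 102, 103]
print("dont_skip:", run_queue(targets, 103, 106, dont_skip=True))
print("skipping: ", run_queue(targets, 103, 106, dont_skip=False))
```

With `dont_skip=True` every queued frame is shown for one refresh, just later than requested. With skipping, only the newest frame is ever shown.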
>>
>> More later, will be away from the keyboard for a couple of hours,
>> -mario
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>


