[Mesa-dev] [PATCH] loader/dri3: Overhaul dri3_update_num_back

Fri Sep 2 14:41:37 UTC 2016

On 02/09/2016 at 03:06, Michel Dänzer wrote:
> On 02/09/16 12:37 AM, Alex Deucher wrote:
>> On Thu, Sep 1, 2016 at 11:28 AM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>>> On Aug 31, 2016 11:39 PM, "Michel Dänzer" <michel at daenzer.net> wrote:
>>>> On 01/09/16 02:05 PM, Jason Ekstrand wrote:
>>>>> On Wed, Aug 31, 2016 at 7:00 PM, Michel Dänzer <michel at daenzer.net> wrote:
>>>>>
>>>>>      On 31/08/16 11:21 PM, Jason Ekstrand wrote:
>>>>>      > On Aug 19, 2016 12:07 AM, "Michel Dänzer" <michel at daenzer.net> wrote:
>>>>>      >> From: Michel Dänzer <michel.daenzer at amd.com>
>>>>>      >>
>>>>>      >> Always use 3 buffers when flipping. With only 2 buffers, we have to wait
>>>>>      >> for a flip to complete (which takes non-0 time even with asynchronous
>>>>>      >> flips) before we can start working on the next frame. We were previously
>>>>>      >> only using 2 buffers for flipping if the X server supports asynchronous
>>>>>      >> flips, even when we're not using asynchronous flips. This could result
>>>>>      >> in bad performance (the referenced bug report is an extreme case, where
>>>>>      >> the inter-frame stalls were preventing the GPU from reaching its maximum
>>>>>      >> clocks).
>>>>>      >
>>>>>      > Sorry for the post-push review but I don't usually pay much attention to
>>>>>      > the window system code.  In any case, I believe you're doing your
>>>>>      > counting wrong.  When flipping with swapinterval=0, you need 4 buffers:
>>>>>      >
>>>>>      > 1. The buffer currently being scanned out  (will be released at next vblank)
>>>>>      > 2. The buffer X has queued for scanout but is waiting on vblank
>>>>>
>>>>>      s/vblank/flip/g, since async flips may not wait for vblank, but yeah.
>>>>>
>>>>>      > 3. The buffer the application has just submitted which X will queue next
>>>>>      > if it doesn't get another before the window closes.
>>>>>      > 4. The buffer the application is using for rendering.
>>>>>      >
>>>>>      > With only 3, you get a stall during that window in which X has queued
>>>>>      > another flip but we're waiting on vblank before the flip begins. Am I
>>>>>      > missing something?
>>>>>
>>>>>      Nothing, except maybe the paragraph below stating that I couldn't
>>>>>      measure any benefit from using 4 buffers. :) I'm not exactly sure why,
>>>>>      but I suspect it might be because even with just 3 buffers, the GPU can
>>>>>      always work on at least one frame ahead of time.
>>>>>
>>>>>      Also note that even before my change, we were only using 3 buffers when
>>>>>      the X driver supports async flips (with swap interval 0; only 2 buffers
>>>>>      with swap interval > 0).
>>>>>
>>>>>
>>>>> Yes, because with async flips you don't have a buffer sitting queued in
>>>>> the kernel waiting to be flipped which you can't cancel.
>>>> Actually, there is. Even async flips take non-0 time to complete.
>>>>
>>>>
>>>>> that makes perfect sense.
>>>> What exactly does? My change may not be perfect, but the logic before it
>>>> was mostly backwards.
>>> I think perhaps the problem here is that I don't know what you mean by
>>> "async flips".  It's an X term that obviously does not mean what I thought
>>> it meant.
>> Async means immediate (or as close to it as possible, maybe hsync
>> depending on the hw) not at vsync.
> Exactly.
>
>
>>>>>      That said, I'd be interested in hearing about any test cases where 4
>>>>>      buffers provide a significant boost over 3.
>>>>>
>>>>>
>>>>> A little history that may be useful: Quadbuffering was originally added
>>>>> for DRI3+present here:
>>>>>
>>>>>
>>>>> https://cgit.freedesktop.org/mesa/mesa/commit/?id=f7a355556ef5fe23056299a77414f9ad8b5e5a1d
>>>> So the commit message claims. If you look at the code after that change
>>>> though, it's basically impossible to end up with 4 buffers (at least
>>>> permanently), since it would require all these conditions to be true at
>>>> the same time:
>>>>
>>>> 1. priv->flipping (the last Present request was completed as a flip)
>>>> 2. !(priv->present_capabilities & XCB_PRESENT_CAPABILITY_ASYNC) (the X
>>>>     driver doesn't support async flips)
>>>> 3. priv->swap_interval == 0
>>>>
>>>> Given 2, 1 & 3 are mutually exclusive.
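To make the argument above concrete, here is a minimal sketch in C. It is not the Mesa code: `CAP_ASYNC` is a stand-in for XCB_PRESENT_CAPABILITY_ASYNC, and `server_can_flip()` is a hypothetical helper that encodes Michel's assumption (stated further down) that swap interval 0 can only be flipped when async flips are supported. Under that assumption, no reachable state satisfies all three conditions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for XCB_PRESENT_CAPABILITY_ASYNC (normally from <xcb/present.h>). */
#define CAP_ASYNC 0x1u

/* Hypothetical model of the assumption under discussion: with swap
 * interval 0, the server only flips if it supports async flips. */
static bool server_can_flip(uint32_t caps, int swap_interval)
{
    return swap_interval != 0 || (caps & CAP_ASYNC) != 0;
}

/* The three listed conditions that had to hold together to get 4 buffers. */
static bool four_buffer_conditions(bool flipping, uint32_t caps,
                                   int swap_interval)
{
    return flipping && !(caps & CAP_ASYNC) && swap_interval == 0;
}

/* Walk every (caps, swap interval) combination, take the most favourable
 * value of "flipping" the model allows, and count the states in which
 * all three conditions hold. */
static int count_reachable_four_buffer_states(void)
{
    int hits = 0;

    for (uint32_t caps = 0; caps <= CAP_ASYNC; caps++) {
        for (int interval = 0; interval <= 1; interval++) {
            bool flipping = server_can_flip(caps, interval);

            if (four_buffer_conditions(flipping, caps, interval))
                hits++;
        }
    }
    return hits;
}
```

The count only comes out as zero because of `server_can_flip()`; if a server can flip with swap interval 0 and no async capability, the three conditions are no longer exclusive.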
>>> I'm not seeing how 1 & 3 are mutually exclusive.  priv->swap_interval
>>> doesn't seem to have anything to do with whether or not you're flipping.
> priv->swap_interval == 0 can only use flips if async flips are supported.
This may not always be true. In particular with my XWayland Present
implementation (yeah it's been years, perhaps time to work on it again),
you could have:
. 1 buffer on the screen
. 1 buffer scheduled for next flip kernel side
. 1 buffer older than the previous buffers, which will get scheduled
for flip afterwards if no new buffer arrives (and will be released if a new
buffer arrives)

All three kept by the Wayland compositor.
Thus you need a fourth buffer to keep rendering.
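
The count above can be written down as a small sketch. The struct and function names are hypothetical and only illustrate the arithmetic, they are not part of Mesa or of the XWayland Present code: the client needs one buffer more than the total the compositor is holding on to.

```c
#include <assert.h>

/* Hypothetical bookkeeping of what the compositor keeps hold of. */
struct held_buffers {
    int on_screen;       /* currently displayed */
    int queued_for_flip; /* scheduled for the next flip, kernel side */
    int pending;         /* waiting behind the queued one; released if a
                          * new buffer arrives first */
};

/* Buffers the client needs so that it always has one free to render into. */
static int buffers_needed(struct held_buffers held)
{
    assert(held.on_screen >= 0 && held.queued_for_flip >= 0 &&
           held.pending >= 0);

    return held.on_screen + held.queued_for_flip + held.pending + 1;
}
```

With one buffer in each of the three states this gives 4; without the pending buffer it gives 3.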

Axel
>
>
> So WRT https://bugs.freedesktop.org/show_bug.cgi?id=97549, let's not
> jump to any conclusions but look at how many buffers actually end up
> being used for what reasons in each case. I suspect there might be some
> surprises. :)
>
>