[Mesa-dev] [PATCH] loader/dri3: Overhaul dri3_update_num_back

Fri Sep 2 01:06:33 UTC 2016

On 02/09/16 12:37 AM, Alex Deucher wrote:
> On Thu, Sep 1, 2016 at 11:28 AM, Jason Ekstrand <jason at jlekstrand.net> wrote:
>> On Aug 31, 2016 11:39 PM, "Michel Dänzer" <michel at daenzer.net> wrote:
>>> On 01/09/16 02:05 PM, Jason Ekstrand wrote:
>>>> On Wed, Aug 31, 2016 at 7:00 PM, Michel Dänzer <michel at daenzer.net> wrote:
>>>>
>>>>     On 31/08/16 11:21 PM, Jason Ekstrand wrote:
>>>>     > On Aug 19, 2016 12:07 AM, "Michel Dänzer" <michel at daenzer.net> wrote:
>>>>     >> From: Michel Dänzer <michel.daenzer at amd.com>
>>>>     >>
>>>>     >> Always use 3 buffers when flipping. With only 2 buffers, we have to
>>>>     >> wait for a flip to complete (which takes non-0 time even with
>>>>     >> asynchronous flips) before we can start working on the next frame.
>>>>     >> We were previously only using 2 buffers for flipping if the X
>>>>     >> server supports asynchronous flips, even when we're not using
>>>>     >> asynchronous flips. This could result in bad performance (the
>>>>     >> referenced bug report is an extreme case, where the inter-frame
>>>>     >> stalls were preventing the GPU from reaching its maximum clocks).
>>>>     >
>>>>     > Sorry for the post-push review but I don't usually pay much attention
>>>>     > to the window system code.  In any case, I believe you're doing your
>>>>     > counting wrong.  When flipping with swapinterval=0, you need 4 buffers:
>>>>     >
>>>>     > 1. The buffer currently being scanned out (will be released at next
>>>>     >    vblank)
>>>>     > 2. The buffer X has queued for scanout but is waiting on vblank
>>>>
>>>>     s/vblank/flip/g, since async flips may not wait for vblank, but yeah.
>>>>
>>>>     > 3. The buffer the application has just submitted which X will queue
>>>>     >    next if it doesn't get another before the window closes.
>>>>     > 4. The buffer the application is using for rendering.
>>>>     >
>>>>     > With only 3, you get a stall during that window in which X has queued
>>>>     > another flip but we're waiting on vblank before the flip begins.  Am I
>>>>     > missing something?
>>>>
>>>>     Nothing, except maybe the paragraph below stating that I couldn't
>>>>     measure any benefit from using 4 buffers. :) I'm not exactly sure why,
>>>>     but I suspect it might be because even with just 3 buffers, the GPU can
>>>>     always work on at least one frame ahead of time.
>>>>
>>>>     Also note that even before my change, we were only using 3 buffers when
>>>>     the X driver supports async flips (with swap interval 0; only 2 buffers
>>>>     with swap interval > 0).
>>>>
>>>>
>>>> Yes, because with async flips you don't have a buffer sitting queued in
>>>> the kernel waiting to be flipped which you can't cancel.
>>>
>>> Actually, there is. Even async flips take non-0 time to complete.
>>>
>>>
>>>> That makes perfect sense.
>>>
>>> What exactly does? My change may not be perfect, but the logic before it
>>> was mostly backwards.
>>
>> I think perhaps the problem here is that I don't know what you mean by
>> "async flips".  It's an X term that obviously does not mean what I thought
>> it meant.
> 
> Async means immediate (or as close to it as possible, maybe at hsync
> depending on the hw), not at vsync.

Exactly.

>>>>     That said, I'd be interested in hearing about any test cases where 4
>>>>     buffers provide a significant boost over 3.
>>>>
>>>>
>>>> A little history that may be useful: Quadbuffering was originally added
>>>> for DRI3+present here:
>>>>
>>>>
>>>> https://cgit.freedesktop.org/mesa/mesa/commit/?id=f7a355556ef5fe23056299a77414f9ad8b5e5a1d
>>>
>>> So the commit message claims. If you look at the code after that change
>>> though, it's basically impossible to end up with 4 buffers (at least
>>> permanently), since it would require all these conditions to be true at
>>> the same time:
>>>
>>> 1. priv->flipping (the last Present request was completed as a flip)
>>> 2. !(priv->present_capabilities & XCB_PRESENT_CAPABILITY_ASYNC) (the X
>>>    driver doesn't support async flips)
>>> 3. priv->swap_interval == 0
>>>
>>> Given 2, 1 & 3 are mutually exclusive.
>>
>> I'm not seeing how 1 & 3 are mutually exclusive.  priv->swap_interval
>> doesn't seem to have anything to do with whether or not you're flipping.

priv->swap_interval == 0 can only use flips if async flips are supported.
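
To make the counting argument concrete, here is a minimal C sketch. It is a hypothetical model, not the Mesa source: the function names, the baseline of 2 buffers, and the extra buffer for a driver without async flips at swap interval > 0 are assumptions, chosen only to match the cases described in this thread (3 buffers with async flips at swap interval 0, 2 at swap interval > 0, and 4 only when all three conditions hold). It encodes the constraint that with swap interval 0 a Present request can only complete as a flip if async flips are supported, then enumerates the combinations in which flipping can actually happen.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the pre-patch buffer count while flipping,
 * reconstructed only from the conditions listed in this thread.
 * It is NOT the actual Mesa code. */
static int model_old_buffer_count_flipping(bool async_cap, int swap_interval)
{
    assert(swap_interval >= 0);
    int n = 2;              /* assumed baseline while flipping */
    if (!async_cap)
        n++;                /* X driver lacks XCB_PRESENT_CAPABILITY_ASYNC */
    if (swap_interval == 0)
        n++;                /* not throttled to vblank */
    return n;
}

/* Constraint from the thread: with swap interval 0, a Present request can
 * only complete as a flip if the X driver supports async flips. */
static bool flip_possible(bool async_cap, int swap_interval)
{
    return swap_interval != 0 || async_cap;
}

/* Enumerate every combination in which flipping can actually occur and
 * return the largest buffer count the model reaches. */
static int model_max_reachable_buffer_count(void)
{
    int max = 0;
    for (int a = 0; a <= 1; a++) {
        for (int si = 0; si <= 1; si++) {
            bool async_cap = (a != 0);
            if (!flip_possible(async_cap, si))
                continue;   /* this combination never completes as a flip */
            int n = model_old_buffer_count_flipping(async_cap, si);
            if (n > max)
                max = n;
        }
    }
    return max;
}
```

In this model, model_old_buffer_count_flipping(false, 0) returns 4, but flip_possible(false, 0) is false, so model_max_reachable_buffer_count() returns 3: the 4-buffer case is unreachable while flipping.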

So WRT https://bugs.freedesktop.org/show_bug.cgi?id=97549, let's not
jump to any conclusions but look at how many buffers actually end up
being used for what reasons in each case. I suspect there might be some
surprises. :)

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
