[Mesa-dev] [PATCH] loader/dri3: Overhaul dri3_update_num_back

Tue Sep 6 03:41:50 UTC 2016

On 02/09/16 11:46 PM, Axel Davy wrote:
> On 02/09/2016 at 16:41, Axel Davy wrote:
>> On 02/09/2016 at 03:06, Michel Dänzer wrote:
>>> On 02/09/16 12:37 AM, Alex Deucher wrote:
>>>> On Thu, Sep 1, 2016 at 11:28 AM, Jason Ekstrand
>>>> <jason at jlekstrand.net> wrote:
>>>>> On Aug 31, 2016 11:39 PM, "Michel Dänzer" <michel at daenzer.net> wrote:
>>>>>> On 01/09/16 02:05 PM, Jason Ekstrand wrote:
>>>>>>> On Wed, Aug 31, 2016 at 7:00 PM, Michel Dänzer
>>>>>>> <michel at daenzer.net> wrote:
>>>>>>>
>>>>>>>      That said, I'd be interested in hearing about any test cases
>>>>>>>      where 4 buffers provide a significant boost over 3.
>>>>>>>
>>>>>>>
>>>>>>> A little history that may be useful: Quadbuffering was originally
>>>>>>> added for DRI3+present here:
>>>>>>>
>>>>>>>
>>>>>>> https://cgit.freedesktop.org/mesa/mesa/commit/?id=f7a355556ef5fe23056299a77414f9ad8b5e5a1d
>>>>>>>
>>>>>> So the commit message claims. If you look at the code after that
>>>>>> change though, it's basically impossible to end up with 4 buffers
>>>>>> (at least permanently), since it would require all these conditions
>>>>>> to be true at the same time:
>>>>>>
>>>>>> 1. priv->flipping (the last Present request was completed as a flip)
>>>>>> 2. !(priv->present_capabilities & XCB_PRESENT_CAPABILITY_ASYNC)
>>>>>>    (the X driver doesn't support async flips)
>>>>>> 3. priv->swap_interval == 0
>>>>>>
>>>>>> Given 2, 1 & 3 are mutually exclusive.
>>>>> I'm not seeing how 1 & 3 are mutually exclusive. priv->swap_interval
>>>>> doesn't seem to have anything to do with whether or not you're
>>>>> flipping.
>>> priv->swap_interval == 0 can only use flips if async flips are
>>> supported.
>> This may not always be true.

It is with the current Present implementation in xserver.
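
To make that argument concrete, here is a minimal sketch in C. It is not
the Mesa code: the struct and function names are hypothetical, and the
reachability rule encodes only the assumption stated above (with the
current xserver Present implementation, swap interval 0 can only flip if
async flips are supported). Under that assumption, no reachable state
satisfies all three conditions at once:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the three conditions; not the loader's real state. */
struct present_state {
    bool flipping;        /* last Present request was completed as a flip */
    bool async_supported; /* X driver advertises XCB_PRESENT_CAPABILITY_ASYNC */
    int  swap_interval;
};

/* Assumption from this thread: with swap interval 0, the current xserver
 * Present code only flips if the driver supports async flips. */
bool state_is_reachable(const struct present_state *s)
{
    assert(s != NULL);
    return !(s->swap_interval == 0 && s->flipping && !s->async_supported);
}

/* All three conditions that would have to hold to end up with 4 buffers. */
bool wants_four_buffers(const struct present_state *s)
{
    assert(s != NULL);
    return s->flipping && !s->async_supported && s->swap_interval == 0;
}

/* Enumerate every combination (swap interval 0 vs. 1) and count the
 * reachable states that would ask for a 4th buffer. */
int count_reachable_four_buffer_states(void)
{
    int count = 0;

    for (int mask = 0; mask < 8; mask++) {
        struct present_state s = {
            .flipping        = (mask & 1) != 0,
            .async_supported = (mask & 2) != 0,
            .swap_interval   = (mask & 4) ? 0 : 1,
        };

        if (state_is_reachable(&s) && wants_four_buffers(&s))
            count++;
    }
    return count;
}
```

If a Present implementation ever does sync flips at swap interval 0 (as
in the XWayland case below), the reachability rule no longer holds and
the count becomes non-zero.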

>> In particular with my XWayland Present implementation (yeah it's been
>> years, perhaps time to work on it again), you could have:
>> . 1 buffer on the screen
>> . 1 buffer scheduled for next flip kernel side
>> . 1 buffer old than the previous buffers, and that will get scheduled
>> for flip after if no new buffer arrives (and will be released if a new
>> buffer arrives)
> I meant "newer" of course
>>
>> All three kept by the Wayland compositor.
>> Thus you need a fourth buffer to keep rendering.

The same logic could be implemented in the core Present code in xserver.
My main concern about this is the increased latency due to using a sync
flip. If the application or user sets swap interval 0, I suspect the
intention tends to be to minimize latency. So maybe this logic should
only be used if the driver supports async flips but rejects them in
specific cases, e.g. because the user enabled TearFree.

Anyway, in the meantime, even in theory 4 buffers can only make a
difference vs 3 buffers with async flips, and in practice I can only
measure ~1% difference with vblank_mode=0 glxgears in fullscreen (if
anyone can measure a bigger difference with any test case, please let me
know!). Is that enough to justify the additional memory usage for the
4th buffer?
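
For a rough idea of what that memory cost looks like, here is a small C
sketch. The helper name is hypothetical, and the estimate assumes a
plain linear buffer; real allocations are usually somewhat larger due to
tiling and alignment padding:

```c
#include <stdint.h>

/* Rough size of one back buffer in bytes, ignoring tiling and
 * alignment padding (an assumption; drivers typically round up). */
uint64_t back_buffer_bytes(uint32_t width, uint32_t height,
                           uint32_t bytes_per_pixel)
{
    return (uint64_t)width * height * bytes_per_pixel;
}
```

As an illustration, a 1920x1080 fullscreen window at 4 bytes per pixel
comes to 8294400 bytes, i.e. roughly 8 MB per additional buffer per
drawable.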

With GpuTest Triangle in fullscreen, I actually get ~0.3% higher numbers
with 3 vs 4 buffers.

> If you use one buffer for swap interval = 0 and no flipping, it means
> you wait for the xserver to handle the presentation request and send the
> answer, before rendering again.
> Before, there wasn't this synchronization, because there were two buffers.
> 
> Could this be the cause of the slowdown?

It was indeed. The thing is, even now I can only measure any significant
difference between one and two buffers with vblank_mode=0 glxgears (not
even e.g. with GpuTest Triangle at 800x600 windowed). I thought I tested
that with my previous change but must have messed up somehow. :(

-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer