[Mesa-dev] [Bug 97549] [SNB, BXT] up to 40% perf drop from "loader/dri3: Overhaul dri3_update_num_back" commit

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Wed Aug 31 13:55:01 UTC 2016


https://bugs.freedesktop.org/show_bug.cgi?id=97549

            Bug ID: 97549
           Summary: [SNB, BXT] up to 40% perf drop from "loader/dri3:
                    Overhaul dri3_update_num_back" commit
           Product: Mesa
           Version: git
          Hardware: x86-64 (AMD64)
                OS: All
            Status: NEW
          Severity: normal
          Priority: high
         Component: Other
          Assignee: mesa-dev at lists.freedesktop.org
          Reporter: eero.t.tamminen at intel.com
        QA Contact: mesa-dev at lists.freedesktop.org
                CC: michel at daenzer.net

The following commit causes a large performance regression with DRI3 in
synthetic benchmarks on both Sandybridge and Broxton.

commit 1e3218bc5ba2b739261f0c0bacf4eb662d377236
Author:     Michel Dänzer <michel.daenzer at amd.com>
AuthorDate: Wed Aug 17 17:02:04 2016 +0900
Commit:     Michel Dänzer <michel at daenzer.net>
CommitDate: Thu Aug 25 17:40:24 2016 +0900

    loader/dri3: Overhaul dri3_update_num_back

    Always use 3 buffers when flipping. With only 2 buffers, we have to wait
    for a flip to complete (which takes non-0 time even with asynchronous
    flips) before we can start working on the next frame. We were previously
    only using 2 buffers for flipping if the X server supports asynchronous
    flips, even when we're not using asynchronous flips. This could result
    in bad performance (the referenced bug report is an extreme case, where
    the inter-frame stalls were preventing the GPU from reaching its maximum
    clocks).

    I couldn't measure any performance boost using 4 buffers with flipping.
    Performance actually seemed to go down slightly, but that might have
    been just noise.

    Without flipping, a single back buffer is enough for swap interval 0,
    but we need to use 2 back buffers when the swap interval is non-0,
    otherwise we have to wait for the swap interval to pass before we can
    start working on the next frame. This condition was previously reversed.

    Cc: "12.0 11.2" <mesa-stable at lists.freedesktop.org>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97260
    Reviewed-by: Frank Binns <frank.binns at imgtec.com>
    Reviewed-by: Eric Anholt <eric at anholt.net>

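For reference, the buffer-count policy described in the commit message above
can be sketched as below. This is only an illustration of the message text,
not the actual Mesa code: the function name and its parameters are
hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper (NOT the real dri3_update_num_back): returns the
 * buffer count that the commit message describes for each case. */
static int
sketch_num_buffers(bool flipping, int swap_interval)
{
   if (flipping)
      return 3;   /* "Always use 3 buffers when flipping" */

   if (swap_interval != 0)
      return 2;   /* not flipping, non-zero swap interval */

   return 1;      /* not flipping, swap interval 0 */
}
```

With this sketch, every flipping case (e.g. fullscreen) ends up with 3
buffers regardless of swap interval.
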

Reverting the commit restores the earlier performance (bisect was done on
Broxton, revert was tested on Sandybridge, so the same commit is the problem
for both).

The impact is larger for tests with higher FPS, and naturally only the
onscreen versions of the tests are affected.  Both fullscreen and
windowed+composited tests were affected.
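
One way to read "larger impact at higher FPS" is a roughly fixed extra cost
per frame. That is an assumption, not something measured here, and the helper
name and the example numbers below are purely illustrative. Under that
assumption the relative FPS drop is stall / (frame time + stall):

```c
/* Hypothetical model: every frame pays a fixed extra stall.
 * FPS before = 1000 / frame_ms, FPS after = 1000 / (frame_ms + stall_ms),
 * so the relative drop is stall_ms / (frame_ms + stall_ms). */
static double
sketch_relative_fps_drop(double frame_ms, double stall_ms)
{
   return stall_ms / (frame_ms + stall_ms);
}
```

For example, an assumed 0.5 ms stall would cost about a third of the FPS for
a test with 1 ms frames, but only about 3% for a test with 16 ms frames.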

On Sandybridge the impact is up to 35% (SynMark Batch tests), 25% in the
GpuTest Triangle test, and less in other tests.

On Broxton the drop affects more tests (because of the faster GPU, heavier
tests run at higher FPS), including a few tests that are normally fully ALU
bound:
* SynMark v6: up to 40% (Batch tests)
* GfxBench v4: 35% ALU, 25% Driver, 10% Tess tests
* Lightsmark 2008: 20%
* GpuTest 0.7: 15% Triangle, Julia32 & Plot3D tests
* GLB 2.7: 10% Egypt

The change doesn't seem to affect HSW, BDW, or SKL.  I don't know why.

The issue doesn't seem to be related to FPS being (occasionally) limited to
some multiple of 60 FPS, as in the earlier DRI3 perf bug.  My assumption is
that the buffering change indirectly affects some memory setting, but I don't
know which, as SNB & BXT differ in that respect:
* SNB has LLC, but BXT doesn't
* AFAIK Intel DDX supports SNA for SNB, but not yet for BXT
