[Bug 99398] 1% perf drop in GFXBench v4 tessellation test with "nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences"

bugzilla-daemon at freedesktop.org
Fri Jan 13 14:25:52 UTC 2017


https://bugs.freedesktop.org/show_bug.cgi?id=99398

            Bug ID: 99398
           Summary: 1% perf drop in GFXBench v4 tessellation test with
                    "nir: Turn bcsel of +/- 1.0 and 0.0 into b2f
                    sequences"
           Product: Mesa
           Version: git
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: Drivers/DRI/i965
          Assignee: intel-3d-bugs at lists.freedesktop.org
          Reporter: eero.t.tamminen at intel.com
        QA Contact: intel-3d-bugs at lists.freedesktop.org

The following commit drops GFXBench v4 tessellation test (onscreen & offscreen)
performance by 0.5 - 1% on GEN8 & GEN9:
-----------------------------------------------
commit 3371de38f282c77461bbe5007a2fec2a975776df
Author:     Kenneth Graunke <kenneth at whitecape.org>
AuthorDate: Tue Aug 9 01:44:38 2016 -0700
Commit:     Timothy Arceri <timothy.arceri at collabora.com>
CommitDate: Mon Jan 9 12:32:16 2017 +1100

    nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences
-----------------------------------------------
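
For readers unfamiliar with the opcodes in the commit title, here is a minimal
sketch of the identity it refers to, modelled in plain Python. This is not Mesa
code: the helper names bcsel, b2f and bcsel_via_b2f are hypothetical stand-ins,
and only the +1.0/0.0 and -1.0/0.0 patterns suggested by the title are covered.
The exact set of patterns in the commit is not shown in this report.

```python
def bcsel(cond: bool, x: float, y: float) -> float:
    """Model of a boolean select: x if cond is true, otherwise y."""
    return x if cond else y


def b2f(cond: bool) -> float:
    """Model of a bool-to-float conversion: true -> 1.0, false -> 0.0."""
    return 1.0 if cond else 0.0


def bcsel_via_b2f(cond: bool, x: float, y: float) -> float:
    """Evaluate the select through b2f for the constant pairs in the title.

    Assumption: only (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0) and (0.0, -1.0)
    are handled here; anything else falls back to the plain select.
    """
    if (x, y) == (1.0, 0.0):
        return b2f(cond)
    if (x, y) == (0.0, 1.0):
        return b2f(not cond)
    if (x, y) == (-1.0, 0.0):
        return -b2f(cond)
    if (x, y) == (0.0, -1.0):
        return -b2f(not cond)
    return bcsel(cond, x, y)


if __name__ == "__main__":
    for cond in (True, False):
        for pair in ((1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)):
            # Note: -b2f(False) is -0.0, which compares equal to 0.0.
            assert bcsel(cond, *pair) == bcsel_via_b2f(cond, *pair)
    print("select and b2f forms agree for all tested inputs")
```

Both forms give the same values, so any performance difference comes from how
the backend compiles and schedules the resulting instructions.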

The commit affects the tessellation evaluation shaders in this test case:
-----------------------------------------------
 Native code for unnamed tessellation evaluation shader GLSL10
-SIMD8 shader: 324 instructions. 1 loops. 7774 cycles. 0:0 spills:fills.
Promoted 11 constants. Compacted 5184 to 3312 bytes (36%)
+SIMD8 shader: 323 instructions. 1 loops. 8050 cycles. 0:0 spills:fills.
Compacted 5168 to 3280 bytes (37%)
...
 Native code for unnamed tessellation evaluation shader GLSL15
-SIMD8 shader: 328 instructions. 1 loops. 7778 cycles. 0:0 spills:fills.
Promoted 11 constants. Compacted 5248 to 3360 bytes (36%)
+SIMD8 shader: 327 instructions. 1 loops. 8034 cycles. 0:0 spills:fills.
Promoted 11 constants. Compacted 5232 to 3328 bytes (36%)
-----------------------------------------------
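
To put the numbers above in proportion, the following sketch parses the
"SIMD8 shader" lines quoted in this report and computes the change in
instruction count and in reported cycles. The regular expression and the
function names are my own assumptions, not part of any Mesa or shader-db tool.
For these lines it reports one instruction fewer per shader and a reported
cycle increase of roughly 3.5% (GLSL10) and 3.3% (GLSL15).

```python
import re

# Stats lines copied verbatim from the report: name -> (before, after).
STATS = {
    "GLSL10": (
        "SIMD8 shader: 324 instructions. 1 loops. 7774 cycles. 0:0 spills:fills.",
        "SIMD8 shader: 323 instructions. 1 loops. 8050 cycles. 0:0 spills:fills.",
    ),
    "GLSL15": (
        "SIMD8 shader: 328 instructions. 1 loops. 7778 cycles. 0:0 spills:fills.",
        "SIMD8 shader: 327 instructions. 1 loops. 8034 cycles. 0:0 spills:fills.",
    ),
}

# Hypothetical pattern matching the line layout shown above.
PATTERN = re.compile(r"(\d+) instructions\. (\d+) loops\. (\d+) cycles\.")


def parse(line):
    """Extract instruction, loop and cycle counts from one stats line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized stats line: {line!r}")
    instructions, loops, cycles = map(int, match.groups())
    return {"instructions": instructions, "loops": loops, "cycles": cycles}


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


def summarize(stats=STATS):
    """Return per-shader instruction delta and cycle change in percent."""
    summary = {}
    for name, (before_line, after_line) in stats.items():
        before, after = parse(before_line), parse(after_line)
        summary[name] = {
            "instructions_delta": after["instructions"] - before["instructions"],
            "cycles_pct": percent_change(before["cycles"], after["cycles"]),
        }
    return summary


if __name__ == "__main__":
    for name, row in summarize().items():
        print(f"{name}: {row['instructions_delta']:+d} instructions, "
              f"{row['cycles_pct']:+.2f}% cycles")
```

The per-shader cycle increase is larger than the 0.5 - 1% drop measured for the
whole benchmark.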

They're otherwise the same shader, except that the latter defines USE_GEOMSHADER,
which causes a few extra calculations for frag_color at the end.

At a quick look, there aren't many differences in the generated assembly.  Two sel()
instructions have changed to extra cmp.ge.f0() instructions.  The end part of the
shader seems to have worse scheduling.  The first shader also has more register
bank conflicts now.

-> I'm OK if this is handled as WONTFIX; I just wanted to document it.

Besides a marginal change in the pixels rendered by GpuTest Piano, this commit
caused no functional or measurable performance changes in the rest of the
tests we're tracking.
