[Bug 99398] 1% perf drop in GFXBench v4 tessellation test with "nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences"
bugzilla-daemon at freedesktop.org
Fri Jan 13 14:25:52 UTC 2017
https://bugs.freedesktop.org/show_bug.cgi?id=99398
Bug ID: 99398
Summary: 1% perf drop in GFXBench v4 tessellation test with "nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences"
Product: Mesa
Version: git
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: Drivers/DRI/i965
Assignee: intel-3d-bugs at lists.freedesktop.org
Reporter: eero.t.tamminen at intel.com
QA Contact: intel-3d-bugs at lists.freedesktop.org
The following commit drops GFXBench v4 tessellation test performance (both
onscreen & offscreen) by 0.5 - 1% on GEN8 & GEN9:
-----------------------------------------------
commit 3371de38f282c77461bbe5007a2fec2a975776df
Author: Kenneth Graunke <kenneth at whitecape.org>
AuthorDate: Tue Aug 9 01:44:38 2016 -0700
Commit: Timothy Arceri <timothy.arceri at collabora.com>
CommitDate: Mon Jan 9 12:32:16 2017 +1100
nir: Turn bcsel of +/- 1.0 and 0.0 into b2f sequences
-----------------------------------------------
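For readers unfamiliar with the opcodes in the commit title, here is a minimal sketch of the kind of rewrite it describes. This is not the Mesa code: `bcsel` and `b2f` below are hypothetical Python stand-ins for the NIR opcodes, and the four patterns checked are an assumption inferred from the title ("+/- 1.0 and 0.0"), since the patch itself is not quoted in this report.

```python
import math

def bcsel(cond, a, b):
    """Stand-in for NIR bcsel: select a if cond is true, else b."""
    return a if cond else b

def b2f(cond):
    """Stand-in for NIR b2f: convert a boolean to 1.0 or 0.0."""
    return 1.0 if cond else 0.0

def same_float(x, y):
    """Compare values including the sign of zero (0.0 vs -0.0)."""
    return x == y and math.copysign(1.0, x) == math.copysign(1.0, y)

def rewrites_agree():
    """Check the assumed bcsel -> b2f rewrites for both condition values."""
    for cond in (False, True):
        pairs = [
            (bcsel(cond, 1.0, 0.0), b2f(cond)),
            (bcsel(cond, 0.0, 1.0), b2f(not cond)),
            (bcsel(cond, -1.0, -0.0), -b2f(cond)),
            (bcsel(cond, -0.0, -1.0), -b2f(not cond)),
        ]
        if not all(same_float(x, y) for x, y in pairs):
            return False
    return True

print(rewrites_agree())
```

The sketch only shows that the two forms compute the same value; it says nothing about which form the i965 backend compiles more efficiently.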
The commit affects the tessellation evaluation shaders in this test case:
-----------------------------------------------
Native code for unnamed tessellation evaluation shader GLSL10
-SIMD8 shader: 324 instructions. 1 loops. 7774 cycles. 0:0 spills:fills.
Promoted 11 constants. Compacted 5184 to 3312 bytes (36%)
+SIMD8 shader: 323 instructions. 1 loops. 8050 cycles. 0:0 spills:fills.
Compacted 5168 to 3280 bytes (37%)
...
Native code for unnamed tessellation evaluation shader GLSL15
-SIMD8 shader: 328 instructions. 1 loops. 7778 cycles. 0:0 spills:fills.
Promoted 11 constants. Compacted 5248 to 3360 bytes (36%)
+SIMD8 shader: 327 instructions. 1 loops. 8034 cycles. 0:0 spills:fills.
Promoted 11 constants. Compacted 5232 to 3328 bytes (36%)
-----------------------------------------------
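To put the statistics above in relative terms, the following small sketch computes the change in estimated cycles and instruction counts for the two shaders. The numbers are copied from the output above; the helper name `pct_change` and the dictionary layout are just illustrative choices.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0

# (before, after) pairs taken from the shader statistics quoted above
cycles = {"GLSL10": (7774, 8050), "GLSL15": (7778, 8034)}
instructions = {"GLSL10": (324, 323), "GLSL15": (328, 327)}

for name in cycles:
    c_before, c_after = cycles[name]
    i_before, i_after = instructions[name]
    print(f"{name}: cycles {pct_change(c_before, c_after):+.2f}%, "
          f"instructions {i_after - i_before:+d}")
```

Both shaders lose one instruction, while the compiler's cycle estimate rises by roughly 3.3 - 3.6%. These are static estimates, so they need not match the measured 0.5 - 1% drop.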
They're otherwise the same shader, except that the latter defines USE_GEOMSHADER,
which causes a few extra calculations for frag_color at the end.
At a quick look, there aren't many differences in the generated assembly. Two
sel() instructions have changed to extra cmp.ge.f0() instructions. The end part
of the shader seems to have worse scheduling. The first shader also now has
more register bank conflicts.
-> I'm OK with this being handled as WONTFIX; I just wanted to document it.
Besides a marginal change in the pixels rendered by GpuTest Piano, this commit
caused no functional or measurable performance changes in the rest of the
tests we're tracking.