[Mesa-dev] [PATCH 0/8] i965: gl_TessLevel rescrambling in NIR

Wed Jan 4 11:07:24 UTC 2017

Hello,

This series reworks i965's handling of gl_TessLevelInner/Outer[] arrays.
Instead of using lower_tess_levels to turn them into vec4/vec2s, we pass
them through to NIR and make them compact arrays (where array indexing
translates to enhanced layouts components).
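
To illustrate the compact-array idea, here is a minimal standalone sketch
(not code from the series). It assumes four scalar components per vec4
slot, and the helper name compact_element_location is hypothetical:

```python
# Standalone model of compact-array indexing; not Mesa code.
# Assumption: each vec4 slot holds four scalar components.
COMPONENTS_PER_SLOT = 4


def compact_element_location(index, first_component=0):
    """Map a compact array element to (slot offset, component).

    `first_component` is the component at which the array starts
    within its first slot (0 for the tess level arrays).
    """
    slot_offset, component = divmod(first_component + index,
                                    COMPONENTS_PER_SLOT)
    return slot_offset, component


# gl_TessLevelOuter[3] -> slot offset 0, component 3
print(compact_element_location(3))
# gl_TessLevelInner[1] -> slot offset 0, component 1
print(compact_element_location(1))
```

Under that assumption, both tess level arrays fit in a single slot each,
so the array index is simply the component.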

This has some nice benefits.  In the last patch, we're able to drop
reswizzling and writemask-munging for load_output and store_output
in both the scalar TCS and vec4 TCS backends, as well as code to do
the same for TES system values.  That's 5 copies of backend code
replaced by a small amount of extra code in remap_patch_urb_offsets.

It also means we can drop the special TES handling for tess levels
entirely - the ordinary input handling code will handle them just fine.

This is the first step toward tessellation support in Vulkan (anv).
(lower_tess_levels is written in GLSL IR, so we need a replacement.)

This has an impact on shader-db's TCS shaders (no TES shaders change):

With scalar TCS/TES:

   total instructions in shared programs: 13388151 -> 13387794 (-0.00%)
   instructions in affected programs: 31920 -> 31563 (-1.12%)
   helped: 75
   HURT: 0

   total cycles in shared programs: 257010676 -> 257008504 (-0.00%)
   cycles in affected programs: 165632 -> 163460 (-1.31%)
   helped: 75
   HURT: 0

With vec4 TCS/TES:

   total instructions in shared programs: 13345621 -> 13345681 (0.00%)
   instructions in affected programs: 18593 -> 18653 (0.32%)
   helped: 36
   HURT: 25

   total cycles in shared programs: 256761898 -> 256759952 (-0.00%)
   cycles in affected programs: 266644 -> 264698 (-0.73%)
   helped: 172
   HURT: 44
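
For reference, the percentages above follow from the before/after counts.
A small sketch of that arithmetic (the helper name pct_change is made up
for illustration, and this is not shader-db's own code):

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, as a percent string."""
    return f"{(after - before) / before * 100:.2f}%"


# Scalar TCS, instructions in affected programs.
print(pct_change(31920, 31563))
# vec4 TCS, instructions in affected programs.
print(pct_change(18593, 18653))
# vec4 TCS, cycles in affected programs.
print(pct_change(266644, 264698))
```

The totals round to 0.00% because a few hundred instructions are tiny
next to the roughly 13.4 million in the whole shader-db run.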

The vec4 stats are not great, but I don't expect it to make much of a
performance difference - TCS isn't usually the bottleneck (TES is).
They could be improved by writing a peephole pass to detect load/stores
to the same base+offset with consecutive scalar components and turn them
into vec2/vec4 load/stores.
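
The peephole idea above could look roughly like the following standalone
sketch. It is a Python model rather than NIR or backend code: the Access
type and merge_consecutive are hypothetical names, and it only merges
accesses that are adjacent in the list, whereas a real pass would also
have to prove that nothing in between interferes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Access:
    """One load or store: base + offset, first component, and width."""
    base: int
    offset: int
    component: int
    num_components: int = 1


def merge_consecutive(accesses: List[Access],
                      max_width: int = 4) -> List[Access]:
    """Merge adjacent accesses to the same base+offset whose components
    are consecutive, up to `max_width` components per access."""
    merged: List[Access] = []
    for acc in accesses:
        prev = merged[-1] if merged else None
        if (prev is not None
                and prev.base == acc.base
                and prev.offset == acc.offset
                # acc must start right where prev ends.
                and prev.component + prev.num_components == acc.component
                and prev.num_components + acc.num_components <= max_width):
            merged[-1] = Access(prev.base, prev.offset, prev.component,
                                prev.num_components + acc.num_components)
        else:
            merged.append(acc)
    return merged


# Four scalar stores to components 0..3 of the same base+offset
# collapse into a single four-component store.
scalar_stores = [Access(base=0, offset=1, component=c) for c in range(4)]
print(merge_consecutive(scalar_stores))
```

Passing max_width=2 would restrict the result to vec2 accesses instead.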

--Ken