[Mesa-dev] [PATCH 0/8] i965: gl_TessLevel rescrambling in NIR

Wed Jan 4 13:16:41 UTC 2017

Hi,

On 04.01.2017 13:07, Kenneth Graunke wrote:
> This series reworks i965's handling of gl_TessLevelInner/Outer[] arrays.
> Instead of using lower_tess_levels to turn them into vec4/vec2s, we pass
> them through to NIR and make them compact arrays (where array indexing
> translates to enhanced layouts components).
>
> This has some nice benefits.  In the last patch, we're able to drop
> reswizzling and writemask-munging for load_output and store_output
> in both the scalar TCS and vec4 TCS backends, as well as code to do
> the same for TES system values.  That's 5 copies of backend code
> replaced by a small amount of extra code in remap_patch_urb_offsets.
>
> It also means we can drop TES handling entirely - the ordinary input
> handling code will handle it just fine.
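
To check my understanding of the "compact array" idea, here is a minimal
sketch of how indexing a scalar array could translate to a slot offset plus
an enhanced-layouts component. The helper name and the plain "four scalars
per vec4 slot" packing are assumptions for illustration only; this is not
the actual patch URB header layout that remap_patch_urb_offsets produces.

```python
def compact_array_slot(index, base_component=0):
    """Map element `index` of a compact scalar array to (slot offset, component).

    Assumes four 32-bit scalars are packed per vec4 slot, starting at
    `base_component` of the first slot. Hypothetical helper, not Mesa API.
    """
    flat = base_component + index
    return flat // 4, flat % 4


# gl_TessLevelOuter has 4 elements, gl_TessLevelInner has 2.
outer = [compact_array_slot(i) for i in range(4)]
inner = [compact_array_slot(i) for i in range(2)]
```

With this packing, each array fits in a single slot, and the array index
becomes the component number directly.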

Have you checked whether this makes any performance difference in the
GpuTest v0.7 TessMark, GfxBench v4 tessellation, or SynMark2 v7.0
terrain tessellation tests?

> This is the first step toward tessellation support in Vulkan (anv).
> (lower_tess_levels is written in GLSL IR, so we need a replacement.)

Are there any other use cases for Vulkan tessellation yet, besides
Sascha Willems' three tests here:
	https://github.com/SaschaWillems/Vulkan
?

	 - Eero

> This has an impact on shader-db's TCS shaders (but not a single TES):
>
> With scalar TCS/TES:
>
>    total instructions in shared programs: 13388151 -> 13387794 (-0.00%)
>    instructions in affected programs: 31920 -> 31563 (-1.12%)
>    helped: 75
>    HURT: 0
>
>    total cycles in shared programs: 257010676 -> 257008504 (-0.00%)
>    cycles in affected programs: 165632 -> 163460 (-1.31%)
>    helped: 75
>    HURT: 0
>
> With vec4 TCS/TES:
>
>    total instructions in shared programs: 13345621 -> 13345681 (0.00%)
>    instructions in affected programs: 18593 -> 18653 (0.32%)
>    helped: 36
>    HURT: 25
>
>    total cycles in shared programs: 256761898 -> 256759952 (-0.00%)
>    cycles in affected programs: 266644 -> 264698 (-0.73%)
>    helped: 172
>    HURT: 44
>
> The vec4 stats are not great, but I don't expect it to make much of a
> performance difference - TCS isn't usually the bottleneck (TES is).
> They could be improved by writing a peephole pass to detect load/stores
> to the same base+offset with consecutive scalar components and turn them
> into vec2/vec4 load/stores.
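
The peephole idea above could be sketched roughly as follows. This is a toy
model, not NIR: the Access record and merge_consecutive function are
hypothetical names, offsets are assumed to be constants, and the accesses
are assumed to be adjacent with nothing in between that would make merging
unsafe. It also merges to any width up to four components, rather than
restricting itself to vec2/vec4.

```python
from dataclasses import dataclass


@dataclass
class Access:
    """One load or store in a toy model (hypothetical, not a NIR intrinsic)."""
    base: int                # base slot
    offset: int              # constant offset from the base
    component: int           # first component touched (0..3)
    num_components: int = 1  # number of consecutive components


def merge_consecutive(accesses):
    """Greedily merge adjacent accesses to the same base+offset whose
    components are consecutive, without crossing the vec4 boundary."""
    merged = []
    for acc in accesses:
        prev = merged[-1] if merged else None
        if (prev is not None
                and prev.base == acc.base
                and prev.offset == acc.offset
                and prev.component + prev.num_components == acc.component
                and acc.component + acc.num_components <= 4):
            # Extend the previous access instead of emitting a new one.
            merged[-1] = Access(prev.base, prev.offset, prev.component,
                                prev.num_components + acc.num_components)
        else:
            merged.append(acc)
    return merged


stores = [Access(0, 0, 0), Access(0, 0, 1), Access(0, 0, 2), Access(0, 0, 3),
          Access(1, 0, 0), Access(1, 0, 2)]
result = merge_consecutive(stores)
```

Here the four scalar stores to slot 0 collapse into one four-component
store, while the two stores to slot 1 stay separate because components 0
and 2 are not consecutive.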