[Mesa-dev] [PATCH 00/12] Improve GLSL preprocessor performance

Sat Jan 7 19:02:01 UTC 2017

There is a lot of room for improvement in the preprocessor. Here is a
quick benchmark of several popular C-like preprocessors on an
artificial 4Mb "shader" (the Blender PBR shader concatenated 16x). I
also wanted to add D's Warp, but didn't manage to compile it:

                  time      mem  page faults
clang 3.8         0.11s    32Mb  3K
gcc 5             0.063s   13Mb  1.3K
tcc               0.067s    3Mb  0.4K
glslangValidator  0.63s    25Mb  3K
glcpp (Mesa)      0.36s   127Mb  31K
glcpp+jemalloc    0.39s   182Mb  1K

Not only is glcpp significantly slower than the C compilers'
preprocessors (3x-6x slower), it also allocates much more memory.

This patch series improves the preprocessor in the following ways:

  1. Print to an exponentially growing string instead of using
     printf() and realloc() on each print.

  2. Use Bloom filters to avoid excessive hash-table queries.

  3. Create a hand-written streamlined lexer/parser that bypasses
     flex/bison tokenization/printing for simple cases. This one
     adds a lot of code, but it also greatly improves
     preprocessing speed.
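
To illustrate items 1 and 2, here is a minimal standalone sketch. It is
not the code from this series: struct strbuf, strbuf_printf(),
bloom_bits(), bloom_add() and bloom_maybe_contains() are hypothetical
names, and the 64-bit filter with an FNV-1a hash is just an assumption
made for the example. The buffer doubles its capacity when it runs out
of room, and the filter answers "definitely not a macro" without
touching the hash table.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer (zero-initialize before use). */
struct strbuf {
   char *buf;
   size_t len;   /* bytes used, excluding the terminating NUL */
   size_t cap;   /* bytes allocated, 0 while buf is NULL */
};

/* Append formatted text. Capacity doubles when the text does not fit,
 * so the total amount of copying stays linear in the output size. */
static int
strbuf_printf(struct strbuf *sb, const char *fmt, ...)
{
   va_list ap;
   size_t avail = sb->cap - sb->len;

   /* Optimistically format into the free space that is already there. */
   va_start(ap, fmt);
   int n = vsnprintf(avail ? sb->buf + sb->len : NULL, avail, fmt, ap);
   va_end(ap);
   if (n < 0)
      return -1;

   if ((size_t)n >= avail) {
      /* Did not fit: grow geometrically and format once more. */
      size_t cap = sb->cap ? sb->cap : 64;
      while (cap - sb->len <= (size_t)n)
         cap *= 2;
      char *p = realloc(sb->buf, cap);
      if (!p)
         return -1;
      sb->buf = p;
      sb->cap = cap;
      va_start(ap, fmt);
      vsnprintf(sb->buf + sb->len, sb->cap - sb->len, fmt, ap);
      va_end(ap);
   }
   sb->len += (size_t)n;
   assert(sb->len < sb->cap);
   return 0;
}

/* Hypothetical 64-bit Bloom filter over identifier names. Two bit
 * positions are taken from different parts of one FNV-1a hash. */
static uint64_t
bloom_bits(const char *name)
{
   uint32_t h = 2166136261u;
   for (; *name; name++)
      h = (h ^ (unsigned char)*name) * 16777619u;
   return (UINT64_C(1) << (h & 63)) | (UINT64_C(1) << ((h >> 6) & 63));
}

/* Call whenever a macro is defined. */
static void
bloom_add(uint64_t *filter, const char *name)
{
   *filter |= bloom_bits(name);
}

/* 0 means "definitely not defined", so the hash-table lookup can be
 * skipped. Nonzero means "maybe defined", so the table must be queried. */
static int
bloom_maybe_contains(uint64_t filter, const char *name)
{
   uint64_t bits = bloom_bits(name);
   return (filter & bits) == bits;
}
```

A filter this small only pays off while the number of defined macros is
low. With many macros most bits end up set and nearly every identifier
falls through to the hash table anyway.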

A few benchmarks. The same 16x concatenated Blender PBR shader:

                   time      mem  page faults
glcpp              0.36s   127Mb  31K
glcpp-new          0.026s   13Mb  2.7K
glcpp-new+jemalloc 0.026s   20Mb  1K

A nice improvement in both speed and memory use.

A more realistic test: preprocessing my whole shader-db (more than
51K shaders from various Steam games) using a hybrid of shader-db's
run and glcpp that I hacked together:

         dumped from games  default shader-db's collection
Before   27.02s             0.52s
After    2.09s              0.14s

However, some games benefit much less from this series (Talos
Principle 0.45s -> 0.2s, Serious Sam 0.53s -> 0.22s, to name
a few). They are heavy users of the preprocessor, and they hit
the non-optimized path. It's possible to improve them too by
streamlining the path that skips #if 0 ... #endif blocks. It's also
possible to speed up the fast path using SIMD optimizations (Clang,
for example, uses SSE to skip multiline comments).
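
As a rough idea of what a faster comment-skipping path could look like
without hand-written SIMD, the sketch below leans on memchr(), which
common libc implementations already vectorize. It is not part of this
series and not Clang's code. skip_block_comment() is a hypothetical
helper, and it ignores newline counting, which a real preprocessor
would still need for line numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. 'p' points just past the opening slash-star and
 * 'end' is one past the last byte of the input. Returns a pointer just
 * past the closing star-slash, or NULL if the comment is unterminated. */
static const char *
skip_block_comment(const char *p, const char *end)
{
   assert(p <= end);

   while (p < end) {
      /* Jump straight to the next '/' instead of testing every byte. */
      const char *slash = memchr(p, '/', (size_t)(end - p));
      if (!slash)
         return NULL;
      /* The 'slash > p' guard keeps the '*' of the opener from being
       * reused, so slash-star-slash does not count as a full comment. */
      if (slash > p && slash[-1] == '*')
         return slash + 1;
      p = slash + 1;
   }
   return NULL;
}
```

Searching for '/' rather than '*' is a guess that slashes are rarer
than asterisks inside typical comment bodies. Either choice gives the
same result.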

The series passes all of Mesa's preprocessor tests. The output and
error output of the preprocessor after a full shader-db run are the
same, including line numbers in errors and so on. The only difference
is that it generates a bit less trailing whitespace, but trailing
whitespace doesn't really matter for a preprocessor. Other
preprocessors drop trailing whitespace entirely.

Vladislav Egorov (12):
  glcpp: Print preprocessor output to string_buffer
  glcpp: Avoid unnecessary strcmp()
  glcpp: Use Bloom filter before identifier search
  glcpp: Use string_buffer for continuations removal
  ralloc: Avoid calling vsnprintf() twice
  ralloc: Use strnlen() inside of strncat()
  glcpp: Skip unnecessary line continuations removal
  glcpp: Use strpbrk in the line continuations pass
  glcpp: Avoid unnecessary linear_strdup
  glcpp/tests: Allow different trailing whitespace
  glcpp: Create fast path hand-written scanner
  glcpp: Substitute trivial macros in the fast path

 src/compiler/glsl/glcpp/glcpp-lex.l      | 428 ++++++++++++++++++++++++++++++-
 src/compiler/glsl/glcpp/glcpp-parse.y    | 149 ++++++-----
 src/compiler/glsl/glcpp/glcpp.h          |  78 +++++-
 src/compiler/glsl/glcpp/pp.c             | 242 +++++++++++++----
 src/compiler/glsl/glcpp/tests/glcpp-test |   4 +-
 src/util/ralloc.c                        |  64 +++--
 6 files changed, 820 insertions(+), 145 deletions(-)

-- 
2.7.4