[Mesa-dev] [PATCH 00/15] GLSL memory allocation rework for faster compilation

Jason Ekstrand jason at jlekstrand.net
Sat Oct 8 15:58:30 UTC 2016


FYI, we use ralloc for a lot more than just the GLSL compiler, so the first
few changes make me a bit nervous.  There was someone working on making our
drivers more undefined-memory-friendly, but I don't know what happened to
those patches.

On Oct 8, 2016 3:58 AM, "Marek Olšák" <maraeo at gmail.com> wrote:

Hi,

This patch series reduces the number of malloc calls in the GLSL
compiler by 63%. That leads to better compile times and less heap
thrashing.

It's done by switching memory allocations in the GLSL compiler to my
new linear allocator that allocates out of a fixed-sized buffer with
a monotonically increasing offset. If more buffers are needed, it
chains them.
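
A minimal sketch of a linear allocator of the kind described above, for
illustration only. It is not the code from the series: every name
(lin_ctx, lin_buf, lin_alloc, lin_free_all) and both constants
(LIN_BUF_SIZE, LIN_ALIGN) are assumptions, and size overflow checks are
left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* All names and constants here are hypothetical, not taken from the series. */
#define LIN_BUF_SIZE 2048 /* assumed size of each fixed-size buffer */
#define LIN_ALIGN    8    /* assumed alignment; must be a power of two */

struct lin_buf {
   struct lin_buf *next;  /* older buffer in the chain */
   size_t offset;         /* monotonically increasing offset into data[] */
   size_t size;           /* capacity of data[] in bytes */
   unsigned char data[];
};

struct lin_ctx {
   struct lin_buf *head;  /* buffer currently being filled */
   unsigned num_bufs;     /* buffers currently in the chain */
};

/* Try to carve `size` bytes out of one buffer; NULL if it does not fit. */
static void *lin_buf_alloc(struct lin_buf *buf, size_t size)
{
   uintptr_t base = (uintptr_t)buf->data;
   uintptr_t start = (base + buf->offset + LIN_ALIGN - 1) &
                     ~(uintptr_t)(LIN_ALIGN - 1);
   size_t end = (size_t)(start - base) + size;

   if (end > buf->size)
      return NULL;
   buf->offset = end;
   return (void *)start;
}

/* Allocate from the current buffer, chaining a new one when it is full. */
void *lin_alloc(struct lin_ctx *ctx, size_t size)
{
   void *ptr = ctx->head ? lin_buf_alloc(ctx->head, size) : NULL;
   if (ptr)
      return ptr;

   /* Oversized requests get a dedicated buffer; the slack covers alignment. */
   size_t cap = size + LIN_ALIGN > LIN_BUF_SIZE ? size + LIN_ALIGN
                                                : LIN_BUF_SIZE;
   struct lin_buf *buf = malloc(sizeof(*buf) + cap);
   if (!buf)
      return NULL;
   buf->next = ctx->head;
   buf->offset = 0;
   buf->size = cap;
   ctx->head = buf;
   ctx->num_bufs++;
   return lin_buf_alloc(buf, size);
}

/* Individual allocations are never freed; the whole chain goes at once. */
void lin_free_all(struct lin_ctx *ctx)
{
   while (ctx->head) {
      struct lin_buf *next = ctx->head->next;
      free(ctx->head);
      ctx->head = next;
   }
   ctx->num_bufs = 0;
}
```

In this sketch, many small allocations share one malloc call, which is
where a reduction in malloc calls would come from. The trade-off is that
nothing can be freed individually, so it only suits short-lived data that
is released together.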

The new allocator is used in all places where short-lived allocations
are used with a high number of malloc calls. The series also contains
other improvements not related to the new allocator that also improve
compile times. The results are below.

I tested my shader-db with shaders only being compiled to TGSI.
(noop gallium driver)


master + libc's malloc:

 real   0m54.182s
 user   3m33.640s
 sys    0m0.620s
 maxmem 275 MB


master + jemalloc preloaded:

 real   0m45.044s
 user   2m56.356s
 sys    0m1.652s
 maxmem 284 MB


the series + libc's malloc:

 real   0m46.221s
 user   3m2.080s
 sys    0m0.544s
 maxmem 270 MB


the series + jemalloc preloaded:

 real   0m40.729s
 user   2m39.564s
 sys    0m1.232s
 maxmem 284 MB


The series without jemalloc almost caught up with jemalloc + master.
However, jemalloc also benefits.

Current Mesa needs 54.182s and it drops to 40.729s with my series and
jemalloc. The total change in compile time is -25% if we incorporate
both. Without jemalloc, the difference is only -14.7%.
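
The percentages above follow from the "real" times in the tables. A small
helper (the name percent_change is hypothetical) makes the arithmetic
explicit:

```c
/* Relative change in percent; a negative result means the run got faster. */
double percent_change(double before_s, double after_s)
{
   return (after_s - before_s) / before_s * 100.0;
}
```

percent_change(54.182, 40.729) gives about -24.8, which the text rounds to
-25%, and percent_change(54.182, 46.221) gives about -14.7.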

With radeonsi, the improvement is slightly more than 1/2 of that
(if you add the LLVM time). However, radeonsi also has asynchronous
shader compilation hiding LLVM overhead in some cases, so it depends.

Drivers with faster compiler backends will benefit more than radeonsi,
but will probably not reach -25% or -14.7% (except softpipe, which uses
TGSI as-is).

The memory usage looks reasonable in all tested cases.

Note: One of the first patches moves memset from ralloc to rzalloc.
I tested and fixed the GLSL source -> TGSI path, but other codepaths
may break, and you need to use valgrind to find all uninitialized
variables that relied on ralloc doing memset (if there are any).
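
To illustrate the kind of code the note is about: once the plain allocation
call stops zeroing memory, a caller that relied on that has to either
switch to the zeroing variant or initialize every field itself. The sketch
below uses hypothetical stand-ins (alloc_uninit, alloc_zeroed) built on
malloc, not the real ralloc functions, and a made-up struct.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for a non-zeroing and a zeroing allocator. */
void *alloc_uninit(size_t size)
{
   return malloc(size);
}

void *alloc_zeroed(size_t size)
{
   void *ptr = malloc(size);
   if (ptr)
      memset(ptr, 0, size);
   return ptr;
}

struct node {
   struct node *next;
   unsigned flags;
};

struct node *node_create(void)
{
   /* With a non-zeroing allocator, every field must be set explicitly.
    * Calling alloc_zeroed() instead would make these stores unnecessary. */
   struct node *n = alloc_uninit(sizeof(*n));
   if (!n)
      return NULL;
   n->next = NULL;
   n->flags = 0;
   return n;
}
```

Code that skips both options reads indeterminate values, which is the
class of bug valgrind is expected to report.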

You can also find it here:
https://cgit.freedesktop.org/~mareko/mesa/log/?h=glsl-alloc-rework

Please review.

 src/compiler/glsl/ast.h                             |   4 +-
 src/compiler/glsl/ast_to_hir.cpp                    |   4 +-
 src/compiler/glsl/ast_type.cpp                      |  13 ++-
 src/compiler/glsl/glcpp/glcpp-lex.l                 |   2 +-
 src/compiler/glsl/glcpp/glcpp-parse.y               | 203 +++++++++++++++++---------------------
 src/compiler/glsl/glcpp/glcpp.h                     |   1 +
 src/compiler/glsl/glsl_lexer.ll                     |  16 +--
 src/compiler/glsl/glsl_parser.yy                    | 202 +++++++++++++++++++-------------------
 src/compiler/glsl/glsl_parser_extras.cpp            |   6 +-
 src/compiler/glsl/glsl_parser_extras.h              |   4 +-
 src/compiler/glsl/glsl_symbol_table.cpp             |  19 ++--
 src/compiler/glsl/glsl_symbol_table.h               |   1 +
 src/compiler/glsl/ir.cpp                            |   4 +
 src/compiler/glsl/ir.h                              |  13 ++-
 src/compiler/glsl/link_uniform_blocks.cpp           |   2 +-
 src/compiler/glsl/list.h                            |   2 +-
 src/compiler/glsl/lower_packed_varyings.cpp         |   8 +-
 src/compiler/glsl/opt_constant_propagation.cpp      |  14 ++-
 src/compiler/glsl/opt_copy_propagation.cpp          |   7 +-
 src/compiler/glsl/opt_copy_propagation_elements.cpp |  19 ++--
 src/compiler/glsl/opt_dead_code_local.cpp           |  12 ++-
 src/compiler/glsl_types.cpp                         |  38 +------
 src/compiler/glsl_types.h                           |   6 +-
 src/compiler/nir/nir.c                              |   8 +-
 src/compiler/spirv/vtn_variables.c                  |   3 +-
 src/gallium/drivers/freedreno/ir3/ir3.c             |   2 +-
 src/gallium/drivers/vc4/vc4_cl.c                    |   2 +-
 src/gallium/drivers/vc4/vc4_program.c               |   2 +-
 src/gallium/drivers/vc4/vc4_simulator.c             |   5 +-
 src/mesa/drivers/dri/i965/brw_state_batch.c         |   5 +-
 src/util/ralloc.c                                   | 392 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 src/util/ralloc.h                                   |  93 ++++++++++++++++--
 32 files changed, 782 insertions(+), 330 deletions(-)

Marek
_______________________________________________
mesa-dev mailing list
mesa-dev at lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev