[Mesa-dev] Improving ralloc performance for the GLSL compiler

Marek Olšák maraeo at gmail.com
Tue Aug 30 11:39:24 UTC 2016


Results for single-threaded shader-db (using shaders from one game
only) including LLVM compilation:

Default:
real    0m59.606s
user    0m59.488s
sys    0m0.104s

Only ralloc is using jemalloc:
real    0m55.284s (7.2% less time)
user    0m55.032s
sys    0m0.244s

Ralloc is using my linear allocator:
real    0m53.418s (10.4% less time)
user    0m53.200s
sys    0m0.208s

Marek


On Tue, Aug 30, 2016 at 11:51 AM, Marek Olšák <maraeo at gmail.com> wrote:
> Hi,
>
> Recently I discovered that our GLSL compiler spends a lot of time in
> rzalloc_size, so I looked at possible options to optimize that. It's
> worth noting that a large number of live allocations slows down
> subsequent malloc calls, which in turn slows down the GLSL compiler.
> When I kept 5 instances of LLVMContext alive between compilations (I
> wanted to reuse them), the GLSL compiler slowed down. That shows that
> GLSL compiler performance is overly dependent on the size and
> complexity of the heap.
>
> So I decided to write my own linear allocator and compared it with
> jemalloc preloaded via LD_PRELOAD, and with jemalloc linked
> statically and used by ralloc only.
>
> The test was shader-db using AMD's shader collection. The command line was:
> time GALLIUM_NOOP=1 shader-db/run shaders
> The noop driver ensures the compilation process ends with TGSI.
>
>
> Default Mesa:
> real    0m58.343s
> user    3m48.828s
> sys    0m0.760s
>
> Mesa with LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1:
> real    0m48.550s (17% less time)
> user    3m9.544s
> sys    0m1.700s
>
> Ralloc using _mesa_je_{calloc, realloc, free}, with Mesa linked
> against my libmesa_jemalloc_pic.a (sketched below):
> real    0m49.580s (15% less time)
> user    3m14.452s
> sys    0m0.996s
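>
> A minimal sketch of how that wiring could look (the _mesa_je_* names
> are the ones above; the HAVE_MESA_JEMALLOC guard and the backing_*
> macro names are just placeholders for illustration):
>
> #include <stdlib.h>
>
> #ifdef HAVE_MESA_JEMALLOC
> /* prefixed symbols exported by the statically linked jemalloc */
> void *_mesa_je_calloc(size_t nmemb, size_t size);
> void *_mesa_je_realloc(void *ptr, size_t size);
> void  _mesa_je_free(void *ptr);
>
> #define backing_calloc(n, sz)   _mesa_je_calloc(n, sz)
> #define backing_realloc(p, sz)  _mesa_je_realloc(p, sz)
> #define backing_free(p)         _mesa_je_free(p)
> #else
> /* fall back to the normal libc allocator */
> #define backing_calloc(n, sz)   calloc(n, sz)
> #define backing_realloc(p, sz)  realloc(p, sz)
> #define backing_free(p)         free(p)
> #endif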
>
> Ralloc using my own linear allocator, which allocates out of 32KB
> buffers for allocations of 512 bytes or less (sketched below):
> real    0m46.521s (20% less time)
> user    3m1.304s
> sys    0m1.740s
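>
> For reference, a minimal sketch of the linear (bump) allocation idea,
> not the actual patch; the block layout, the 8-byte alignment and the
> handling of large allocations below are simplified assumptions:
>
> #include <stdlib.h>
>
> #define LINEAR_BLOCK_SIZE (32 * 1024) /* 32KB blocks */
> #define LINEAR_MAX_ALLOC  512         /* larger requests bypass this */
>
> struct linear_block {
>    struct linear_block *next; /* chained so everything frees at once */
>    unsigned offset;           /* bump pointer into data[] */
>    char data[];
> };
>
> struct linear_ctx {
>    struct linear_block *current;
> };
>
> static void *
> linear_alloc(struct linear_ctx *ctx, unsigned size)
> {
>    size = (size + 7) & ~7u;   /* keep allocations 8-byte aligned */
>
>    if (size > LINEAR_MAX_ALLOC)
>       return malloc(size);    /* big allocations: plain malloc,
>                                  freed separately by the caller */
>
>    struct linear_block *b = ctx->current;
>    if (!b || b->offset + size > LINEAR_BLOCK_SIZE - sizeof(*b)) {
>       b = malloc(LINEAR_BLOCK_SIZE);  /* grab a fresh 32KB block */
>       if (!b)
>          return NULL;
>       b->next = ctx->current;
>       b->offset = 0;
>       ctx->current = b;
>    }
>
>    void *ptr = b->data + b->offset;
>    b->offset += size;
>    return ptr;                /* never freed individually */
> }
>
> static void
> linear_free_all(struct linear_ctx *ctx)
> {
>    /* frees every block at once when the ralloc parent goes away */
>    struct linear_block *b = ctx->current;
>    while (b) {
>       struct linear_block *next = b->next;
>       free(b);
>       b = next;
>    }
>    ctx->current = NULL;
> }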
>
>
> Now let's test complete compilation down to GCN bytecode:
>
> Default Mesa:
> real    1m57.634s
> user    7m41.692s
> sys    0m1.824s
>
> Mesa with LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1:
> real    1m42.604s (13% less time)
> user    6m39.776s
> sys    0m3.828s
>
> Ralloc using _mesa_je_{calloc, realloc, free}, with Mesa linked
> against my libmesa_jemalloc_pic.a:
> real    1m44.413s (11% less time)
> user    6m48.808s
> sys    0m2.480s
>
> Ralloc using my own linear allocator:
> real    1m40.486s (14.6% less time)
> user    6m34.456s
> sys    0m2.224s
>
>
> The linear allocator that I wrote has very high memory usage, because
> a 32KB block cannot be freed as long as it contains at least one live
> allocation. The workaround would be to do a realloc() when changing a
> ralloc parent in order to "defragment" the memory (sketched below),
> but that's more involved.
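>
> A rough sketch of that re-parenting workaround, reusing the
> hypothetical linear_ctx/linear_alloc helpers from the sketch above
> and assuming the caller tracks the allocation size:
>
> #include <string.h>
>
> static void *
> linear_reparent(struct linear_ctx *new_parent, const void *old,
>                 unsigned size)
> {
>    void *copy = linear_alloc(new_parent, size);
>    if (copy)
>       memcpy(copy, old, size);
>    /* The old copy is simply abandoned; its 32KB block only becomes
>     * freeable once the old parent's whole context is destroyed. */
>    return copy;
> }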
>
> I don't know much about glibc, but it's hard to believe that the
> glibc people have been deliberately ignoring jemalloc for so long.
> There must be some anti-performance politics going on, but enough
> speculation.
>
> If we don't care about memory usage, let's use my allocator. If we do,
> let's import jemalloc into the Mesa tree and use it for ralloc. That
> "11% less time" spent in the shader compiler (which includes LLVM)
> would be nice to have.
>
> Opinions?
>
> Marek

