[Mesa-dev] [PATCH 2/7] glsl/glcpp: use ralloc_sprint_rewrite_tail to avoid slow vsprintf
Vladislav Egorov
vegorov180 at gmail.com
Sun Jan 1 10:55:13 UTC 2017
On 01.01.2017 06:41, Kenneth Graunke wrote:
> On Sunday, January 1, 2017 1:34:27 AM PST Marek Olšák wrote:
>> From: Marek Olšák <marek.olsak at amd.com>
>>
>> This reduces compile times by 4.5% with the Gallium noop driver and
>> gl_constants::GLSLOptimizeConservatively == true.
> Compile times of...what exactly? Do you have any statistics for this
> by itself?
>
> Assuming we add your helper, this patch looks reasonable.
> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
>
> BTW, I suspect you could get some additional speedup by changing
>
> parser->output = ralloc_strdup(parser, "");
>
> to something like:
>
> parser->output = ralloc_size(parser, strlen(orig_concatenated_src) + 1);
> parser->output[0] = '\0';
>
> to try and avoid reallocations. rewrite_tail will realloc just enough
> space every time it allocates, which means once you reallocate, you're
> going to be calling realloc on every single token. Yuck!
>
> ralloc/talloc's string libraries were never meant for serious string
> processing like the preprocessor does. They're meant for convenience
> when constructing debug messages which don't need to be that efficient.
>
> Perhaps a better approach would be to have the preprocessor do this
> itself. Just ralloc_size() output and initialize the null byte.
> reralloc to double the size if you need more space. At the end of
> preprocessing, reralloc to output_length to free any waste from
> doubling.
>
> I suspect that would be a *lot* more efficient, and is probably what
> we should have done in the first place...
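Kenneth's suggestion (preallocate from the source length, double when full, shrink once at the end) could look roughly like the sketch below. It is a minimal illustration, not the glcpp code: `struct strbuf` and the `strbuf_*` functions are hypothetical names, and plain malloc()/realloc() stand in for ralloc_size()/reralloc so the example compiles on its own. Tokens are appended with memcpy(), with no printf-style formatting involved.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer; malloc/realloc stand in for
 * Mesa's ralloc_size()/reralloc so the sketch is self-contained. */
struct strbuf {
    char  *data;
    size_t len;   /* bytes used, excluding the terminating NUL */
    size_t cap;   /* bytes allocated, including room for the NUL */
};

/* Start with a capacity hint, e.g. the length of the unpreprocessed
 * source, so that most shaders never need to grow the buffer at all. */
static int strbuf_init(struct strbuf *b, size_t hint)
{
    b->cap = hint + 1;
    b->len = 0;
    b->data = malloc(b->cap);
    if (!b->data)
        return -1;
    b->data[0] = '\0';
    return 0;
}

/* Append n bytes with memcpy (no formatted printing), doubling the
 * capacity whenever it runs out so appends are amortized O(1). */
static int strbuf_append(struct strbuf *b, const char *s, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t new_cap = b->cap * 2;
        while (b->len + n + 1 > new_cap)
            new_cap *= 2;
        char *p = realloc(b->data, new_cap);
        if (!p)
            return -1;
        b->data = p;
        b->cap = new_cap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

/* At the end of preprocessing, shrink to the exact length to release
 * the slack left over from doubling. */
static char *strbuf_finish(struct strbuf *b)
{
    char *p = realloc(b->data, b->len + 1);
    if (p)
        b->data = p;   /* on failure the larger block is still valid */
    b->cap = b->len + 1;
    return b->data;
}
```

Vladislav's variant grows the buffer by 50% rather than doubling; in this sketch that would only change the growth factor inside strbuf_append().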
I have a similar patch (it may need 1-2 days of cleanup), and I've
tested both variants. Keeping the string in an exponentially growing (by 50%)
string buffer works better, but not *THAT* much better; I expected more. It seems
that in a sequence such as str = realloc(str, 1001); str = realloc(str,
1002); str = realloc(str, 1003); etc., most reallocs are
non-moving in both glibc's allocator and jemalloc. For example, jemalloc
has size classes that already grow exponentially by 15-25%: ..., 4K,
5K, 6K, 7K, 8K, 10K, 12K, 14K, 16K, 20K, 24K, ..., 4M, 5M, ..., so realloc
just checks whether the requested size belongs to the same size class and
does nothing. Reallocs within the same size class are always
non-moving and almost free. Overall, avoiding formatted printing (DOUBLE
formatted printing, which is also entirely avoidable) gives the single
largest boost to the preprocessor.
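The realloc behaviour described above can be probed directly. The hypothetical helper below (not part of Mesa or of either patch) grows one block a byte at a time, mimicking an exact-fit append, and counts how many realloc() calls returned a different address. The old address is saved as an integer before each call, so the previous pointer is never used after it may have been freed.

```c
#include <stdint.h>
#include <stdlib.h>

/* Grow one heap block a byte at a time, the way an exact-fit append
 * does, and report how many realloc() calls actually moved the block.
 * The result depends entirely on the allocator (glibc, jemalloc, ...). */
static size_t count_moving_reallocs(size_t start, size_t end, size_t *calls)
{
    char *p = malloc(start);
    size_t moves = 0;

    *calls = 0;
    if (!p)
        return 0;
    for (size_t n = start + 1; n <= end; n++) {
        uintptr_t before = (uintptr_t)p;   /* compare addresses only */
        char *q = realloc(p, n);
        if (!q)
            break;                         /* p is still valid; stop */
        (*calls)++;
        if ((uintptr_t)q != before)
            moves++;
        p = q;
    }
    free(p);
    return moves;
}
```

For example, count_moving_reallocs(1000, 100000, &calls) performs 99000 realloc() calls. How many of them move depends on the allocator and its version, so no particular result is implied here; running the same binary with and without jemalloc (e.g. via LD_PRELOAD) is one way to compare.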
Benchmark on my shader-db (glcpp and shader-db's run smashed together so
that only preprocessing is done). Note that I used the old jemalloc from
Ubuntu 16.04, which may matter, because jemalloc has since changed its
size class strategy.
perf stat --repeat 10:

master                       8.91s
master+jemalloc              8.60s
Marek's patch                5.50s
Marek's patch+jemalloc       5.03s
my string_buffer             4.57s
my string_buffer+jemalloc    4.43s
my series                    3.83s
my series+jemalloc           3.68s
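Relative to master, each row's speedup is just the ratio of the times: Marek's patch alone goes from 8.91s to 5.50s (about 38% less preprocessing time), and the full series with jemalloc reaches 8.91 / 3.68, roughly 2.42x. These figures cover preprocessing only, so they are not directly comparable to the 4.5% whole-compile figure in the patch description. The helper below is only an arithmetic aid; the struct and function names are illustrative.

```c
#include <stddef.h>
#include <string.h>

/* Times from the perf stat table above (seconds, preprocessing only). */
struct timing {
    const char *config;
    double seconds;
};

static const struct timing timings[] = {
    { "master",                    8.91 },
    { "master+jemalloc",           8.60 },
    { "Marek's patch",             5.50 },
    { "Marek's patch+jemalloc",    5.03 },
    { "my string_buffer",          4.57 },
    { "my string_buffer+jemalloc", 4.43 },
    { "my series",                 3.83 },
    { "my series+jemalloc",        3.68 },
};

/* Returns the measured time for a configuration, or -1.0 if unknown. */
static double time_of(const char *config)
{
    for (size_t i = 0; i < sizeof(timings) / sizeof(timings[0]); i++) {
        if (strcmp(timings[i].config, config) == 0)
            return timings[i].seconds;
    }
    return -1.0;
}

/* How many times faster t is than the baseline (> 1.0 means faster). */
static double speedup(double baseline, double t)
{
    return baseline / t;
}

/* Fraction of the baseline time removed, e.g. 0.38 means 38% less time. */
static double time_reduction(double baseline, double t)
{
    return 1.0 - t / baseline;
}
```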