[Mesa-dev] [PATCH 03/12] glcpp: Use Bloom filter before identifier search
Ian Romanick
idr at freedesktop.org
Wed Jan 18 23:03:57 UTC 2017
On 01/07/2017 11:02 AM, Vladislav Egorov wrote:
> Absolute majority of identifiers in shaders are not macro references.
> Many shaders don't use preprocessing macros at all. Almost all
> queries to parser->defines hash-table will be unsuccessful. Note
> that all predefined macros start either with "GL_" or with "__".
> Moreover, even user-defined macros are usually use upper case, while
> ordinary identifiers use lower case. It's easy to detect macros just
> looking on the first two characters of their name. So it makes sense
> to make a quick identifier lookup in a Bloom filter before making
> relatively expensive hash-table search call.
>
> Many schemes of Bloom are possible, but it seems that two mini 64 bit
> Bloom tables -- one for the 1st identifier's char, one for the second
> identifier's char -- with one trivial hash-function (6 lowest bits of
> the char) work very well. Considering that Latin-1 alphabetic chars
> and _ occupy positions from 65 to 122, every character will have its
> bit position in 64-bit word, distinguishing lower and upper case.
>
> Some games do things like "#define float4 float", this scheme would
> not help here, it barely improves heavy users of preprocessor like
> Talos Principle, but in general it works pretty well. In my shader-db
> it eliminates 90%+ hash table queries during preprocessing stage.
> One 64-bit Bloom table working on only 1st or only 2nd char will
> eliminate ~70-80% queries.
>
> For simplicity use the same scheme for both 32-bit and 64-bit archs.
> 32-bit archs still have pretty efficients ways to do 64-bit shifts.
> ---
> src/compiler/glsl/glcpp/glcpp-parse.y | 24 ++++++++++++++++++------
> src/compiler/glsl/glcpp/glcpp.h | 10 ++++++++++
> 2 files changed, 28 insertions(+), 6 deletions(-)
>
> diff --git a/src/compiler/glsl/glcpp/glcpp-parse.y b/src/compiler/glsl/glcpp/glcpp-parse.y
> index 4780c9f..47bf1e1 100644
> --- a/src/compiler/glsl/glcpp/glcpp-parse.y
> +++ b/src/compiler/glsl/glcpp/glcpp-parse.y
> @@ -1345,6 +1345,11 @@ glcpp_parser_create(glcpp_extension_iterator extensions, void *state, gl_api api
> glcpp_lex_init_extra (parser, &parser->scanner);
> parser->defines = _mesa_hash_table_create(NULL, _mesa_key_hash_string,
> _mesa_key_string_equal);
> + parser->defines_bloom0 = 0;
> + parser->defines_bloom1 = 0;
> + GLCPP_DEFINES_BLOOM_PUT(parser, "__LINE__");
> + GLCPP_DEFINES_BLOOM_PUT(parser, "__FILE__");
I'd add a comment here that __LINE__ and __FILE__ are not handled
through the hash with a "see also _glcpp_parser_expand_node".
> +
> parser->linalloc = linear_alloc_parent(parser, 0);
> parser->active = NULL;
> parser->lexing_directive = 0;
> @@ -1832,6 +1837,9 @@ _glcpp_parser_expand_node(glcpp_parser_t *parser, token_node_t *node,
> *last = node;
> identifier = token->value.str;
>
> + if (!GLCPP_DEFINES_BLOOM_GET(parser, identifier))
> + return NULL;
> +
> /* Special handling for __LINE__ and __FILE__, (not through
> * the hash table). */
> if (*identifier == '_') {
> @@ -2118,6 +2126,7 @@ _define_object_macro(glcpp_parser_t *parser, YYLTYPE *loc,
> glcpp_error (loc, parser, "Redefinition of macro %s\n", identifier);
> }
>
> + GLCPP_DEFINES_BLOOM_PUT(parser, identifier);
> _mesa_hash_table_insert (parser->defines, identifier, macro);
I see a potential danger here, and I'm waffling about the right way to
handle it. The potential danger is that someone adds another call to
_mesa_hash_table_insert(parser->defines, ...) but doesn't add the
GLCPP_DEFINES_BLOOM_PUT. That would be *miserable* to debug because it
might not always cause a problem. Oof.
Putting both calls in a utility function would help prevent the problem
for occurring, but it seems like there should be a reasonable way to
detect the problem after it occurs... maybe add an assertion that if the
string is not in the Bloom filter it is also not in the hash?
> }
>
> @@ -2153,6 +2162,7 @@ _define_function_macro(glcpp_parser_t *parser, YYLTYPE *loc,
> glcpp_error (loc, parser, "Redefinition of macro %s\n", identifier);
> }
>
> + GLCPP_DEFINES_BLOOM_PUT(parser, identifier);
> _mesa_hash_table_insert(parser->defines, identifier, macro);
> }
>
> @@ -2199,12 +2209,14 @@ glcpp_parser_lex(YYSTYPE *yylval, YYLTYPE *yylloc, glcpp_parser_t *parser)
> ret == ENDIF || ret == HASH_TOKEN) {
> parser->in_control_line = 1;
> } else if (ret == IDENTIFIER) {
> - struct hash_entry *entry = _mesa_hash_table_search(parser->defines,
> - yylval->str);
> - macro_t *macro = entry ? entry->data : NULL;
> - if (macro && macro->is_function) {
> - parser->newline_as_space = 1;
> - parser->paren_count = 0;
> + if (GLCPP_DEFINES_BLOOM_GET(parser, yylval->str)) {
> + struct hash_entry *entry = _mesa_hash_table_search(parser->defines,
> + yylval->str);
> + macro_t *macro = entry ? entry->data : NULL;
> + if (macro && macro->is_function) {
> + parser->newline_as_space = 1;
> + parser->paren_count = 0;
> + }
> }
> }
>
> diff --git a/src/compiler/glsl/glcpp/glcpp.h b/src/compiler/glsl/glcpp/glcpp.h
> index 068d83b..a9c08e6 100644
> --- a/src/compiler/glsl/glcpp/glcpp.h
> +++ b/src/compiler/glsl/glcpp/glcpp.h
> @@ -35,6 +35,14 @@
>
> #define yyscan_t void*
>
> +#define GLCPP_DEFINES_BLOOM_GET(parser, identifier) \
> + (((((parser)->defines_bloom0) >> ((identifier)[0] & 63)) & 1) & \
> + ((((parser)->defines_bloom1) >> ((identifier)[1] & 63)) & 1))
> +
> +#define GLCPP_DEFINES_BLOOM_PUT(parser, identifier) \
> + (parser)->defines_bloom0 |= (uint64_t)1 << ((identifier)[0] & 63); \
> + (parser)->defines_bloom1 |= (uint64_t)1 << ((identifier)[1] & 63);
> +
> /* Some data types used for parser values. */
>
> typedef struct expression_value {
> @@ -191,6 +199,8 @@ struct glcpp_parser {
> void *linalloc;
> yyscan_t scanner;
> struct hash_table *defines;
> + uint64_t defines_bloom0;
> + uint64_t defines_bloom1;
> active_list_t *active;
> int lexing_directive;
> int lexing_version_directive;
>
More information about the mesa-dev
mailing list