[Mesa-dev] [PATCH 09/12] glcpp: Avoid unnecessary linear_strdup

Thu Jan 19 00:16:12 UTC 2017

On 01/07/2017 11:02 AM, Vladislav Egorov wrote:
> The lexer copies every identifier it finds. At first look it seems
> needless, because all these identifiers are already available stored
> inside of shader text, but my experiments with immutable explicitly
> sized strings (ptr, len) resulted in a lot of code change and didn't
> result in any speed increase.
> 
> Still, it can't be improved a bit. Length of the token was already
> calculated by flex and stored in yyleng, no need to implicitly call
> strlen() via linear_strdup().
> 
> I see no point in other strdups(). They copy strings that will not be
> modified and will live at least to the end of the preprocessing stage.
> The only exception is strdup in _token_paste() that actually modifies
> the string, so strdup() is really necessary.

I think the first hunk is good.

For the rest, I had to do some research on yytext.  After looking at
https://ftp.gnu.org/old-gnu/Manuals/flex-2.5.4/html_node/flex_8.html, I
still have some reservations.  My understanding is that, baring buffer
resizing for large tokens, yytext will reference the same memory across
the invocation of many rules.  The contents of that memory, however,
will change.  That seems catastrophically bad.  We're storing this
pointer and relying on its contents, but it's basically always random stuff.

The one thing that confuses me about this is the next section,
https://ftp.gnu.org/old-gnu/Manuals/flex-2.5.4/html_node/flex_9.html, says:

    "Actions are free to modify yytext except for lengthening it (adding
    characters to its end--these will overwrite later characters in the
    input stream)."

I don't think it's at all authoritative, but
https://web.stanford.edu/class/archive/cs/cs143/cs143.1128/handouts/050%20Flex%20In%20A%20Nutshell.pdf
says:

    "yytext is a null-terminated string containing the text of the
    lexeme just recognized as a token. This global variable is declared
    and managed in the lex.yy.c file. Do not modify its contents. The
    buffer is overwritten with each subsequent token, so you must make
    your own copy of a lexeme you need to store more permanently."

This partially contradicts the flex documentation (about modifying the
buffer), but it is consistent with my understanding of how the buffer is
managed.

So... I'm not sure what to make of this.

> ---
>  src/compiler/glsl/glcpp/glcpp-lex.l   |  3 ++-
>  src/compiler/glsl/glcpp/glcpp-parse.y | 12 ++++++------
>  2 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/src/compiler/glsl/glcpp/glcpp-lex.l b/src/compiler/glsl/glcpp/glcpp-lex.l
> index 381b973..93e5cb3 100644
> --- a/src/compiler/glsl/glcpp/glcpp-lex.l
> +++ b/src/compiler/glsl/glcpp/glcpp-lex.l
> @@ -101,7 +101,8 @@ void glcpp_set_column (int  column_no , yyscan_t yyscanner);
>  #define RETURN_STRING_TOKEN(token)					\
>  	do {								\
>  		if (! parser->skipping) {				\
> -			yylval->str = linear_strdup(yyextra->linalloc, yytext);	\
> +			yylval->str = linear_alloc_child(yyextra->linalloc, yyleng + 1);	\
> +			memcpy(yylval->str, yytext, yyleng + 1);        \
>  			RETURN_TOKEN_NEVER_SKIP (token);		\
>  		}							\
>  	} while(0)
> diff --git a/src/compiler/glsl/glcpp/glcpp-parse.y b/src/compiler/glsl/glcpp/glcpp-parse.y
> index 47bf1e1..438ca51 100644
> --- a/src/compiler/glsl/glcpp/glcpp-parse.y
> +++ b/src/compiler/glsl/glcpp/glcpp-parse.y
> @@ -480,7 +480,7 @@ expression:
>  |	IDENTIFIER {
>  		$$.value = 0;
>  		if (parser->is_gles)
> -			$$.undefined_macro = linear_strdup(parser->linalloc, $1);
> +			$$.undefined_macro = $1;
>  		else
>  			$$.undefined_macro = NULL;
>  	}
> @@ -773,7 +773,7 @@ _string_list_append_item(glcpp_parser_t *parser, string_list_t *list,
>     string_node_t *node;
>  
>     node = linear_alloc_child(parser->linalloc, sizeof(string_node_t));
> -   node->str = linear_strdup(parser->linalloc, str);
> +   node->str = str;
>  
>     node->next = NULL;
>  
> @@ -1869,7 +1869,7 @@ _glcpp_parser_expand_node(glcpp_parser_t *parser, token_node_t *node,
>        token_list_t *expansion;
>        token_t *final;
>  
> -      str = linear_strdup(parser->linalloc, token->value.str);
> +      str = token->value.str;
>        final = _token_create_str(parser, OTHER, str);
>        expansion = _token_list_create(parser);
>        _token_list_append(parser, expansion, final);
> @@ -1905,7 +1905,7 @@ _parser_active_list_push(glcpp_parser_t *parser, const char *identifier,
>     active_list_t *node;
>  
>     node = linear_alloc_child(parser->linalloc, sizeof(active_list_t));
> -   node->identifier = linear_strdup(parser->linalloc, identifier);
> +   node->identifier = identifier;
>     node->marker = marker;
>     node->next = parser->active;
>  
> @@ -2114,7 +2114,7 @@ _define_object_macro(glcpp_parser_t *parser, YYLTYPE *loc,
>  
>     macro->is_function = 0;
>     macro->parameters = NULL;
> -   macro->identifier = linear_strdup(parser->linalloc, identifier);
> +   macro->identifier = identifier;
>     macro->replacements = replacements;
>  
>     entry = _mesa_hash_table_search(parser->defines, identifier);
> @@ -2150,7 +2150,7 @@ _define_function_macro(glcpp_parser_t *parser, YYLTYPE *loc,
>  
>     macro->is_function = 1;
>     macro->parameters = parameters;
> -   macro->identifier = linear_strdup(parser->linalloc, identifier);
> +   macro->identifier = identifier;
>     macro->replacements = replacements;
>  
>     entry = _mesa_hash_table_search(parser->defines, identifier);
>