[Mesa-dev] [PATCH 2/2] glcpp: Replace multi-line comment with a space (even as part of macro definition)

Carl Worth cworth at cworth.org
Thu Dec 19 16:25:20 PST 2013


The preprocessor has always replaced multi-line comments with a single space
character, (as required by the specification), but as of commit
bd55ba568b301d0f764cd1ca015e84e1ae932c8b the lexer also emitted a NEWLINE
token for each newline within the comment, (in order to preserve line
numbers).

The emitting of NEWLINE tokens within the comment broke the rule of "replace a
multi-line comment with a single space" as could be exposed by code like the
following:

	#define FOO a/*
	*/b

	FOO

Prior to commit bd55ba568b301d0f764cd1ca015e84e1ae932c8b, this code defined
the macro FOO as "a b" as desired. Since that commit, this code instead
defines FOO as "a" and leaves a stray "b" in the output.

In this commit, we fix this by not emitting the NEWLINE tokens while lexing
the comment, but instead merely counting them in the commented_newlines
variable. Then, when the lexer next encounters a non-commented newline it
switches to a NEWLINE_CATCHUP state to emit as many NEWLINE tokens as
necessary (so that subsequent parsing stages still generate correct line
numbers).

Of course, it would have been more clear if we could have written a loop to
emit all the newlines, but flex conventions prevent that, (we must use
"return" for each token we emit).

It similarly would have been clear to have a new rule restricted to the
<NEWLINE_CATCHUP> state with an action much like the body of this if
condition. The problem with that is that this rule must not consume any
characters. It might be possible to write a rule that matches a single
lookahead of any character, but then we would also need an additional rule to
ensure for the <EOF> case where there are no additional characters available
for the lookahead to match.

Given those considerations, and given that the SKIP-state manipulation already
involves a code block at the top of the lexer function, before any rules, it
seems best to me to go with the implementation here which adds a similar
pre-rule code block for the NEWLINE_CATCHUP.
---
 src/glsl/glcpp/glcpp-lex.l                         | 38 +++++++++++++++++++---
 src/glsl/glcpp/glcpp-parse.y                       |  1 +
 src/glsl/glcpp/glcpp.h                             |  1 +
 src/glsl/glcpp/tests/063-comments.c.expected       |  4 +--
 .../094-divide-by-zero-short-circuit.c.expected    |  2 +-
 ...ation-and-non-continuation-backslash.c.expected |  2 +-
 src/glsl/glcpp/tests/118-comment-becomes-space.c   |  4 +++
 .../tests/118-comment-becomes-space.c.expected     |  5 +++
 8 files changed, 48 insertions(+), 9 deletions(-)
 create mode 100644 src/glsl/glcpp/tests/118-comment-becomes-space.c
 create mode 100644 src/glsl/glcpp/tests/118-comment-becomes-space.c.expected

diff --git a/src/glsl/glcpp/glcpp-lex.l b/src/glsl/glcpp/glcpp-lex.l
index 5543edc..3cb66d9 100644
--- a/src/glsl/glcpp/glcpp-lex.l
+++ b/src/glsl/glcpp/glcpp-lex.l
@@ -67,7 +67,7 @@ void glcpp_set_column (int  column_no , yyscan_t yyscanner);
 %option stack
 %option never-interactive
 
-%x DONE COMMENT UNREACHABLE SKIP DEFINE
+%x DONE COMMENT UNREACHABLE SKIP DEFINE NEWLINE_CATCHUP
 
 SPACE		[[:space:]]
 NONSPACE	[^[:space:]]
@@ -93,6 +93,25 @@ HEXADECIMAL_INTEGER	0[xX][0-9a-fA-F]+[uU]?
 
 %%
 
+	glcpp_parser_t *parser = yyextra;
+
+	/* When we lex a multi-line comment, we replace it (as
+	 * specified) with a single space. But if the comment spanned
+	 * multiple lines, then subsequent parsing stages will not
+	 * count correct line numbers. To avoid this problem we keep
+	 * track of all newlines that were commented out by a
+	 * multi-line comment, and we emit a NEWLINE token for each at
+	 * the next legal opportunity, (which is when the lexer would
+	 * be emitting a NEWLINE token anyway).
+	 */
+	if (YY_START == NEWLINE_CATCHUP) {
+		if (parser->commented_newlines)
+			parser->commented_newlines--;
+		if (parser->commented_newlines == 0)
+			BEGIN INITIAL;
+		return NEWLINE;
+	}
+
 	/* The handling of the SKIP vs INITIAL start states requires
 	 * some special handling. Typically, a lexer would change
 	 * start states with statements like "BEGIN SKIP" within the
@@ -123,7 +142,6 @@ HEXADECIMAL_INTEGER	0[xX][0-9a-fA-F]+[uU]?
 	 *	   expression so the parser can correctly update the
 	 *	   skip_stack state.
 	 */
-	glcpp_parser_t *parser = yyextra;
 	if (YY_START == INITIAL || YY_START == SKIP) {
 		if (parser->lexing_if ||
 		    parser->skip_stack == NULL ||
@@ -139,14 +157,16 @@ HEXADECIMAL_INTEGER	0[xX][0-9a-fA-F]+[uU]?
 
 	/* Single-line comments */
 "//"[^\n]* {
+	if (parser->commented_newlines)
+		BEGIN NEWLINE_CATCHUP;
 }
 
 	/* Multi-line comments */
 "/*"                    { yy_push_state(COMMENT, yyscanner); }
 <COMMENT>[^*\n]*
-<COMMENT>[^*\n]*\n      { yylineno++; yycolumn = 0; return NEWLINE; }
+<COMMENT>[^*\n]*\n      { yylineno++; yycolumn = 0; parser->commented_newlines++; }
 <COMMENT>"*"+[^*/\n]*
-<COMMENT>"*"+[^*/\n]*\n { yylineno++; yycolumn = 0; return NEWLINE; }
+<COMMENT>"*"+[^*/\n]*\n { yylineno++; yycolumn = 0; parser->commented_newlines++; }
 <COMMENT>"*"+"/"        {
 	yy_pop_state(yyscanner);
 	if (yyextra->space_tokens)
@@ -162,6 +182,8 @@ HEXADECIMAL_INTEGER	0[xX][0-9a-fA-F]+[uU]?
 	/* glcpp doesn't handle #extension, #version, or #pragma directives.
 	 * Simply pass them through to the main compiler's lexer/parser. */
 {HASH}(extension|pragma)[^\n]+ {
+	if (parser->commented_newlines)
+		BEGIN NEWLINE_CATCHUP;
 	yylval->str = ralloc_strdup (yyextra, yytext);
 	yylineno++;
 	yycolumn = 0;
@@ -208,7 +230,10 @@ HEXADECIMAL_INTEGER	0[xX][0-9a-fA-F]+[uU]?
 }
 }
 
-<SKIP>[^\n] ;
+<SKIP>[^\n] {
+	if (parser->commented_newlines)
+		BEGIN NEWLINE_CATCHUP;
+}
 
 {HASH}error.* {
 	char *p;
@@ -323,6 +348,9 @@ HEXADECIMAL_INTEGER	0[xX][0-9a-fA-F]+[uU]?
 }
 
 <SKIP,INITIAL>\n {
+	if (parser->commented_newlines) {
+		BEGIN NEWLINE_CATCHUP;
+	}
 	yyextra->lexing_if = 0;
 	yylineno++;
 	yycolumn = 0;
diff --git a/src/glsl/glcpp/glcpp-parse.y b/src/glsl/glcpp/glcpp-parse.y
index 7edc274..5474577 100644
--- a/src/glsl/glcpp/glcpp-parse.y
+++ b/src/glsl/glcpp/glcpp-parse.y
@@ -1164,6 +1164,7 @@ glcpp_parser_create (const struct gl_extensions *extensions, int api)
 	parser->newline_as_space = 0;
 	parser->in_control_line = 0;
 	parser->paren_count = 0;
+        parser->commented_newlines = 0;
 
 	parser->skip_stack = NULL;
 
diff --git a/src/glsl/glcpp/glcpp.h b/src/glsl/glcpp/glcpp.h
index 8aaa551..e95ab84 100644
--- a/src/glsl/glcpp/glcpp.h
+++ b/src/glsl/glcpp/glcpp.h
@@ -172,6 +172,7 @@ struct glcpp_parser {
 	int newline_as_space;
 	int in_control_line;
 	int paren_count;
+	int commented_newlines;
 	skip_node_t *skip_stack;
 	token_list_t *lex_from_list;
 	token_node_t *lex_from_node;
diff --git a/src/glsl/glcpp/tests/063-comments.c.expected b/src/glsl/glcpp/tests/063-comments.c.expected
index 73ca707..1965c9b 100644
--- a/src/glsl/glcpp/tests/063-comments.c.expected
+++ b/src/glsl/glcpp/tests/063-comments.c.expected
@@ -5,16 +5,16 @@ f = g /h;
  l();
 m = n
 + p;
+ 
 
 
 
 
 
 
- 
 more code here
-
  
+
 are not treated like comments.
  
  
diff --git a/src/glsl/glcpp/tests/094-divide-by-zero-short-circuit.c.expected b/src/glsl/glcpp/tests/094-divide-by-zero-short-circuit.c.expected
index 84fdc50..be20b7c 100644
--- a/src/glsl/glcpp/tests/094-divide-by-zero-short-circuit.c.expected
+++ b/src/glsl/glcpp/tests/094-divide-by-zero-short-circuit.c.expected
@@ -1,4 +1,5 @@
 0:12(17): preprocessor error: division by 0 in preprocessor directive
+ 
 
 
 
@@ -9,7 +10,6 @@
 
 
 
- 
 
 
 
diff --git a/src/glsl/glcpp/tests/117-line-continuation-and-non-continuation-backslash.c.expected b/src/glsl/glcpp/tests/117-line-continuation-and-non-continuation-backslash.c.expected
index 9b3eb67..292d651 100644
--- a/src/glsl/glcpp/tests/117-line-continuation-and-non-continuation-backslash.c.expected
+++ b/src/glsl/glcpp/tests/117-line-continuation-and-non-continuation-backslash.c.expected
@@ -1,3 +1,4 @@
+ 
 
 
 
@@ -6,7 +7,6 @@
 
 
 
- 
 
 
  
diff --git a/src/glsl/glcpp/tests/118-comment-becomes-space.c b/src/glsl/glcpp/tests/118-comment-becomes-space.c
new file mode 100644
index 0000000..53e8039
--- /dev/null
+++ b/src/glsl/glcpp/tests/118-comment-becomes-space.c
@@ -0,0 +1,4 @@
+#define FOO first/*
+*/second
+
+FOO
diff --git a/src/glsl/glcpp/tests/118-comment-becomes-space.c.expected b/src/glsl/glcpp/tests/118-comment-becomes-space.c.expected
new file mode 100644
index 0000000..2adf5d1
--- /dev/null
+++ b/src/glsl/glcpp/tests/118-comment-becomes-space.c.expected
@@ -0,0 +1,5 @@
+
+
+
+first second
+
-- 
1.8.5.2



More information about the mesa-dev mailing list