[Mesa-dev] [PATCH 5/6] glsl/glcpp: Emit an error for any illegal GLSL character.

Carl Worth cworth at cworth.org
Tue Aug 5 16:33:06 PDT 2014


The GLSL Language Specification (version 4.30.6) is quite clear about the GLSL
character set and the expected behavior for other characters:

    Section 3.1 Character Set

    The source character set used for the OpenGL shading languages, outside of
    comments, is a subset of UTF-8. It includes the following characters:

        The letters a-z, A-Z, and the underscore ( _ ).

        The numbers 0-9.

        The symbols period (.), plus (+), dash (-), slash (/), asterisk (*),
        percent (%), angled brackets (< and >), square brackets ( [ and ] ),
        parentheses ( ( and ) ), braces ( { and } ), caret (^), vertical bar
        (|), ampersand (&), tilde (~), equals (=), exclamation point (!),
        colon (:), semicolon (;), comma (,), and question mark (?).

        The number sign (#) for preprocessor use.

        The backslash (\) as the line-continuation character when used as the
        last character of a line, just before a new line.

        White space: the space character, horizontal tab, vertical tab, form
        feed, carriage-return, and line-feed.

    A compile-time error will be given if any other character is used outside
    a comment.

By taking the set of all possible 8-bit characters, and subtracting the above,
we have the set of illegal characters:

    0x00 - 0x08 (^@ - ^H)
    0x0E - 0x1F (^N - ^Z, ^[, ^\, ^], ^^, ^_)
    0x22 (")
    0x24 ($)
    0x27 (')
    0x40 (@)
    0x60 (`)
    0x7F (DEL or ^?)
    0x80 - 0xFF (non-ASCII)

In addition, (#) is illegal outside of uses defined by the preprocessor (that
is, when it neither starts a directive nor forms part of a legal paste operator
in a replacement list), as is (\) appearing anywhere but at the end of a line.
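
The subtraction above can be double-checked mechanically. The following is
only a minimal sketch in C, not code from this patch or from Mesa: the helper
names is_legal_glsl_char and count_illegal_glsl_chars are hypothetical, and
the sketch assumes that (#) and (\) count as legal bytes, since their
legality depends on position rather than on the character itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of glcpp): returns true if the byte is in
 * the GLSL character set quoted from Section 3.1. '#' and '\\' are treated
 * as legal here because their legality depends on where they appear. */
static bool is_legal_glsl_char(unsigned char c)
{
    static const char symbols[] = ".+-/*%<>[](){}^|&~=!:;,?";
    static const char whitespace[] = " \t\v\f\r\n";

    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_')
        return true;
    if (c >= '0' && c <= '9')
        return true;
    if (c == '#' || c == '\\')
        return true;
    /* strchr() would match the terminating NUL, so exclude 0x00 first. */
    if (c != '\0' && strchr(symbols, c) != NULL)
        return true;
    if (c != '\0' && strchr(whitespace, c) != NULL)
        return true;
    return false;
}

/* Hypothetical helper: counts the bytes in 0x00-0xFF that fall outside the
 * legal set, i.e. the size of the illegal set listed above. */
static int count_illegal_glsl_chars(void)
{
    int count = 0;

    for (int c = 0; c <= 0xFF; c++) {
        if (!is_legal_glsl_char((unsigned char) c))
            count++;
    }
    return count;
}
```

Under those assumptions, calling is_legal_glsl_char on each byte from 0x00
through 0xFF and collecting the rejected ones should reproduce the ranges in
the list above.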

So instead of the previous whitelist we had for "OTHER" characters, we now
add a blacklist for "ILLEGAL" characters based on the above, and then use a
simple regular expression of "." to catch any characters that get past the
blacklist.

This approach also means the internal-error rule with "." can no longer be
matched, so it goes away now.
---
 src/glsl/glcpp/glcpp-lex.l | 32 +++++++++++---------------------
 1 file changed, 11 insertions(+), 21 deletions(-)

diff --git a/src/glsl/glcpp/glcpp-lex.l b/src/glsl/glcpp/glcpp-lex.l
index 0dbdab0..790035c 100644
--- a/src/glsl/glcpp/glcpp-lex.l
+++ b/src/glsl/glcpp/glcpp-lex.l
@@ -175,15 +175,7 @@ HASH		#
 IDENTIFIER	[_a-zA-Z][_a-zA-Z0-9]*
 PP_NUMBER	[.]?[0-9]([._a-zA-Z0-9]|[eEpP][-+])*
 PUNCTUATION	[][(){}.&*~!/%<>^|;,=+-]
-
-/* The OTHER class is simply a catch-all for things that the CPP
-parser just doesn't care about. Since flex regular expressions that
-match longer strings take priority over those matching shorter
-strings, we have to be careful to avoid OTHER matching and hiding
-something that CPP does care about. So we simply exclude all
-characters that appear in any other expressions. */
-
-OTHER		[^][_#[:space:]#a-zA-Z0-9(){}.&*~!/%<>^|;,=+-]
+ILLEGAL		[\x00-\x08\x0E-\x1F"$'@`\x7F\x80-\xFF\\]
 
 DIGITS			[0-9][0-9]*
 DECIMAL_INTEGER		[1-9][0-9]*[uU]?
@@ -276,9 +268,10 @@ HEXADECIMAL_INTEGER	0[xX][0-9a-fA-F]+[uU]?
          * token. */
 	if (parser->first_non_space_token_this_line) {
 		BEGIN HASH;
+		RETURN_TOKEN_NEVER_SKIP (HASH_TOKEN);
+	} else {
+		glcpp_error(yylloc, yyextra, "Illegal character '#' (not a preprocessing directive)");
 	}
-
-	RETURN_TOKEN_NEVER_SKIP (HASH_TOKEN);
 }
 
 <HASH>version{HSPACE}+ {
@@ -505,8 +498,8 @@ HEXADECIMAL_INTEGER	0[xX][0-9a-fA-F]+[uU]?
 	RETURN_TOKEN (yytext[0]);
 }
 
-{OTHER}+ {
-	RETURN_STRING_TOKEN (OTHER);
+{ILLEGAL} {
+	glcpp_error(yylloc, yyextra, "Illegal character '%c'", yytext[0]);
 }
 
 {HSPACE} {
@@ -539,14 +532,7 @@ HEXADECIMAL_INTEGER	0[xX][0-9a-fA-F]+[uU]?
 		RETURN_TOKEN (NEWLINE);
 }
 
-	/* This is a catch-all to avoid the annoying default flex action which
-	 * matches any character and prints it. If any input ever matches this
-	 * rule, then we have made a mistake above and need to fix one or more
-	 * of the preceding patterns to match that input. */
-
-<*>. {
-	glcpp_error(yylloc, yyextra, "Internal compiler error: Unexpected character: %s", yytext);
-
+<UNREACHABLE>. {
 	/* We don't actually use the UNREACHABLE start condition. We
 	only have this block here so that we can pretend to call some
 	generated functions, (to avoid "defined but not used"
@@ -557,6 +543,10 @@ HEXADECIMAL_INTEGER	0[xX][0-9a-fA-F]+[uU]?
 	}
 }
 
+<*>. {
+	RETURN_STRING_TOKEN (OTHER);
+}
+
 %%
 
 void
-- 
2.0.0


