[Mesa-dev] [PATCH 5/6] glsl/glcpp: Emit an error for any illegal GLSL character.
Carl Worth
cworth at cworth.org
Tue Aug 5 16:33:06 PDT 2014
The GLSL Language Specification (version 4.30.6) is quite clear about the GLSL
character set and the expected behavior for other characters:
    Section 3.1 Character Set

    The source character set used for the OpenGL shading languages, outside of
    comments, is a subset of UTF-8. It includes the following characters:

      * The letters a-z, A-Z, and the underscore ( _ ).
      * The numbers 0-9.
      * The symbols period (.), plus (+), dash (-), slash (/), asterisk (*),
        percent (%), angled brackets (< and >), square brackets ( [ and ] ),
        parentheses ( ( and ) ), braces ( { and } ), caret (^), vertical bar
        (|), ampersand (&), tilde (~), equals (=), exclamation point (!),
        colon (:), semicolon (;), comma (,), and question mark (?).
      * The number sign (#) for preprocessor use.
      * The backslash (\) as the line-continuation character when used as the
        last character of a line, just before a new line.
      * White space: the space character, horizontal tab, vertical tab, form
        feed, carriage-return, and line-feed.

    A compile-time error will be given if any other character is used outside
    a comment.
By taking the set of all possible 8-bit characters, and subtracting the above,
we have the set of illegal characters:
0x00 - 0x08 (^A - ^H)
0x0E - 0x1F (^N - ^Z, ^[, ^\, ^], ^^, ^_)
0x22 (")
0x24 ($)
0x27 (')
0x40 (@)
0x60 (`)
0x7F (DEL or ^?)
0x80 - 0xFF (non-ASCII)
As well as (#) outside of uses defined by the preprocessor (not starting a
directive, nor as part of a legal paste operator in a replacement list), and
(\) appearing anywhere but at the end of a line.
So instead of the previous whitelist we had for "OTHER" characters, we now
add a blacklist for "ILLEGAL" characters based on the above, and then use a
simple regular expression of "." to catch any characters that get past the
blacklist.
This approach also means the internal-error rule with "." can no longer be
matched, so it goes away now.
---
src/glsl/glcpp/glcpp-lex.l | 32 +++++++++++---------------------
1 file changed, 11 insertions(+), 21 deletions(-)
diff --git a/src/glsl/glcpp/glcpp-lex.l b/src/glsl/glcpp/glcpp-lex.l
index 0dbdab0..790035c 100644
--- a/src/glsl/glcpp/glcpp-lex.l
+++ b/src/glsl/glcpp/glcpp-lex.l
@@ -175,15 +175,7 @@ HASH #
IDENTIFIER [_a-zA-Z][_a-zA-Z0-9]*
PP_NUMBER [.]?[0-9]([._a-zA-Z0-9]|[eEpP][-+])*
PUNCTUATION [][(){}.&*~!/%<>^|;,=+-]
-
-/* The OTHER class is simply a catch-all for things that the CPP
-parser just doesn't care about. Since flex regular expressions that
-match longer strings take priority over those matching shorter
-strings, we have to be careful to avoid OTHER matching and hiding
-something that CPP does care about. So we simply exclude all
-characters that appear in any other expressions. */
-
-OTHER [^][_#[:space:]#a-zA-Z0-9(){}.&*~!/%<>^|;,=+-]
+ILLEGAL [\x00-\x08\x0E-\x1F"$'@`\x7F\x80-\xFF\\]
DIGITS [0-9][0-9]*
DECIMAL_INTEGER [1-9][0-9]*[uU]?
@@ -276,9 +268,10 @@ HEXADECIMAL_INTEGER 0[xX][0-9a-fA-F]+[uU]?
* token. */
if (parser->first_non_space_token_this_line) {
BEGIN HASH;
+ RETURN_TOKEN_NEVER_SKIP (HASH_TOKEN);
+ } else {
+ glcpp_error(yylloc, yyextra, "Illegal character '#' (not a preprocessing directive)");
}
-
- RETURN_TOKEN_NEVER_SKIP (HASH_TOKEN);
}
<HASH>version{HSPACE}+ {
@@ -505,8 +498,8 @@ HEXADECIMAL_INTEGER 0[xX][0-9a-fA-F]+[uU]?
RETURN_TOKEN (yytext[0]);
}
-{OTHER}+ {
- RETURN_STRING_TOKEN (OTHER);
+{ILLEGAL} {
+ glcpp_error(yylloc, yyextra, "Illegal character '%c'", yytext[0]);
}
{HSPACE} {
@@ -539,14 +532,7 @@ HEXADECIMAL_INTEGER 0[xX][0-9a-fA-F]+[uU]?
RETURN_TOKEN (NEWLINE);
}
- /* This is a catch-all to avoid the annoying default flex action which
- * matches any character and prints it. If any input ever matches this
- * rule, then we have made a mistake above and need to fix one or more
- * of the preceding patterns to match that input. */
-
-<*>. {
- glcpp_error(yylloc, yyextra, "Internal compiler error: Unexpected character: %s", yytext);
-
+<UNREACHABLE>. {
/* We don't actually use the UNREACHABLE start condition. We
only have this block here so that we can pretend to call some
generated functions, (to avoid "defined but not used"
@@ -557,6 +543,10 @@ HEXADECIMAL_INTEGER 0[xX][0-9a-fA-F]+[uU]?
}
}
+<*>. {
+ RETURN_STRING_TOKEN (OTHER);
+}
+
%%
void
--
2.0.0