[Mesa-dev] [PATCH] RFC: glsl/glcpp: Allow for '#' characters to appear in shader body
Erik Faye-Lund
kusmabite at gmail.com
Thu Aug 7 07:41:17 PDT 2014
On Tue, Aug 5, 2014 at 11:22 PM, Carl Worth <cworth at cworth.org> wrote:
> Kenneth Graunke <kenneth at whitecape.org> writes:
>> I agree that this is pretty bogus.
>
> I'm coming around to thinking it's totally bogus.
>
>> How about emitting a warning in the RETURN_TOKEN ('#') case?
>
> Thanks for the review, and thanks for suggesting the warning. I added
> the warning, then decided it needed some additional testing. So I
> whipped up a test that uses every possible non-identifier character
> to separate a macro name from its definition, like so:
>
> #define S* splat
>
> (defining a macro "S" to expand to "* splat"). This is first patch
> attached below.
>
> With this test, there's the question of whether it's even legal to have
> no space between a macro name and the replacement list in a macro
> definition. As usual, the GLSL specification is mute on the question,
> (deferring just to "as is standard for C++"). If we choose GCC as the
> standard, it does accept this case, (and emits a "missing space"
> warning). By accident, glcpp has always silently accepted this, until
> recently barfing, but only when '#' was the separating character.
>
> My next patch expanded this new test so that '#' is tested as well, (and
> ensuring the warning about the known-bogus '#' character was
> emitted). This is the second patch attached below.
>
> The expanded test looks really odd to me. Even though there are
> something like two-dozen pieces of punctuation used a separators, only
> the case with '#' emits a warning.
>
> The insight is that the warning really has nothing to do with "#define
> FOO*" but everything to do with the '#' character appearing out of
> proper context. So what we really want is better testing of illegal
> characters in various contexts.
>
> So I wrote two additional new tests. The first runs every possible legal
> character through glcpp, ensuring they are all passed through correctly.
> It turns out this test caught a long-standing bug, (glcpp was tripping
> up on vertical tab and form-feed for which the GLSL specification is
> explicit should be allowed). The second test runs every possible illegal
> character through glcpp, ensuring that an error is generated for each.
>
> Instead of attaching those two tests, I'll send patches for them to the
> list. They're independent of the current discussion.
>
> And what was _really_ striking when looking at the final test (as I
> first wrote it, but not as I will submit it), is that every illegal
> character was generating an error message except for '#' which was just
> generating a warning. So that sealed it for me that this treatment is
> bogus.
>
> So I'm inclined to leave Mesa breaking Reaction Quake, (but I'll
> go file a bug against that application).
>
> Now, what we could do if we were so inclined, would be to defer the
> errors for illegal characters until they actually appeared in the
> pre-processor output. That, is, a line such as:
>
> $
>
> would be an immediate error, ("Illegal character '$'), but a line such
> as:
>
> #define DOLLAR $
>
> would be accepted with no error in and of itself, (the illegal character
> has appeared in a replacement list, but may not ever appear in the
> output to be seen by the compiler). Then, a subsequent line such as:
>
> DOLLAR
>
> would then emit the "Illegal character '$'" error.
>
> If we had all of that in place, that would un-break Reaction Quake in a
> way that wouldn't feel creepy to me, (and none of the tests would have
> odd-looking exceptional behavior).
What you describe here seems to actually be what the standard requires:
The relevant bit of the parse rule for a define statement of this type
is like so (Section 16 [cpp] of the C++ spec):
# define identifier replacement-list new-line
...And:
replacement-list:
pp-tokens(opt)
pp-tokens:
preprocessing-token
pp-tokens preprocessing-token
And Section 2.4 [lex.pptoken] defines:
preprocessing-token:
header-name
identifier
pp-number
character-literal
string-literal
preprocessing-op-or-punc
each non-white-space character that cannot be one of the above
So, from the preprocessor's point of view, '#' is allowed in a macro,
as it matches "preprocessing-op-or-punc" (or if we consider the #
operator removed from the cpp-spec due to lack of strings, it still
matches that "each non-white-space character that cannot be one of the
above"-rule). It's AFAICT not until such a pp-token gets converted to
a glsl-token that this might become an error. And (as you kind of
pointed out), if a macro is never used, it never gets converted.
Note that '$' is a bit different, as it's not a part of the
preprocessor's character set, so using it might be interpreted as
undefined behavior.
More information about the mesa-dev
mailing list