[Mesa-dev] [PATCH] RFC: glsl/glcpp: Allow for '#' characters to appear in shader body

Erik Faye-Lund kusmabite at gmail.com
Thu Aug 7 07:41:17 PDT 2014


On Tue, Aug 5, 2014 at 11:22 PM, Carl Worth <cworth at cworth.org> wrote:
> Kenneth Graunke <kenneth at whitecape.org> writes:
>> I agree that this is pretty bogus.
>
> I'm coming around to thinking it's totally bogus.
>
>> How about emitting a warning in the RETURN_TOKEN ('#') case?
>
> Thanks for the review, and thanks for suggesting the warning. I added
> the warning, then decided it needed some additional testing. So I
> whipped up a test that uses every possible non-identifier character
> to separate a macro name from its definition, like so:
>
>         #define S* splat
>
> (defining a macro "S" to expand to "* splat"). This is first patch
> attached below.
>
> With this test, there's the question of whether it's even legal to have
> no space between a macro name and the replacement list in a macro
> definition. As usual, the GLSL specification is mute on the question,
> (deferring just to "as is standard for C++"). If we choose GCC as the
> standard, it does accept this case, (and emits a "missing space"
> warning). By accident, glcpp has always silently accepted this, until
> recently barfing, but only when '#' was the separating character.
>
> My next patch expanded this new test so that '#' is tested as well, (and
> ensuring the warning about the known-bogus '#' character was
> emitted). This is the second patch attached below.
>
> The expanded test looks really odd to me. Even though there are
> something like two-dozen pieces of punctuation used a separators, only
> the case with '#' emits a warning.
>
> The insight is that the warning really has nothing to do with "#define
> FOO*" but everything to do with the '#' character appearing out of
> proper context. So what we really want is better testing of illegal
> characters in various contexts.
>
> So I wrote two additional new tests. The first runs every possible legal
> character through glcpp, ensuring they are all passed through correctly.
> It turns out this test caught a long-standing bug, (glcpp was tripping
> up on vertical tab and form-feed for which the GLSL specification is
> explicit should be allowed). The second test runs every possible illegal
> character through glcpp, ensuring that an error is generated for each.
>
> Instead of attaching those two tests, I'll send patches for them to the
> list. They're independent of the current discussion.
>
> And what was _really_ striking when looking at the final test (as I
> first wrote it, but not as I will submit it), is that every illegal
> character was generating an error message except for '#' which was just
> generating a warning. So that sealed it for me that this treatment is
> bogus.
>
> So I'm inclined to leave Mesa breaking Reaction Quake, (but I'll
> go file a bug against that application).
>
> Now, what we could do if we were so inclined, would be to defer the
> errors for illegal characters until they actually appeared in the
> pre-processor output. That, is, a line such as:
>
>         $
>
> would be an immediate error, ("Illegal character '$'), but a line such
> as:
>
>         #define DOLLAR $
>
> would be accepted with no error in and of itself, (the illegal character
> has appeared in a replacement list, but may not ever appear in the
> output to be seen by the compiler). Then, a subsequent line such as:
>
>         DOLLAR
>
> would then emit the "Illegal character '$'" error.
>
> If we had all of that in place, that would un-break Reaction Quake in a
> way that wouldn't feel creepy to me, (and none of the tests would have
> odd-looking exceptional behavior).

What you describe here seems to actually be what the standard requires:

The relevant bit of the parse rule for a define statement of this type
is like so (Section 16 [cpp] of the C++ spec):

# define identifier replacement-list new-line

...And:

replacement-list:
    pp-tokens(opt)

pp-tokens:
    preprocessing-token
    pp-tokens preprocessing-token

And Section 2.4 [lex.pptoken] defines:

preprocessing-token:
    header-name
    identifier
    pp-number
    character-literal
    string-literal
    preprocessing-op-or-punc
    each non-white-space character that cannot be one of the above

So, from the preprocessor's point of view, '#' is allowed in a macro,
as it matches "preprocessing-op-or-punc" (or if we consider the #
operator removed from the cpp-spec due to lack of strings, it still
matches that "each non-white-space character that cannot be one of the
above"-rule). It's AFAICT not until such a pp-token gets converted to
a glsl-token that this might become an error. And (as you kind of
pointed out), if a macro is never used, it never gets converted.

Note that '$' is a bit different, as it's not a part of the
preprocessor's character set, so using it might be interpreted as
undefined behavior.


More information about the mesa-dev mailing list