[XESAM] Aiming for RC1 this Sunday. Missing pieces...

Shaun McCance shaunm at gnome.org
Mon Sep 17 09:58:35 PDT 2007


On Sat, 2007-09-15 at 13:01 +0200, Mikkel Kamstrup Erlandsen wrote:
> 2007/9/14, Shaun McCance <shaunm at gnome.org>:
>         On Fri, 2007-09-14 at 01:01 +0200, Mikkel Kamstrup Erlandsen
>         wrote:
>         > Man I love speaking to myself :-) Anyways, I just completed
>         > (4). Check it out at
>         > http://wiki.freedesktop.org/wiki/XesamQueryLanguage and
>         > gimme the flames...
>
>         (I'm not involved in any way with any project that would
>         be implementing this.  I just like reading specifications.)
>
>         Regarding the regExp extension, should the exact regular
>         expression syntax be specified?  (Pointing to an existing
>         implementation of regexps would be sufficient.)  I'm just
>         imagining a scenario where a regexp doesn't match what it
>         should, because the client and the engine don't agree on
>         what means what.
> 
> I've been digging about and it seems the Extended Regular Expressions
> or Perl Regexps are the only real candidates. Basic Regexp seems too
> simple.
> 
> I will probably choose extended unless somebody yells.

I'm pretty sure that extended regexps are sufficient for
most things.  However, a lot of software just uses pcre.
For instance, here's the documentation on the regexp
syntax in GLib:

http://library.gnome.org/devel/glib/stable/glib-regex-syntax.html

Perl regexps are (I think) a superset of extended regexps,
so it's generally possible to hand an extended regexp to
pcre.  But the question is, do there exist extended regexps
that would be interpreted differently by pcre?  Basically,
that boils down to there being valid literal-text matches
that pcre would instead treat as a pattern.
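
One candidate for such a conflict is a backslash inside a bracket
expression.  POSIX extended regexps treat the backslash there as an
ordinary character, while Perl-style engines still treat it as an
escape.  Below is a minimal sketch of the difference.  It uses Python's
re module as a stand-in for a Perl-style engine (an assumption: re is
not pcre, though it behaves the same way on this point), and it only
emulates the extended reading by spelling the set out literally.  The
function names are made up for illustration.

```python
import re

# POSIX ERE reading:  [\d] is "a backslash or the letter d".
# Perl-style reading: [\d] is "any digit".
PATTERN = r"[\d]"


def perl_style_matches(text):
    """Matches found by Python's Perl-style engine for PATTERN."""
    return re.findall(PATTERN, text)


def ere_reading_matches(text):
    """Emulate the ERE reading by writing the set out literally."""
    return re.findall(r"[\\d]", text)


sample = r"d\7"  # three characters: d, backslash, 7
print(perl_style_matches(sample))   # ['7']
print(ere_reading_matches(sample))  # ['d', '\\']
```

The same pattern text picks out disjoint sets of characters under the
two readings, which is the kind of disagreement between client and
engine I was worried about.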

It does seem that Perl extensions use a syntax that is
designed to prevent such conflicts.  For example:

- Back references look like "\1".  The backslash is
  defined to be an escape character in regexps.
- Assertions start with "(?".  Putting a quantifier
  after an unescaped paren is nonsense otherwise.
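
To make the two cases concrete, here is a small sketch of both
extensions in use.  Again Python's re module stands in for a
Perl-style engine (an assumption, it is not pcre itself), and the
helper names are hypothetical.

```python
import re


def doubled_letter(text):
    """Back reference: \\1 must repeat whatever group 1 captured."""
    m = re.search(r"([a-z])\1", text)
    return m.group(0) if m else None


def word_before_bar(text):
    """Lookahead assertion: (?=bar) checks for 'bar' without consuming it."""
    m = re.search(r"[a-z]+(?=bar)", text)
    return m.group(0) if m else None


print(doubled_letter("xesam spec tool"))  # oo
print(word_before_bar("foobar"))          # foo
```

Neither "\1" nor "(?" has a defined meaning in a plain extended
regexp, so a pattern written for an extended engine should not
contain them unless the author meant something else by them.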

It might be worthwhile to go to a regexp/pcre expert
and ask, "Is it safe to assume any extended regexp
passed to pcre will return the same result as if it
were passed to an extended regexp engine?"
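
One thing worth raising with such an expert is that the result can
differ even when the syntax is read the same way.  POSIX asks for the
longest of the leftmost matches, while Perl-style engines take the
first alternative that succeeds.  The sketch below shows the effect
with Python's re module as the Perl-style engine (an assumption, not
pcre).  The leftmost_longest helper is a hypothetical emulation that
only handles a flat list of simple alternatives.

```python
import re


def perl_style(alternatives, text):
    """Ordered alternation: the first alternative that matches wins."""
    m = re.search("|".join(alternatives), text)
    return m.group(0) if m else None


def leftmost_longest(alternatives, text):
    """Emulate the POSIX rule for a flat list of simple alternatives:
    prefer the earliest start, then the longest match."""
    best_key, best_text = None, None
    for alt in alternatives:
        m = re.search(alt, text)
        if m is None:
            continue
        key = (m.start(), -len(m.group(0)))
        if best_key is None or key < best_key:
            best_key, best_text = key, m.group(0)
    return best_text


print(perl_style(["a", "ab"], "xab"))        # a
print(leftmost_longest(["a", "ab"], "xab"))  # ab
```

Whether a difference like this counts as an irrelevant corner case
for search queries is a judgment call for the spec.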

If the answer is "yes" (or "yes, except for some
corner cases that are sufficiently irrelevant"),
then Xesam could just state "some engines might
actually use pcre, but you're only allowed to
assume extended".  (And maybe it's worth putting
something into vendor.extensions whenever Perl
regexps are supported.  Or maybe not.)

--
Shaun



