[Libreoffice] [Crazy Ideas] Discuss "Replace regexp parser with std library"
Thorsten Behrens
thb at documentfoundation.org
Mon Nov 29 05:22:55 PST 2010
Joe Smith wrote:
> I've looked at the code a bit, and it seems like there is indeed only one point
> of contact with the rest of the suite, textsearch.cxx, which handles all types
> of text searches (normal, regexp & fuzzy), and calls Regexpr::re_search(), which
> calls re_match2() to run the actual regexp match.
>
> So the structure makes it easy to replace the regexp code in one place.
>
> Unfortunately, the way the functions work does not match well with the Boost RE
> classes, although I'm sure it would be possible with an interface layer.
>
> For example, the Boost engine handles locale-specific issues internally, whereas
> OOo's engine knows almost nothing about character case or multi-character
> sequences. Instead, it preps the text to be searched by running it through a
> filter. I don't understand the i18n & character encoding issues well enough to
> guess what that filter is actually doing or how it should be handled.
>
Hi Joe,
hm - then I think a combination of those two approaches might be a
winning strategy - LibO uses ICU for all that nifty transliteration
stuff & whatnot.
I notice that newer Boost versions also optionally support ICU,
maybe that already gives us good enough coverage - I'd be tempted to
just give it a whirl, and add it as an optional, experimental
feature for people to play with.
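
To make the "interface layer" idea a bit more concrete, here is a rough
sketch of the shape it could take: run the text through a filter first,
then hand the filtered text to a library regex engine. Everything in it
is an assumption for illustration only - prepText(), MatchRange and
searchFiltered() are made-up names, std::regex stands in for whichever
engine we would end up with, and the ASCII-only lowercasing is merely a
placeholder for the real transliteration filter (which ICU would do
properly).

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical stand-in for the transliteration filter: ASCII-only
// lowercasing. It never changes the string length, so match offsets in
// the filtered text are also valid offsets in the original text.
std::string prepText(const std::string& text) {
    std::string out(text);
    std::transform(out.begin(), out.end(), out.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return out;
}

// Half-open range [start, end) of a match, as offsets into the text.
struct MatchRange {
    std::size_t start;
    std::size_t end;
};

// Hypothetical interface layer: filter the text, then search it from
// offset `from` with a library regex engine. The pattern is used as
// given; the caller is expected to write it in filtered (lowercase) form.
// Returns true and fills `result` on a match, false otherwise.
bool searchFiltered(const std::string& text, const std::string& pattern,
                    std::size_t from, MatchRange& result) {
    const std::string filtered = prepText(text);
    if (from > filtered.size()) {
        return false;
    }
    const std::regex re(pattern, std::regex::ECMAScript);
    std::smatch m;
    const auto first =
        filtered.cbegin() + static_cast<std::string::difference_type>(from);
    if (!std::regex_search(first, filtered.cend(), m, re)) {
        return false;
    }
    // m.position(0) is relative to `first`, so add `from` back.
    result.start = from + static_cast<std::size_t>(m.position(0));
    result.end = result.start + static_cast<std::size_t>(m.length(0));
    return true;
}
```

Calling searchFiltered("Hello World", "wor[a-z]+", 0, r) would match
"World" via its lowercased form. A real filter that maps multi-character
sequences would change lengths, so it would also need an offset table to
map match positions back to the original text.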
Cheers,
-- Thorsten