[Libreoffice] [Crazy Ideas] Discuss "Replace regexp parser with std library"

Thorsten Behrens thb at documentfoundation.org
Mon Nov 29 05:22:55 PST 2010


Joe Smith wrote:
> I've looked at the code a bit, and it seems like there is indeed only one point
> of contact with the rest of the suite, textsearch.cxx, which handles all types
> of text searches (normal, regexp & fuzzy), and calls Regexpr::re_search(), which
> calls re_match2() to run the actual regexp match.
> 
> So the structure makes it easy to replace the regexp code in one place.
> 
> Unfortunately, the way the functions work does not match well with the Boost RE
> classes, although I'm sure it would be possible with an interface layer.
> 
> For example, the Boost engine handles locale-specific issues internally, whereas
> OOo's engine knows almost nothing about character case or multi-character
> sequences. Instead, it preps the text to be searched by running it through a
> filter. I don't understand the i18n & character encoding issues well enough to
> guess what that filter is actually doing or how it should be handled.
> 
Hi Joe,

hm, then I think a combination of those two approaches might be a
winning strategy: LibO uses ICU for all that nifty transliteration
stuff and whatnot.

I notice that newer Boost versions also optionally support ICU, so
maybe that already gives us good enough coverage. I'd be tempted to
just give it a whirl and add it as an optional, experimental
feature for people to play with.

Cheers,

-- Thorsten

