Adding Extension for Experimental Thai Spelling

Caolán McNamara caolanm at redhat.com
Mon Feb 13 07:08:20 PST 2012


On Sat, 2012-02-11 at 16:23 +0000, Richard Wordingham wrote:
> Is it possible to create an experimental alternative to the Thai
> break iterator that can be shared with other people as a LibreOffice
> extension?

I don't think we have any way to override our breakiterators from
extensions.

FWIW, i18npool/source/breakiterator is where we have our word,
character, sentence and line break iterators implemented. 

Typically we forward everything on to icu to do the real work, albeit
with some customization of the default icu rules.

What I'd *expect* to happen is that text marked as "Thai" should end up
getting broken into words by the default icu word break iterator. The
icu documentation at http://userguide.icu-project.org/boundaryanalysis
claims "ICU provides a special dictionary-based break iterator."
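
To make "dictionary-based" concrete: Thai is written without spaces
between words, so a breaker has to recognise words from a word list. The
following is only a toy sketch of that idea (greedy longest match over a
caller-supplied dictionary). It is not icu's algorithm, and the name
toyWordBreaks is made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy dictionary-based word breaker (hypothetical, NOT icu's implementation).
// Returns break offsets in code points, starting with 0 and ending with
// text.size().
std::vector<std::size_t> toyWordBreaks(const std::u32string& text,
                                       const std::set<std::u32string>& dict)
{
    // Longest dictionary entry bounds how far ahead we need to look.
    std::size_t maxLen = 1;
    for (const auto& word : dict)
        maxLen = std::max(maxLen, word.size());

    std::vector<std::size_t> breaks{0};
    std::size_t pos = 0;
    while (pos < text.size())
    {
        // Try the longest candidate first; if no dictionary word starts
        // here, fall back to consuming a single code point.
        std::size_t len = std::min(maxLen, text.size() - pos);
        while (len > 1 && dict.count(text.substr(pos, len)) == 0)
            --len;
        pos += len;
        breaks.push_back(pos);
    }

    assert(breaks.back() == text.size());
    return breaks;
}
```

With a dictionary holding just U"ภาษา" and U"ไทย", the text U"ภาษาไทย"
gives breaks at offsets 0, 4 and 7. A real dictionary breaker also has to
cope with unknown words and ambiguous segmentations, which this sketch
ignores.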

So, assuming that nothing is simply broken, improving the icu Thai break
iterator should improve LibreOffice "for free".

I'd be sort of interested in confirming that what we have right now
actually works correctly, in the sense that Thai text definitely *is*
getting run through the special Thai-specific icu word break handler.

There is an i18npool/qa/cppunit/test_breakiterator.cxx which we use to
make sure that some existing edge-cases continue to work. If you wanted
to hack that to add some Thai word break tests, that'd be helpful.
Alternatively, simply pass me some sample text where we *are* doing the
right thing and some where we *aren't*, and I could populate a test in
there with that data and turn the problem into a developer-friendly
"enable this word-break unit test and make it work" problem.
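
As a rough idea of how such sample data could be organised, here is a
self-contained sketch. All names in it (WordBreakSample, WordBreaker,
failingSamples) are hypothetical, and a plain callback stands in for the
break iterator. The real test file uses cppunit and our own break
iterator service, so this only shows the shape of the data.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical container for one contributed sample; not an existing
// LibreOffice type.
struct WordBreakSample
{
    std::u32string text;                // sample text, as code points
    std::vector<std::size_t> expected;  // expected word break offsets
    bool knownGood;                     // true: reported as already working
};

// Stand-in for whatever produces the word breaks (assumed signature).
using WordBreaker =
    std::function<std::vector<std::size_t>(const std::u32string&)>;

// Returns the indices of samples with the given knownGood flag whose
// actual breaks differ from the expected ones. Failures among knownGood
// samples are regressions; failures among the others are the open bugs.
std::vector<std::size_t> failingSamples(
    const std::vector<WordBreakSample>& samples,
    const WordBreaker& breaker,
    bool knownGood)
{
    std::vector<std::size_t> failures;
    for (std::size_t i = 0; i < samples.size(); ++i)
    {
        if (samples[i].knownGood != knownGood)
            continue;
        if (breaker(samples[i].text) != samples[i].expected)
            failures.push_back(i);
    }
    return failures;
}
```

The "right thing" samples would need to keep passing, while the "wrong
thing" samples are the ones a developer would try to make pass.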

C.


