Adding Extension for Experimental Thai Spelling

Caolán McNamara caolanm at redhat.com
Tue Feb 14 08:19:17 PST 2012


On Mon, 2012-02-13 at 22:39 +0000, Richard Wordingham wrote:
> The spell-checker seems to break up a phrase consisting of just กุหลาบ
> into 3 or 4 words.

Hmm, so I played around with this and here's what I think is the
problem...

We have some customized break iterator rules in LibreOffice, so we're
using those and *not* the built-in ICU ones. But we lack a customized
Thai one, so for Thai we end up with some ultra-generic word breaking
and never go near the special Thai iterator that is built into ICU :-(
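
As a rough illustration of that selection problem (not the actual
LibreOffice code: the table contents, the "xx" language and the function
name are all made up for the example), the lookup behaves roughly like
this Python sketch. A language without its own custom rules drops to the
generic rules unless it is explicitly routed to ICU's built-in iterator:

```python
# Illustrative only: rule names and language sets are hypothetical,
# not taken from the LibreOffice source.
CUSTOM_RULES = {
    "generic": "generic word rules",
    "xx": "custom rules for language xx",  # placeholder language
}

# Languages handed straight to ICU's own iterator (assumed shape of a fix).
USE_ICU_BUILTIN = {"th"}


def pick_word_breaker(lang, prefer_builtin=USE_ICU_BUILTIN):
    """Return a (kind, detail) pair describing which breaker would be used."""
    if lang in prefer_builtin:
        return ("icu-builtin", lang)
    # No language-specific custom rules: fall back to the generic ones.
    return ("custom", CUSTOM_RULES.get(lang, CUSTOM_RULES["generic"]))
```

Calling pick_word_breaker("th", prefer_builtin=set()) reproduces the
situation described above, where Thai silently lands on the generic
rules.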

I think this change:
http://cgit.freedesktop.org/libreoffice/core/commit/?id=475d0c59c66fb7752d230f76130b17145aad0c12
should improve matters a lot. It makes "กุหลาบ" get treated as a single
word in the unit test there now anyway. The Northern Thai one is still
not considered a single word, though; that might be due to the oldish
ICU we're still using.

After some googling I'm unsure whether the "right way to go" to further
improve Thai break iterators is simply to have another go at upgrading
ICU to get the latest and greatest there, or for "someone" to have a go
at integrating libthai into LibreOffice and hand off break iteration for
Thai to that. Either way, the link above and the related unit test give
an entry point to the relevant code.
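
For anyone wondering why a dictionary matters here: Thai is written
without spaces between words, and both ICU's built-in Thai iterator and
libthai rely on a word list to find boundaries. The toy Python sketch
below shows greedy longest matching against a tiny made-up dictionary.
It only illustrates the idea, it is not how either library is actually
implemented, and `segment` is a hypothetical helper:

```python
def segment(text, dictionary):
    """Greedy longest-match word segmentation (toy illustration)."""
    words = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: a single code point
        # Try the longest candidate first and shrink until one is a known word.
        for j in range(len(text), i, -1):
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        words.append(match)
        i += len(match)
    return words


toy_dictionary = {"กุหลาบ", "แดง"}  # made-up two-word dictionary
print(segment("กุหลาบแดง", toy_dictionary))
print(segment("กุหลาบ", set()))  # no dictionary: falls apart into code points
```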

C. 


