Adding Extension for Experimental Thai Spelling

Richard Wordingham richard.wordingham at ntlworld.com
Thu Feb 16 15:24:02 PST 2012


On Tue, 14 Feb 2012 16:19:17 +0000
Caolán McNamara <caolanm at redhat.com> wrote:

> I think this change:
> http://cgit.freedesktop.org/libreoffice/core/commit/?id=475d0c59c66fb7752d230f76130b17145aad0c12
> should improve matters a lot.

It's a vast improvement - it gives LibreOffice a real Thai
spell-checker.  Thank you.  I have one worry for Siamese - Németh László
suggested that there might be a licensing issue back in
http://openoffice.2283327.n4.nabble.com/Thai-line-breaking-td2791315.html .

If there isn't such an issue, does this mean we can hope to see your
fix in LibreOffice 3.5.1?

> Makes "กุหลาบ" get treated as a single
> word in the unit test there now anyway, though the Northern Thai one
> is still not considered a single word, that might be due to the
> oldish icu we're still using.

I wouldn't expect a dictionary-based line breaker to handle words from
other languages.  (There's a whole slew of Mon-Khmer languages in
Thailand, and they mostly use the Thai script when they happen to get
written.)  I can work my way round the problem using the sticking
plaster of ZWSP (U+200B ZERO WIDTH SPACE) and WJ (U+2060 WORD JOINER,
a zero-width no-break character), and I think some use of them or an
equivalent is inevitable when the sequence of visible characters
doesn't define the breaks.  In particular, after gluing
กุ๊หลาบ together with WJ, Hunspell offered me กุหลาบ as a correction,
which is good.
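
For anyone wanting to try the same workaround, here is a minimal Python
sketch of what I mean by gluing and breaking.  It is only an
illustration: the helper names (glue, mark_breaks, strip_controls) and
the point at which the example word is split are my own assumptions,
and nothing here reflects how LibreOffice, ICU or Hunspell handle these
characters internally.

```python
import unicodedata

ZWSP = "\u200b"  # ZERO WIDTH SPACE: permits a break where none is visible
WJ = "\u2060"    # WORD JOINER: forbids a break at this position


def glue(pieces):
    """Join pieces of one word with WJ so a breaker keeps them together."""
    return WJ.join(pieces)


def mark_breaks(words):
    """Join separate words with ZWSP to mark the break opportunities."""
    return ZWSP.join(words)


def strip_controls(text):
    """Remove ZWSP and WJ, e.g. before comparing with a dictionary entry."""
    return text.replace(ZWSP, "").replace(WJ, "")


if __name__ == "__main__":
    # Hypothetical split point, chosen only for illustration.
    glued = glue(["กุ๊", "หลาบ"])
    for ch in glued:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    # The visible text is unchanged once the controls are stripped.
    print(strip_controls(glued) == "กุ๊หลาบ")
```

Stripping the controls before a dictionary comparison is one possible
way to avoid the invisible characters ending up inside dictionary
entries; whether that is the right place to do it is a separate
question.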

There may be some rough edges with ZWSP and WJ going into the
dictionary (TBC), but what you've done will justify LibreOffice claiming
a Thai spell checking capability.

Minority language support may not be compatible with libthai - at least
one language uses a combining underline, and some of the mark
combinations used for minority languages would get rejected by the WTT
rules that libthai supports.

Richard.

