Adding Extension for Experimental Thai Spelling

Richard Wordingham richard.wordingham at
Mon Feb 13 14:39:34 PST 2012

Thank you to everyone who has offered me advice.

On Mon, 13 Feb 2012 15:08:20 +0000
Caolán McNamara <caolanm at> wrote:

> I don't think we have any way to override our breakiterators from
> extensions.

Ah well, I'll just have to try to get Thai spell-checking working for
myself and then worry about sharing my changes - assuming I succeed.

> I'd be sort of interested in confirming that what we have right now
> actually works correctly, in the sense that Thai text definitely *is*
> getting run through the special Thai-specific icu word break handler.

It's definitely going through a Siamese-specific word-breaker for
line-breaking.  For example, the two-syllable Thai word กุหลาบ
'rose' moves to the next line as a whole, but when I convert it to the
Northern Thai form กุ๊หลาบ (not the spelling I'd favour) by adding a
(non-spacing) tone mark, it's promptly broken between lines along the
syllable boundary, although the first syllable does not constitute a
word, at least not one recorded in the Royal Institute Dictionary. I'm
glad to find that inserting U+2060 WJ prevents that break. The
spell-checker seems to break up a phrase consisting of just กุหลาบ
into 3 or 4 words.
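
For anyone wanting to reproduce the U+2060 workaround, here is a minimal
Python sketch of how the test strings can be built. The syllable split
and the helper name join_without_break are assumptions for illustration,
supplied by hand; the script only constructs the strings and does not
call the ICU break iterator or the spell-checker itself.

```python
import unicodedata

WORD_JOINER = "\u2060"  # U+2060 WORD JOINER (WJ)

# Siamese spelling of 'rose': กุหลาบ
siamese = "\u0e01\u0e38\u0e2b\u0e25\u0e32\u0e1a"

# Northern Thai form กุ๊หลาบ: the same word with the tone mark
# U+0E4A THAI CHARACTER MAI TRI added to the first syllable.
tone_mark = "\u0e4a"
northern = siamese[:2] + tone_mark + siamese[2:]

# Hand-supplied syllable split (an assumption, not computed here).
syllables = [northern[:3], northern[3:]]


def join_without_break(parts):
    """Hypothetical helper: glue the parts together with WJ so that a
    line breaker should not break between them."""
    return WORD_JOINER.join(parts)


protected = join_without_break(syllables)

# The tone mark is a non-spacing mark (general category Mn).
print(unicodedata.category(tone_mark))

# List the code points of the protected word for inspection.
for ch in protected:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Pasting the value of protected into a document should give the form
that stays on one line; removing the WJ should bring the break at the
syllable boundary back.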


