Adding Extension for Experimental Thai Spelling

Richard Wordingham richard.wordingham at
Fri Feb 17 14:50:05 PST 2012

On Fri, 17 Feb 2012 14:10:21 +0000
Caolán McNamara <caolanm at> wrote:

> On Thu, 2012-02-16 at 23:24 +0000, Richard Wordingham wrote:
> Indeed, yeah, I suppose, assuming its as complicated as "Thai", that
> the right direction would be for someone to write for icu new
> dictionary-based breakiterators for the "nod"(?) language and then the
> rather trivial changes to LibreOffice to know about the language in
> order to mark text as that language to bubble that info down to icu

Northern Thai's not quite as simple or standardised as Siamese!  One can
meet (at least) the following spelling systems:

1) Chiangmai phonetics
2) Chiangrai phonetics (different mapping of tones to Siamese spelling)
3) Transliteration from Tai Tham script (probably rare for connected
   text)
4) Tai Tham script

However, perhaps dictionary-based break iterators are something to be
treated like dictionaries.  There are several other writing systems
that could probably benefit from them:

Thai script:
  Northern Thai
  NE Thai (for recording songs - use of Siamese tone rules scrambles
  the tonemarks compared to Siamese cognates)

Khmer script:
  Khmer - there's already a project for this set up on SourceForge.

Tai Tham script:
  Tai Khuen
  Tai Lue

Lao script

Tibetan script

I've a feeling Burmese may also have a need for dictionary-based text
breaking, though it's better behaved for syllable breaking than most of
the others listed here.  Shan would come in the same category.

The above list is not exhaustive.  Tai Lue in Lao script probably
belongs in the list.

Not all Thai script writing systems need a break iterator - some of the
minority languages separate words with spaces, but that's partially a
matter of literacy - Thais start writing Thai with interword gaps and
then learn to suppress the gaps.  Pali written in Thai also separates
words with spaces - but Pali has some very long words!
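
To make the idea concrete, here is a rough sketch of what a
dictionary-based word breaker does.  It is plain greedy longest-match
in Python, not ICU's actual algorithm, and the names segment and WORDS
and the tiny Siamese word list are made up for illustration.  Spaces,
where the writer supplies them, are simply treated as breaks.

```python
# Hypothetical three-word dictionary; a real one would hold thousands of entries.
WORDS = {"เขียน", "ภาษา", "ไทย"}


def segment(text, dictionary):
    """Greedy longest-match word segmentation (a sketch, not ICU's method)."""
    # Longest dictionary entry, counted in code points.
    max_len = max((len(w) for w in dictionary), default=1)
    words = []
    # Whitespace, where present, always marks a word boundary.
    for chunk in text.split():
        i = 0
        while i < len(chunk):
            # Start with the longest candidate and shrink until it is in
            # the dictionary; unknown text falls back to one code point,
            # which can separate a combining mark from its base.
            n = min(max_len, len(chunk) - i)
            while n > 1 and chunk[i:i + n] not in dictionary:
                n -= 1
            words.append(chunk[i:i + n])
            i += n
    return words


if __name__ == "__main__":
    print(segment("เขียนภาษาไทย", WORDS))
```

Greedy matching goes wrong whenever a long match hides a better
split, which is one reason a real break iterator needs more than this.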

