[HarfBuzz] Language Modularization?

Theppitak Karoonboonyanan thep at linux.thai.net
Fri Nov 7 02:25:23 PST 2008


On Fri, Nov 7, 2008 at 3:53 PM, Jens Herden <jens at khmeros.info> wrote:
> On Freitag 07 November 2008, Theppitak Karoonboonyanan wrote:
>> It seems only Thai needs a dictionary-based algorithm. Others don't.
>
> AFAIK this is not correct.

Hmm. But that's a summary of what I've been told at regional
conferences, with report papers.

Probably the advantages of the Indic encoding scheme were
over-emphasized when talking to a Thai guy like me. ;-)

>> - Lao, the closest implementation to Thai, has simplified its
>>   writing system to be phonetic-based. Word break can be achieved
>>   solely by syllabic rules.
>>
>> - Other scripts, including Myanmar and Khmer, have adopted the Indic
>>   encoding scheme, which already carries intrinsic information on
>>   syllable boundaries. So, word break can also be achieved by a
>>   rule-based approach. (Confirmed for Myanmar, at least.)
>
> While it is easy to find the syllable breaks in Khmer, it is not easy to
> find the word breaks, because many words are made up of more than one
> syllable. You need a dictionary-based approach for good word breaking
> in Khmer, though.

Does this include line wrapping? Is wrapping lines at syllable
boundaries OK for Khmer? (I've been told it's acceptable for other
languages.)
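
To make the distinction concrete, here is a minimal sketch of the two
kinds of break opportunities being discussed. It assumes the text has
already been split into syllables by a rule-based pass, and it uses
greedy longest matching against a word list, which is only one of
several possible dictionary-based strategies. The function names, the
toy syllables, and the toy dictionary are all hypothetical; they are
not real words of any language and not part of any HarfBuzz API.

```python
def segment_words(syllables, dictionary, max_len=4):
    """Group pre-split syllables into words by greedy longest match.

    `dictionary` is a set of syllable tuples (a hypothetical word list).
    A syllable that starts no known word becomes a one-syllable word.
    """
    words = []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, shrinking down to one syllable.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = tuple(syllables[i:i + n])
            if n == 1 or candidate in dictionary:
                words.append(candidate)
                i += n
                break
    return words


def break_opportunities(units):
    """Return the syllable offsets after which a line may be wrapped."""
    positions = []
    count = 0
    for unit in units[:-1]:  # no break opportunity after the last unit
        count += len(unit)
        positions.append(count)
    return positions


# Toy data only: placeholder syllables and a two-entry word list.
syllables = ["ka", "ta", "mi", "so", "ka"]
dictionary = {("ka", "ta"), ("mi", "so")}

words = segment_words(syllables, dictionary)

# Wrapping at syllable boundaries allows a break after every syllable.
syllable_breaks = break_opportunities([(s,) for s in syllables])

# Wrapping at word boundaries allows fewer breaks, and needs the dictionary.
word_breaks = break_opportunities(words)
```

With this toy input, syllable-level wrapping may break after any of the
first four syllables, while word-level wrapping may break only after
the second and fourth. Whether the extra syllable-level breaks are
acceptable for a given language is exactly the question above.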

-- 
Theppitak Karoonboonyanan
http://linux.thai.net/~thep/


