[HarfBuzz] Language Modularization?

Theppitak Karoonboonyanan thep at linux.thai.net
Fri Nov 7 18:45:00 PST 2008


Hi, Javier,

Thanks for the information. This has changed my view of the
problem. Although I could imagine, by inference from my own
language, that something more than syllable boundaries is needed,
I had to respect what native speakers told me in the past.

Lao is very close to Thai, and from what I know of the language,
wrapping lines at syllable boundaries would not be acceptable to
me as a Thai reader. However, since the Lao writing system has
been reformed along phonetic lines, I can't assume that from a
Thai point of view.

In a sample book for Lao students I have, titled "ຊອບໃຈທີ່ຫລຽວເບິ່ງ"
(Thanks for glancing), lines appear to be wrapped at syllable
boundaries, rather than word boundaries. (For example, "ເອກະ|ລັກ",
"ສືບ|ທອດ", "ອະທິ|ບາຍ", "ປະຊາ|ຊົນ", "ປະດິດສະ|ຖານ", etc. where "|" is
the wrapping position. This can be seen all over the book.)

The other Lao books I have are poetry, so that book is my only
reference, which may not be sufficient to draw a conclusion.

So I am Cc'ing this to Anousak to ask definitively whether Lao
typography wraps lines at word boundaries or at syllable boundaries.

In fact, Anousak and I have been talking for a while about
implementing Lao word breaking, but I had assumed all along that
rule-based syllable breaking would be sufficient. I may be proven
wrong again this time.
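
To make the alternative concrete, here is a minimal sketch of
dictionary-based word breaking using greedy longest match. It is
only an illustration under stated assumptions: the word list is just
the five sample words quoted above, the function name segment() is
made up for this example, and it is not the algorithm used in ICU or
any other library.

```python
# Hypothetical mini-dictionary: the five Lao words quoted earlier in
# this message. A real word breaker would need a full lexicon.
WORDS = ["ເອກະລັກ", "ສືບທອດ", "ອະທິບາຍ", "ປະຊາຊົນ", "ປະດິດສະຖານ"]


def segment(text, dictionary):
    """Split text into pieces by greedy longest match (illustrative only)."""
    longest = max(map(len, dictionary))
    pieces, i = [], 0
    while i < len(text):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(longest, len(text) - i), 0, -1):
            if text[i:i + size] in dictionary:
                break
        else:
            size = 1  # no dictionary word starts here: emit one character
        pieces.append(text[i:i + size])
        i += size
    return pieces


# Two dictionary words run together, as in unspaced Lao text.
text = WORDS[1] + WORDS[0]
pieces = segment(text, set(WORDS))
```

With such a segmenter, a line would be allowed to wrap only between
the returned pieces, never inside a word. Greedy matching can pick
the wrong split for ambiguous text, so this is a starting point for
discussion rather than a proposal.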

Regards,
-- 
Theppitak Karoonboonyanan
http://linux.thai.net/~thep/

On Sat, Nov 8, 2008 at 7:51 AM, Javier SOLA <javier at khmeros.info> wrote:
>
> Hi Tep, Ed,
>
> I am not subscribed to the harfbuzz list, so this message will not make it
> there.
>
> I can confirm that syllable line-breaking is not correct for either Khmer or
> languages written with Myanmar script, including Burmese.
>
> Syllable breaking is done in Burmese in newspapers, with very thin columns,
> but is not desirable. In Khmer it is not acceptable. I don't know about Lao,
> but I assume that it would always be preferable to do breaks in words, and
> not syllables. Again, newspaper practices do not indicate good script usage,
> but their own constraints.
>
> We have been testing (with Jens Herden, in cc) dictionary-based line
> breaking for Khmer in ICU, copying the algorithm that is there for Thai. We
> will be integrating it into mainstream OpenOffice as soon as possible
> (OpenOffice is now upgrading to ICU 4.0, which makes it much easier).
>
> Burmese is more complex, as graphemes and syllables are different for
> them (in many cases one syllable spans two graphemes). Unicode for Myanmar
> is not yet final (character order), so it is still difficult to do any work
> on this front. The final order needs to take into account several minority
> languages (Sgaw Karen, Shan, Mon, etc.), and it is not easy. A new proposal
> is being prepared.
>
> It is important to understand that so far (for Thai), line breaks and
> word boundaries are determined together (at the same places). The result of
> the line-breaking is used by spell-checkers. Using syllable breaks divides
> the words into pieces and breaks spell-checking, while dictionary-based
> breaking gives correct spell-checking (we have already tested in OpenOffice).
>
> Regards,
>
> Javier
>

