[HarfBuzz] Language Modularization?

Ed Trager ed.trager at gmail.com
Thu Nov 6 08:12:07 PST 2008


Hi, Theppitak,

Have you and your friends ever thought about writing a new
*extensible* word segmentation system to replace libThai that would
handle not only Thai, but also Lao, Khmer, Burmese and eventually even
other orthographies of Southeast Asia, such as คำเมือง?

Ideally, such a system would itself allow for "pluggable" methods and
would be based entirely on Unicode.  If someone invented a better,
faster, smaller, or more accurate algorithm for Thai segmentation,
they could simply wrap that algorithm in a class that plugs in to the
system.
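
To make the "pluggable" idea concrete, here is a minimal sketch in
Python. Every name in it (Segmenter, register, segment,
WhitespaceSegmenter) is hypothetical and belongs to no existing
library, and the stand-in algorithm only splits on whitespace, which is
of course not how Thai is really segmented.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

# Hypothetical registry: ISO 15924 script code -> installed plug-in.
_REGISTRY: Dict[str, "Segmenter"] = {}


class Segmenter(ABC):
    """Interface that every plug-in algorithm would implement."""

    @abstractmethod
    def segment(self, text: str) -> List[str]:
        """Split Unicode text into a list of words."""


def register(script: str, segmenter: Segmenter) -> None:
    """Install (or replace) the segmenter used for one script."""
    _REGISTRY[script] = segmenter


def segment(script: str, text: str) -> List[str]:
    """Dispatch to the registered plug-in; otherwise return text whole."""
    plugin = _REGISTRY.get(script)
    return plugin.segment(text) if plugin else [text]


class WhitespaceSegmenter(Segmenter):
    """Placeholder algorithm; a real plug-in would consult a dictionary."""

    def segment(self, text: str) -> List[str]:
        return text.split()


# Swapping in a better algorithm later is a single register() call.
register("Thai", WhitespaceSegmenter())
```

Callers would only ever use segment(), so replacing the Thai algorithm,
or adding one for Lao ("Laoo") or Khmer ("Khmr"), would not touch them.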

Such a system would also provide standard containers for the
dictionaries needed for segmentation of Thai, Khmer, and others.
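
As one possible shape for such a container, here is a small sketch of a
trie shared by all languages, with a greedy longest-matching segmenter
on top. The names WordTrie, longest_prefix and greedy_segment are
hypothetical, and greedy longest matching is only the simplest
dictionary method, not necessarily what libThai does.

```python
class WordTrie:
    """Hypothetical dictionary container: a plain trie over code points."""

    _END = ""  # end-of-word marker; cannot collide with a 1-char key

    def __init__(self, words=()):
        self._root = {}
        for word in words:
            self.add(word)

    def add(self, word):
        """Insert one dictionary word."""
        node = self._root
        for ch in word:
            node = node.setdefault(ch, {})
        node[self._END] = True

    def longest_prefix(self, text, start=0):
        """Length of the longest dictionary word at text[start:], or 0."""
        node, best = self._root, 0
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if self._END in node:
                best = i - start + 1
        return best


def greedy_segment(trie, text):
    """Greedy longest matching; unknown characters become 1-char tokens."""
    out, pos = [], 0
    while pos < len(text):
        step = trie.longest_prefix(text, pos) or 1
        out.append(text[pos:pos + step])
        pos += step
    return out
```

With the two words "ภาษา" and "ไทย" loaded, greedy_segment() splits
"ภาษาไทย" into those two words. The same container would serve Khmer or
Lao unchanged, since it only sees Unicode code points.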

What do you and others think?

Would there be interest in organizing a conference to examine these
issues and work collaboratively to provide a unified solution?

-- Ed Trager

On Thu, Nov 6, 2008 at 9:49 AM, Theppitak Karoonboonyanan
<thep at linux.thai.net> wrote:
> Hi,
>
> Some of my KDE friends are working on Thai support in Harfbuzz.
> However, you may know that it's quite expensive to get proper
> Thai word break support. It requires loading libthai and its whole
> dictionary data into memory.
>
> With Pango modularization, this is mitigated for non-Thai users by
> means of dynamic plug-ins. It's never loaded as long as no Thai
> text is processed.
>
> But for current Harfbuzz, this mitigation is achieved by dlopen-ing
> libthai.so.0, which may or may not be available in the system.
> It's a kind of loose dependency that can easily be missed by
> automatic packaging systems like shlibs tracking.
>
> So, I wonder which direction Harfbuzz would go. Are dynamic
> modules in the plan?
>
> Thanks,
> --
> Theppitak Karoonboonyanan
> http://linux.thai.net/~thep/
> _______________________________________________
> HarfBuzz mailing list
> HarfBuzz at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/harfbuzz
>

