[HarfBuzz] Language Modularization?
Ed Trager
ed.trager at gmail.com
Thu Nov 6 08:12:07 PST 2008
Hi, Theppitak,
Have you and your friends ever thought about writing a new
*extensible* word segmentation system to replace libThai that would
handle not only Thai, but also Lao, Khmer, Burmese and eventually even
other orthographies of Southeast Asia, such as Kham Mueang (คำเมือง, Northern Thai)?
Ideally, such a system would itself allow for "pluggable" methods and
would be fully based on Unicode. So if someone invents a
better/faster/smaller/more accurate algorithm for Thai segmentation,
they could simply wrap their algorithm in a class that plugs into
such a system.
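A minimal C sketch of what such a plug-in interface might look like, assuming each segmenter registers a function pointer together with the Unicode code point range it handles. Every name here (`Segmenter`, `segmenter_register`, `segmenter_lookup`, `thai_stub`) is hypothetical and not part of libThai or HarfBuzz; the stub just marks a break after every code point, standing in for a real algorithm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plug-in interface: illustrative names only,
   not an existing libThai or HarfBuzz API. */
typedef struct {
    const char *name;       /* e.g. "thai-stub" */
    uint32_t range_first;   /* first code point this plug-in handles */
    uint32_t range_last;    /* last code point this plug-in handles */
    /* Write up to max_breaks break offsets (in code points) into
       breaks and return how many were written. */
    size_t (*find_breaks)(const uint32_t *text, size_t len,
                          size_t *breaks, size_t max_breaks);
} Segmenter;

#define MAX_SEGMENTERS 16

static const Segmenter *registry[MAX_SEGMENTERS];
static size_t registry_count = 0;

/* Register a segmenter; returns 0 on success, -1 if the table is full. */
int segmenter_register(const Segmenter *s) {
    assert(s != NULL && s->find_breaks != NULL);
    if (registry_count == MAX_SEGMENTERS)
        return -1;
    registry[registry_count++] = s;
    return 0;
}

/* Return the most recently registered segmenter whose range covers cp,
   or NULL if none does. */
const Segmenter *segmenter_lookup(uint32_t cp) {
    for (size_t i = registry_count; i > 0; i--) {
        const Segmenter *s = registry[i - 1];
        if (cp >= s->range_first && cp <= s->range_last)
            return s;
    }
    return NULL;
}

/* Placeholder plug-in for the Thai block (U+0E00..U+0E7F): it breaks
   after every code point instead of doing real word segmentation. */
static size_t thai_stub_breaks(const uint32_t *text, size_t len,
                               size_t *breaks, size_t max_breaks) {
    (void)text;  /* the stub ignores the actual characters */
    size_t n = 0;
    for (size_t i = 1; i <= len && n < max_breaks; i++)
        breaks[n++] = i;
    return n;
}

static const Segmenter thai_stub = {
    "thai-stub", 0x0E00, 0x0E7F, thai_stub_breaks
};
```

Searching from the newest registration backwards is one possible design choice: a better Thai algorithm registered later would take over the U+0E00..U+0E7F range without the older plug-in being removed.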
Such a system would also provide standard containers for the
dictionaries needed for segmentation of Thai, Khmer, and others.
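As a rough illustration of a shared dictionary container, here is a C sketch assuming the simplest possible layout: a sorted array of UTF-8 words with binary-search lookup. To show a segmenter consuming it, the sketch adds greedy longest matching, a deliberately simple technique chosen for brevity, not the algorithm libThai uses. All names (`WordDict`, `dict_init`, `dict_contains`, `dict_longest_prefix`, `segment_longest_match`) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical dictionary container: a sorted array of UTF-8 words. */
typedef struct {
    const char **words;  /* sorted with strcmp */
    size_t count;
} WordDict;

/* qsort/bsearch comparator for an array of C strings. */
static int cmp_str(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort the caller's word array in place and wrap it. */
void dict_init(WordDict *d, const char **words, size_t count) {
    qsort(words, count, sizeof *words, cmp_str);
    d->words = words;
    d->count = count;
}

/* Exact lookup by binary search. */
int dict_contains(const WordDict *d, const char *w) {
    return bsearch(&w, d->words, d->count, sizeof *d->words, cmp_str) != NULL;
}

/* Byte length of the longest dictionary word that is a prefix of text,
   or 0 if none. A linear scan keeps the sketch short. */
size_t dict_longest_prefix(const WordDict *d, const char *text) {
    size_t best = 0;
    for (size_t i = 0; i < d->count; i++) {
        size_t n = strlen(d->words[i]);
        if (n > best && strncmp(d->words[i], text, n) == 0)
            best = n;
    }
    return best;
}

/* Greedy longest matching: store the byte offset after each segment in
   breaks. Text with no dictionary match is consumed one UTF-8
   character at a time (lead byte plus its 10xxxxxx continuation bytes). */
size_t segment_longest_match(const WordDict *d, const char *text,
                             size_t *breaks, size_t max_breaks) {
    assert(d != NULL && text != NULL);
    size_t pos = 0, n = 0, len = strlen(text);
    while (pos < len && n < max_breaks) {
        size_t m = dict_longest_prefix(d, text + pos);
        if (m == 0) {
            m = 1;
            while (pos + m < len &&
                   ((unsigned char)text[pos + m] & 0xC0) == 0x80)
                m++;
        }
        pos += m;
        breaks[n++] = pos;
    }
    return n;
}
```

A production container would more likely be a trie, which answers longest-prefix queries without scanning every word; libThai's own dictionary is built on the double-array trie library libdatrie.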
What do you and others think?
Would there be interest in organizing a conference to examine these
issues and work collaboratively to provide a unified solution?
-- Ed Trager
On Thu, Nov 6, 2008 at 9:49 AM, Theppitak Karoonboonyanan
<thep at linux.thai.net> wrote:
> Hi,
>
> Some of my KDE friends are working on Thai support in Harfbuzz.
> However, you may know that it's quite expensive to get proper
> Thai word break support. It requires loading libthai and its entire
> dictionary data into memory.
>
> With Pango modularization, this is mitigated for non-Thai users by
> means of dynamic plug-ins. The Thai module is never loaded unless
> Thai text is actually processed.
>
> But current Harfbuzz achieves this mitigation by dlopen-ing
> libthai.so.0, which may or may not be available on the system.
> It's a kind of loose dependency that can easily be missed by
> automatic packaging systems like shlibs tracking.
>
> So, I wonder which direction Harfbuzz will take. Are dynamic
> modules in the plan?
>
> Thanks,
> --
> Theppitak Karoonboonyanan
> http://linux.thai.net/~thep/
> _______________________________________________
> HarfBuzz mailing list
> HarfBuzz at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/harfbuzz
>
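The lazy-loading approach Theppitak describes can be sketched in C roughly as follows, assuming text arrives as Unicode code points and that "Thai text" means anything in the Thai block U+0E00..U+0E7F. Only the library name `libthai.so.0` comes from the message; the function and variable names are hypothetical, and a real caller would go on to look up libthai's break functions with `dlsym`.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdint.h>

/* Lazily loaded handle; stays NULL until Thai text is first seen. */
static void *thai_lib = NULL;
static int thai_load_attempted = 0;

/* True if any code point falls in the Thai block U+0E00..U+0E7F. */
static int contains_thai(const uint32_t *text, size_t len) {
    for (size_t i = 0; i < len; i++)
        if (text[i] >= 0x0E00 && text[i] <= 0x0E7F)
            return 1;
    return 0;
}

/* Return the libthai handle, or NULL if the text has no Thai or the
   library is not installed. dlopen is attempted at most once. */
void *thai_lib_for_text(const uint32_t *text, size_t len) {
    assert(text != NULL || len == 0);
    if (!contains_thai(text, len))
        return NULL;  /* non-Thai text never triggers the load */
    if (!thai_load_attempted) {
        thai_load_attempted = 1;
        thai_lib = dlopen("libthai.so.0", RTLD_LAZY | RTLD_LOCAL);
    }
    return thai_lib;
}
```

The sketch also shows the packaging problem raised above: libthai is named only in a string passed to `dlopen`, so it never appears as a `DT_NEEDED` entry in the ELF binary, and tools such as `dpkg-shlibdeps` that derive dependencies from those entries will not record it. On glibc older than 2.34 the program must also be linked with `-ldl`.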