[HarfBuzz] Indic expertise wanted

Ed Trager ed.trager at gmail.com
Tue Jun 14 10:22:20 PDT 2011


On Tue, Jun 14, 2011 at 12:09 PM, pravin.d.s at gmail.com
<pravin.d.s at gmail.com> wrote:
>
>
> On 14 June 2011 08:11, Kenichi Handa <handa at m17n.org> wrote:
>>
>> In article <4DF6C38F.30403 at gmail.com>, Shriramana Sharma
>> <samjnaa at gmail.com> writes:
>>
>> > On 14-06-2011 00:18, Behdad Esfahbod wrote:
>> > > I know that problem very well, and am working on a solution to address
>> > > it.
>> > > Having more shapers doesn't really solve it though.
>>
>> > Yeah, I was meaning to say this: Each Indic script has its own unique
>> > characteristics and so even the classification of North Indic vs South
>> > Indic wouldn't work.
>>
>> FYI, m17n-lib's approach is to have a layouting engine for
>> each script.  In the case of m17n-lib, having many layouting
>> engines has no problem.  Each engine is just 100 to 300
>> lines of text file containing layouting rules.  We adopted
>> this approach because we found that a slight difference of
>> layouting rules results in rather complicated code when they
>> are mixed in a single engine.
>
> Yeah, Pango and even the old HarfBuzz code were hard to fix because of
> this problem, and Behdad, as an upstream developer, knows this very
> well :)
>
> Since this time we are starting from scratch, it will be better to write
> individual engines and make each of them perfect. That way we will learn
> the exceptions of each language/script better, and we will then be in a
> better position to merge them back.
>
> The advantage of having one engine per script for now is that we can
> work on most of the languages concurrently.
>

When I do various software projects, I always like to think in terms
of "phase one", "phase two" and "phase three" -- and I like to "sell"
projects to stakeholders in these terms.

So, based on what I am hearing here, perhaps we can summarize the
proposed development process as follows:

PHASE ONE:

(1)  Several dedicated and knowledgeable individuals work closely with
Behdad to establish a "template" Indic shaper that will more-or-less
define what all of the individual Indic shapers should look like:
their structure, how they interface with the rest of HarfBuzz, and
what the state table(s) are supposed to look like.

Maybe this "template" shaper is an actual shaper for a well-understood
case like modern Devanagari usage for Hindi.  All the special cases
like Vedic extensions or whatever can be solved / added later in a
separate fork so that the "template" retains code clarity and didactic
value.

(2) Based on the "template" example, all the specific Indic teams can
then write individual shapers.

(3) HarfBuzz "phase one" then incorporates all of the individual shapers.

(4) At this phase, the Indic team members focus on correct rendering
so that all of the individual "phase one" shapers can later serve as
the reference implementations against which any later-phase
implementations can be compared when running regression tests.
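
As a rough illustration of point (4), here is a minimal Python sketch of
comparing a later-phase shaper against a phase-one reference shaper over
a small corpus.  Everything in it is hypothetical: the function names,
the idea that a shaper returns a list of code point labels, and the toy
reordering rule are assumptions made for the example, not HarfBuzz code
or its API.

```python
from typing import Callable, List, Tuple

# A "shaper" here is just a function from text to a list of labels.
# This is a hypothetical stand-in, not the HarfBuzz API.
Shaper = Callable[[str], List[str]]

DEVANAGARI_VOWEL_SIGN_I = "\u093F"  # displayed to the left of its consonant


def reference_devanagari(text: str) -> List[str]:
    """Toy phase-one shaper: moves vowel sign I before the previous character.

    Real shaping moves it before the whole consonant cluster; this toy
    version only looks one character back.
    """
    out: List[str] = []
    for ch in text:
        name = f"U+{ord(ch):04X}"
        if ch == DEVANAGARI_VOWEL_SIGN_I and out:
            out.insert(len(out) - 1, name)
        else:
            out.append(name)
    return out


def merged_candidate(text: str) -> List[str]:
    """Toy later-phase shaper that forgot the reordering rule."""
    return [f"U+{ord(ch):04X}" for ch in text]


def regressions(reference: Shaper, candidate: Shaper,
                corpus: List[str]) -> List[Tuple[str, List[str], List[str]]]:
    """Return (input, expected, actual) for every corpus entry that differs."""
    failures = []
    for text in corpus:
        expected, actual = reference(text), candidate(text)
        if expected != actual:
            failures.append((text, expected, actual))
    return failures


corpus = ["\u0915\u093F", "\u0915\u093E"]  # KA + sign I, KA + sign AA
for text, expected, actual in regressions(reference_devanagari,
                                          merged_candidate, corpus):
    print(f"{text!a}: expected {expected}, got {actual}")
```

The point of the design is only that the phase-one output is treated as
the expected value, so any merged shaper can be checked against it
without anyone re-deriving the correct rendering by hand.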

PHASE TWO:

In phase two, the combined HarfBuzz and Indic team can start looking
at possibly merging shapers where it makes the most sense.  There
would be no need to "force" a single "north" shaper versus a single
"south" shaper.  Maybe you end up with a reduction in the number of
"north" vs. "south" vs. "other" shapers -- or maybe you don't.  It
will all depend on a more natural evolutionary process where the
tradeoffs between maintainability, comprehensibility, and code size
and structure are optimized over time.

PHASE THREE:

Don't worry about phase three yet.  Having a concrete plan for phase
one with a clear vision that will take you up to phase two is enough!

Just my attempt to clarify what may already be in many people's
minds ...

- Ed

> Behdad, in Pango we used a state table for identifying invalid
> syllables. Are you thinking of the same for HarfBuzz? Let me know; I can
> work on making the state table for a script.
> I don't know how we can make a single state table for all scripts, given
> the exceptions in some languages: Bengali, for example, allows the vowel
> U+0985 to combine with the matra U+09BE. Need to check.
>
> Regards,
> Pravin S
>
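
To make the state-table question above concrete, here is a rough Python
sketch of a shared syllable-validation table with per-script exceptions
layered on top.  All names are hypothetical and the character classes
are heavily simplified; this is not the Pango or HarfBuzz table.  The
Bengali exception is taken from the message above, which itself says it
still needs checking, and the sketch admits any vowel + matra pair
rather than only U+0985 U+09BE.

```python
from typing import Dict, Optional, Tuple

# Simplified character classes; the offset ranges follow the parallel layout
# of the Devanagari and Bengali Unicode blocks and ignore many characters.
BLOCK_BASE = {"deva": 0x0900, "beng": 0x0980}


def char_class(script: str, ch: str) -> str:
    off = ord(ch) - BLOCK_BASE[script]
    if 0x05 <= off <= 0x14:
        return "vowel"      # independent vowel
    if 0x15 <= off <= 0x39:
        return "consonant"
    if 0x3E <= off <= 0x4C:
        return "matra"      # dependent vowel sign
    if off == 0x4D:
        return "halant"     # virama
    return "other"


Table = Dict[Tuple[str, str], str]

# Shared transitions: (state, class) -> next state. Missing pairs are invalid.
COMMON: Table = {
    ("start", "vowel"): "vowel",
    ("start", "consonant"): "consonant",
    ("consonant", "halant"): "halant",
    ("consonant", "matra"): "matra",
    ("halant", "consonant"): "consonant",
}

# Per-script exceptions layered on top of the shared table (assumed, unverified).
EXCEPTIONS: Dict[str, Table] = {
    "deva": {},
    # Vowel followed by matra, e.g. U+0985 U+09BE (the case raised above).
    # Broader than that single pair; a real table would be narrower.
    "beng": {("vowel", "matra"): "matra"},
}

ACCEPTING = {"vowel", "consonant", "halant", "matra"}


def is_valid_syllable(script: str, text: str) -> bool:
    """Walk the merged table; reject on the first missing transition."""
    table = {**COMMON, **EXCEPTIONS[script]}
    state = "start"
    for ch in text:
        nxt: Optional[str] = table.get((state, char_class(script, ch)))
        if nxt is None:
            return False
        state = nxt
    return state in ACCEPTING


print(is_valid_syllable("beng", "\u0985\u09BE"))  # Bengali A + sign AA
print(is_valid_syllable("deva", "\u0905\u093E"))  # Devanagari A + sign AA
```

Keeping the exceptions in a separate per-script dictionary is one way to
see how much the scripts really differ before deciding whether a single
merged table is practical.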


