[HarfBuzz] use of the 'cluster' field

arjuna rao chavala arjunaraoc at googlemail.com
Fri Jun 4 03:18:36 PDT 2010

I would like to confirm my plan to address a bug in Indic language
rendering, since it is also related to 'cluster'. In Indic languages, the
GSUB lookups need to be performed per character cluster. In the present
code, the GSUB lookups run over the entire glyph sequence, causing bugs in
Telugu (confirmed) and Kannada (most likely). I have changed the code to
apply the lookups per cluster (available at
https://bugzilla.gnome.org/show_bug.cgi?id=579398). As this caused problems
with Firefox rendering, I have kept the fix pending.

I would like to know whether there is a better solution. The requirement is
to limit the lookup buffer length to the character cluster, as determined
during parsing.

For example, in Telugu, ka + matra 'a' + sha + halanth + space (original
typing order): the first two belong to one cluster and the next two to
another cluster. GSUB should be checked for the first two as a unit and for
the second two as a unit. Presently GSUB is being applied to the entire
glyph sequence, and each character is becoming an independent cluster. It
should be applied only when, based on the parsing, ka+sha+halanth or
ka+sha+halanth+halanth is determined to be a cluster (after reordering rules
are applied), without any other characters in between. Is it possible to
achieve this at the layout level without language-specific code, using the
current data structures (e.g. gproperties)?

Note: Halanth is used as a joiner between two consonants to form conjunct
consonants in Indic languages.
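
To make the requirement concrete, here is a minimal sketch in C of limiting
each lookup pass to one cluster at a time. It is not HarfBuzz code: the
glyph_t record, for_each_cluster() and the log_run() callback are
hypothetical names for illustration, and it assumes the parser has already
given every glyph of a cluster the same 'cluster' value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical glyph record; NOT HarfBuzz's actual glyph info struct. */
typedef struct {
    unsigned int codepoint;
    unsigned int cluster;   /* assumed to be set per cluster by the parser */
} glyph_t;

/* Callback invoked once per cluster run (stand-in for a GSUB pass). */
typedef void (*run_fn)(glyph_t *run, size_t len, void *user_data);

/* Walk the buffer and call fn on each maximal run of adjacent glyphs that
 * share a cluster value, so fn never sees glyphs from two clusters at once.
 * Returns the number of runs found. */
static size_t for_each_cluster(glyph_t *glyphs, size_t count,
                               run_fn fn, void *user_data)
{
    size_t runs = 0;
    size_t start = 0;

    while (start < count) {
        size_t end = start + 1;
        while (end < count && glyphs[end].cluster == glyphs[start].cluster)
            end++;
        if (fn)
            fn(glyphs + start, end - start, user_data);
        runs++;
        start = end;
    }
    return runs;
}

/* Example callback: records the length of each run it is handed. */
struct run_log {
    size_t lens[8];
    size_t n;
};

static void log_run(glyph_t *run, size_t len, void *user_data)
{
    struct run_log *log = (struct run_log *)user_data;
    (void)run;
    if (log->n < 8)
        log->lens[log->n++] = len;
}
```

With the Telugu sequence above (code points assumed for illustration), the
callback would be handed ka + matra, then sha + halanth, then the space, and
never the whole glyph sequence at once.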


2010/6/4 Behdad Esfahbod <behdad at behdad.org>

> Hi Jonathan,
> All of those are planned as per discussion in Reading.  It may take a week or
> more before I get to implementing them though, since it involves quite some
> shuffling.  What does your timeline for these look like?
> behdad
> On 06/03/2010 09:42 AM, Jonathan Kew wrote:
> > Hi Behdad,
> >
> > As we discussed a bit in Reading, I'd like the handling of the 'cluster'
> > field to be modified so that combining marks retain their original 'cluster'
> > values, unless of course they get ligated with the base or otherwise
> > processed. This will better preserve the association between glyphs and the
> > original text. (We need this in order to identify glyphs such as CGJ in the
> > final buffer.)
> >
> > To do this, I think it's necessary to change hb_form_clusters into
> > something like hb_mark_clusters, and have it set a flag in gproperties for
> > the mark glyphs instead of actually changing the cluster field; then
> > hb_buffer_reverse_clusters can use this instead of relying on the cluster
> > value.
> >
> > I have not actually created a patch for this yet, as I'm not sure how you
> > want to handle the bits in gproperties. I notice that it looks like only the
> > low 16 bits are currently used; one option might be to split the field into
> > two 16-bit fields, one for "glyph properties" (from GDEF), and one for
> > "character" or "slot" properties, where the combining mark flag based on
> > Unicode category could go.
> >
> > (I'd also suggest that "cluster" should be renamed "src_index", but
> > that's a secondary issue.)
> >
> > JK
> >
> >
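
For reference, the 16/16 split of gproperties suggested above could look
roughly like the following C sketch. The helper names and the
CHAR_PROP_COMBINING_MARK bit are assumptions made up for illustration, not
existing HarfBuzz identifiers; the only thing taken from the discussion is
that the field is 32 bits wide with the low 16 bits holding the GDEF glyph
properties.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: low 16 bits = glyph properties (from GDEF),
 * high 16 bits = character/slot properties. */
#define GLYPH_PROPS_MASK          0x0000FFFFu
#define CHAR_PROPS_SHIFT          16
#define CHAR_PROP_COMBINING_MARK  0x0001u   /* assumed flag bit */

/* Extract the GDEF-derived half. */
static inline uint16_t glyph_props(uint32_t gproperties)
{
    return (uint16_t)(gproperties & GLYPH_PROPS_MASK);
}

/* Extract the character/slot half. */
static inline uint16_t char_props(uint32_t gproperties)
{
    return (uint16_t)(gproperties >> CHAR_PROPS_SHIFT);
}

/* Replace the glyph half, leaving the character half untouched. */
static inline uint32_t set_glyph_props(uint32_t gproperties, uint16_t props)
{
    return (gproperties & ~GLYPH_PROPS_MASK) | props;
}

/* Replace the character half, leaving the glyph half untouched. */
static inline uint32_t set_char_props(uint32_t gproperties, uint16_t props)
{
    return (gproperties & GLYPH_PROPS_MASK)
         | ((uint32_t)props << CHAR_PROPS_SHIFT);
}

/* True if the combining-mark flag is set in the character half. */
static inline int is_combining_mark(uint32_t gproperties)
{
    return (char_props(gproperties) & CHAR_PROP_COMBINING_MARK) != 0;
}
```

Keeping the mark flag in the upper half means a later GSUB pass can rewrite
the glyph properties without losing the character-level flag.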