[HarfBuzz] Mapping output glyphs back to input character

Sun Jul 22 20:37:23 PDT 2012

Hi Khaled,

On 07/21/2012 05:49 AM, Khaled Hosny wrote:
> How do I map output glyphs back to input characters? I assume I've to
> use clusters for that, but I can't make much sense of the cluster
> numbers I'm seeing and don't seem to find any explanation for them.

When you add text to an hb_buffer_t, you set a cluster number for each
character.  The hb_buffer_add_utf* functions implicitly use the character's
index into the input string as its cluster value.  I.e., when using the UTF-8
version, cluster values are byte offsets into the UTF-8 string.
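To illustrate what UTF-8 cluster values look like, here is a minimal sketch that computes, for each code point of a UTF-8 string, the byte offset at which it starts.  These should match the cluster values hb_buffer_add_utf8 assigns when the whole string is added starting at offset 0.  The helper utf8_cluster_values is hypothetical (not a HarfBuzz API), and it assumes well-formed UTF-8, doing no validation.

```c
#include <stddef.h>

/* Hypothetical helper (not part of HarfBuzz): stores in clusters[k] the byte
 * offset at which the k-th code point of the NUL-terminated, well-formed
 * UTF-8 string s begins, and returns the number of code points found.
 * At most max offsets are stored; no validation is performed. */
size_t utf8_cluster_values(const char *s, unsigned *clusters, size_t max)
{
    size_t count = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        /* Continuation bytes look like 10xxxxxx; any other byte starts a
         * new code point, so its offset becomes that code point's cluster. */
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            if (count < max)
                clusters[count] = (unsigned)i;
            count++;
        }
    }
    return count;
}
```

For "a\xC3\xA9\xE2\x82\xAC" ("aé€", 1 + 2 + 3 bytes) it returns 3 and stores the cluster values 0, 1, 3.  For plain ASCII such as "differ", byte offsets and character indices coincide.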

Note that hb-view/hb-shape by default use UTF-32 cluster numbers (i.e.,
character counts instead of byte counts).  You can change that with
--utf8-clusters.
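The two numbering schemes are related by simple counting.  The following sketch converts a UTF-8 byte offset (such as a UTF-8 cluster value) into the index of the code point that starts there, i.e. the corresponding UTF-32-style cluster value.  It is not how hb-view implements the option.  The helper utf8_offset_to_char_index is hypothetical, and it assumes well-formed UTF-8 and an offset that falls on a code point boundary.

```c
#include <stddef.h>

/* Hypothetical helper: converts a byte offset into the well-formed UTF-8
 * string s (e.g. a UTF-8 cluster value) into the index of the code point
 * starting at that offset, i.e. the matching UTF-32-style cluster value.
 * byte_offset is assumed to lie on a code point boundary. */
size_t utf8_offset_to_char_index(const char *s, size_t byte_offset)
{
    size_t index = 0;
    for (size_t i = 0; i < byte_offset; i++) {
        /* Count only lead bytes; 10xxxxxx continuation bytes belong to a
         * code point that has already been counted. */
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            index++;
    }
    return index;
}
```

With "a\xC3\xA9\xE2\x82\xAC" ("aé€"), the UTF-8 cluster values 0, 1, 3 map to the character indices 0, 1, 2.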

The shaping process implicitly segments the input text and the output glyphs
into a series of clusters.  For LTR text, you can think of the result as the
first cluster, followed by the second cluster, followed by the third, and so
on, where each cluster contains some number of characters and some number of
glyphs.

Now, after shaping, the hb_glyph_info_t::cluster member of each glyph simply
holds the minimum of the cluster values of all the characters that belong to
that glyph's cluster.

For RTL text it's similar, except that the clusters come out in reverse order,
so cluster values decrease along the glyph array.
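Given those rules, the input characters covered by a glyph run from the glyph's own cluster value up to (but not including) the next larger cluster value present in the output, or to the end of the text.  A minimal sketch follows.  It assumes the cluster values were assigned as consecutive indices (as the hb_buffer_add_utf* functions do) and that text_len counts units in the same scheme (characters, or bytes for UTF-8).  The helper glyph_char_range is hypothetical.  It searches for the next larger value instead of looking at the neighbouring glyph, so it works for both LTR and RTL glyph order.

```c
#include <stddef.h>

/* Hypothetical helper: clusters[] holds the cluster value of every glyph
 * after shaping, in glyph-array order; text_len is the number of input
 * units. Stores in [*start, *end) the range of input units covered by
 * glyph g. The end is the smallest cluster value greater than the glyph's
 * own, or text_len if none exists, so the glyph order (LTR or RTL) does
 * not matter. Glyphs in the same cluster get the same range. */
void glyph_char_range(const unsigned *clusters, size_t n_glyphs,
                      unsigned text_len, size_t g,
                      unsigned *start, unsigned *end)
{
    unsigned s = clusters[g];
    unsigned e = text_len;
    for (size_t j = 0; j < n_glyphs; j++) {
        if (clusters[j] > s && clusters[j] < e)
            e = clusters[j];
    }
    *start = s;
    *end = e;
}
```

The scan is linear per glyph, so mapping every glyph this way is quadratic.  That is fine for short runs, but a single pass is possible when the order is known.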

Quick example.  If you add the text "differ", the characters initially get
cluster values 0, 1, 2, 3, 4, 5 respectively.  After shaping, if the 'ff'
ligature was formed, you will get five glyphs, with cluster values 0, 1, 2, 4,
5.  This means that the two characters that originally had cluster values 2
and 3 are both represented by the single glyph with cluster value 2.
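For LTR output, the increasing order of cluster values allows a single pass.  Each run of glyphs sharing a cluster value covers the characters from that value up to the next glyph's value, or to the end of the text.  The sketch below assumes LTR order and consecutive per-character cluster values.  The helper ltr_cluster_sizes is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper, LTR output only: cluster values never decrease
 * along the glyph array, so one pass suffices. For each glyph, stores in
 * counts[] how many input units its cluster covers: the next larger
 * cluster value minus its own, or text_len minus its own for the last
 * cluster. Glyphs sharing a cluster all get the same count. */
void ltr_cluster_sizes(const unsigned *clusters, size_t n_glyphs,
                       unsigned text_len, unsigned *counts)
{
    size_t i = 0;
    while (i < n_glyphs) {
        /* Find the end of the run of glyphs sharing clusters[i]. */
        size_t j = i + 1;
        while (j < n_glyphs && clusters[j] == clusters[i])
            j++;
        unsigned next = (j < n_glyphs) ? clusters[j] : text_len;
        for (size_t k = i; k < j; k++)
            counts[k] = next - clusters[i];
        i = j;
    }
}
```

With the "differ" clusters 0, 1, 2, 4, 5 and a text length of 6, the counts are 1, 1, 2, 1, 1.  The third glyph, the 'ff' ligature, covers two characters.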

Hope that helps.
behdad

> Regards,
>  Khaled