[HarfBuzz] use of the 'cluster' field

Jonathan Kew jonathan at jfkew.plus.com
Thu Jun 3 06:42:15 PDT 2010

Hi Behdad,

As we discussed a bit in Reading, I'd like the handling of the 'cluster' field to be modified so that combining marks retain their original 'cluster' values, unless of course they get ligated with the base or otherwise processed. This will better preserve the association between glyphs and the original text. (We need this in order to identify glyphs such as CGJ in the final buffer.)

To do this, I think it's necessary to change hb_form_clusters into something like hb_mark_clusters, and have it set a flag in gproperties for the mark glyphs instead of actually changing the cluster field; then hb_buffer_reverse_clusters can use this instead of relying on the cluster value.
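To make the proposal concrete, here is a minimal sketch in C under simplified, hypothetical types. `glyph_t`, `SLOT_FLAG_CONTINUATION`, `is_combining_mark`, `mark_clusters` and `reverse_clusters` are illustrative names only, not HarfBuzz's actual structures or functions, and `is_combining_mark` is just a stand-in for a real Unicode general-category lookup. Marks get a flag instead of having their `cluster` rewritten, and the reversal treats a base plus its flagged marks as one unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified glyph record (not HarfBuzz's real struct). */
typedef struct {
  uint32_t codepoint;
  uint32_t cluster;     /* original character index, never rewritten here */
  uint32_t gproperties; /* hypothetical: high 16 bits hold slot flags */
} glyph_t;

/* Hypothetical flag: "this glyph continues the preceding cluster". */
#define SLOT_FLAG_CONTINUATION (1u << 16)

/* Stand-in for a Unicode category lookup: only recognizes the Combining
   Diacritical Marks block U+0300..U+036F (which includes CGJ, U+034F). */
static int is_combining_mark(uint32_t cp) {
  return cp >= 0x0300 && cp <= 0x036F;
}

/* Hypothetical "mark_clusters": flag marks that follow some glyph,
   leaving every glyph's cluster value untouched. */
static void mark_clusters(glyph_t *g, size_t len) {
  assert(g != NULL || len == 0);
  for (size_t i = 1; i < len; i++)
    if (is_combining_mark(g[i].codepoint))
      g[i].gproperties |= SLOT_FLAG_CONTINUATION;
}

/* Reverse g[start..end) in place. */
static void reverse_range(glyph_t *g, size_t start, size_t end) {
  while (start + 1 < end) {
    glyph_t tmp = g[start];
    g[start] = g[end - 1];
    g[end - 1] = tmp;
    start++;
    end--;
  }
}

/* Reverse the buffer unit by unit, where a unit is one unflagged glyph
   plus any flagged glyphs after it. Reversing each unit and then the
   whole buffer reverses unit order while keeping order inside a unit. */
static void reverse_clusters(glyph_t *g, size_t len) {
  size_t start = 0;
  while (start < len) {
    size_t end = start + 1;
    while (end < len && (g[end].gproperties & SLOT_FLAG_CONTINUATION))
      end++;
    reverse_range(g, start, end);
    start = end;
  }
  reverse_range(g, 0, len);
}
```

With input a, U+0301, b, U+034F (CGJ), U+0327 carrying clusters 0 to 4, the reversed buffer holds clusters 2, 3, 4, 0, 1, so the CGJ can still be identified by its own cluster value.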

I have not actually created a patch for this yet, as I'm not sure how you want to handle the bits in gproperties. It looks like only the low 16 bits are currently used; one option might be to split the field into two 16-bit fields, one for "glyph properties" (from GDEF), and one for "character" or "slot" properties, where the combining mark flag based on Unicode category could go.
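One way the split might look, again only a sketch: the macro and function names below are hypothetical, and it assumes `gproperties` is a 32-bit unsigned value whose low 16 bits hold the GDEF-derived glyph properties. Each half can then be read and written without disturbing the other.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for a 32-bit gproperties word:
   bits 0..15  : glyph properties (e.g. derived from GDEF)
   bits 16..31 : character/slot properties (e.g. from Unicode category) */
#define GLYPH_PROPS_MASK 0x0000FFFFu
#define SLOT_PROPS_SHIFT 16

/* Hypothetical slot-property bit for "came from a combining mark". */
#define SLOT_PROP_COMBINING_MARK 0x0001u

static uint16_t get_glyph_props(uint32_t gprops) {
  return (uint16_t)(gprops & GLYPH_PROPS_MASK);
}

static uint16_t get_slot_props(uint32_t gprops) {
  return (uint16_t)(gprops >> SLOT_PROPS_SHIFT);
}

/* Replace the low half, preserving the slot properties. */
static uint32_t set_glyph_props(uint32_t gprops, uint16_t props) {
  return (gprops & ~GLYPH_PROPS_MASK) | (uint32_t)props;
}

/* Replace the high half, preserving the glyph properties. */
static uint32_t set_slot_props(uint32_t gprops, uint16_t props) {
  return (gprops & GLYPH_PROPS_MASK) | ((uint32_t)props << SLOT_PROPS_SHIFT);
}
```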

(I'd also suggest that "cluster" should be renamed "src_index", but that's a secondary issue.)

