[HarfBuzz] Cluster question (Was Cluster soap box time)

Tue Nov 20 00:44:51 UTC 2018

On 11/19/2018 01:40 PM, Nathan Willis wrote:
> I think the clean-up that I just did to the "clusters" usermanual 
> chapter improves things, but it did also reveal where there's some 
> additional room for growth.
>
> For a couple of those, I would really appreciate some suggestions. In 
> particular, I'd like to hear any recommendations from the list for two 
> things:
>
> - a real-world example of where cluster-level 2 is what you'd actually 
> want to use
> - a real-world example of where level 0 does the wrong thing but level 
> 1 gets it right (I thought of multiple-mark reordering here already; 
> not sure how many others there are...)
>
> I just think that some 'from the wild' examples would make the subject 
> easier to digest.
>
> Thanks,
> Nat

Your post and the updated documentation come at the perfect time for a 
question I have been working on. We have a real-world application that 
used a very old version of HarfBuzz (0.9x). We are updating to the 
current HarfBuzz release, and one of the issues we are working on is how 
to use the new HarfBuzz to locate grapheme breaks. We were under the 
impression this information would be available in the cluster numbers 
after shaping, but your updates to the documentation are making us 
wonder if we are not using the API correctly. We have tried cluster 
levels 0 and 1, and neither one worked as we expected. In every case, 
combining accents are marked as being in a separate cluster from the 
base codepoint. For example, U+0061 Latin Small Letter A followed by 
U+0308 Combining Diaeresis are placed in adjacent clusters rather than 
the same cluster. Did I misunderstand the purpose of clustering 
codepoints? Is there a way to use the current version of HarfBuzz to 
find grapheme breaks?
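
To make the symptom concrete, here is a minimal Python sketch of how we 
turn per-glyph cluster values into break positions. It is not HarfBuzz 
code: the helper names are hypothetical, and the two cluster lists are 
illustrative values that match the description above, not captured 
shaper output. It assumes the cluster values are codepoint indices into 
the input text and that the font maps each codepoint to its own glyph.

```python
def cluster_starts(glyph_clusters):
    """Return the text indices at which a cluster begins.

    glyph_clusters holds one cluster value per output glyph, such as the
    values read from the `cluster` field of each hb_glyph_info_t.
    Sorting makes the result independent of glyph (visual) order.
    """
    return sorted(set(glyph_clusters))


def split_by_clusters(text, glyph_clusters):
    """Split text into the substrings covered by each cluster."""
    starts = cluster_starts(glyph_clusters)
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends)]


# U+0061 LATIN SMALL LETTER A, U+0308 COMBINING DIAERESIS, then "b".
text = "a\u0308b"

# Illustrative only: what we expected (mark merged with its base) ...
expected_clusters = [0, 0, 2]
# ... and what we are seeing (mark in its own, adjacent cluster).
observed_clusters = [0, 1, 2]

expected_pieces = split_by_clusters(text, expected_clusters)
observed_pieces = split_by_clusters(text, observed_clusters)

print(len(expected_pieces), "pieces expected,", len(observed_pieces), "observed")
```

With the expected values the accent stays attached to the "a"; with the 
observed values it becomes a piece of its own, which is the behavior we 
are asking about.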

Any help or insight anyone can provide would be greatly appreciated.