[HarfBuzz] Beginner question: What are cluster levels?

Graham Douglas graham.douglas at readytext.co.uk
Wed Jan 6 07:01:35 PST 2016


On 06/01/2016 14:37, Jonathan Kew wrote:
> On 6/1/16 14:17, Behdad Esfahbod wrote:
>> On 16-01-05 09:17 PM, Jamie Dale wrote:
>>> I actually just wrote something to give me very similar information
>>> since I realised that my basic "this is a ligature" flag wasn't
>>> enough data, so each of my glyphs now contains the number of
>>> characters that the glyph was composed from. This, along with the
>>> cluster index of the glyph from the source text, and the reading
>>> direction of the glyph, allow me to work out which characters formed
>>> the glyph.
>>
>> Correct.  That's pretty much the only way to do it.
>>
>
>
> Don't forget the added complication that there may be multiple glyphs
> with the same cluster value. E.g. given the text
>
>   <U+0915, U+094D, U+0915, U+093F, U+0915>
>
> you're very likely to get two glyphs with cluster index zero, as in
> something like
>
>   [imatra=0 | kka=0 | ka=4]
>
> but it's not at all clear from this how you'd determine which
> characters formed each glyph.
>
> JK

Hi Jonathan

Yes, I've seen multiple glyphs with the same cluster value with mixed
English and fully-vowelled Arabic.
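
To make the ambiguity concrete, here is a minimal Python sketch of what
can be recovered from cluster values alone. It is not HarfBuzz API:
`cluster_ranges` is a hypothetical helper, and it assumes that cluster
values are code-point indices into the source text and that a cluster
runs up to the next higher cluster value (or the end of the text).

```python
def cluster_ranges(glyph_clusters, text_length):
    """Map each distinct cluster value to (start, end, glyph_count).

    The character range is [start, end). Sorting the distinct values
    makes this independent of visual order, so it works for both
    left-to-right and right-to-left runs.
    """
    starts = sorted(set(glyph_clusters))
    result = {}
    for i, start in enumerate(starts):
        # A cluster ends where the next higher cluster begins.
        end = starts[i + 1] if i + 1 < len(starts) else text_length
        result[start] = (start, end, glyph_clusters.count(start))
    return result


# Jonathan's example: <U+0915, U+094D, U+0915, U+093F, U+0915>
# shaped as [imatra=0 | kka=0 | ka=4].
example = cluster_ranges([0, 0, 4], 5)
print(example)  # {0: (0, 4, 2), 4: (4, 5, 1)}
```

The result says that two glyphs share characters 0 to 3 and one glyph
covers character 4. It cannot say which of the four characters went
into imatra and which into kka, which is exactly the gap Jonathan
describes.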

Is it technically possible to "enhance" HarfBuzz to provide an API that
gives you the list of input characters used to shape a particular glyph?
I really do not know enough about the internals of OpenType shaping to
know whether that's an impossible (or hugely complex) task.

Here's a test/debug sample from a libraqm run (which of course uses
HarfBuzz+FriBidi). I modified libraqm to report HarfBuzz's glyph class
data as well.

Glyph information:
glyph [525]    glyph class: 3    x_offset: 440    y_offset: 360    x_advance: 0    cluster value: [18]
glyph [2023]    glyph class: 2    x_offset: 0    y_offset: 0    x_advance: 850    cluster value: [18]
glyph [529]    glyph class: 3    x_offset: 450    y_offset: -150    x_advance: 0    cluster value: [16]
glyph [765]    glyph class: 1    x_offset: 0    y_offset: 0    x_advance: 925    cluster value: [16]
glyph [525]    glyph class: 3    x_offset: 140    y_offset: -280    x_advance: 0    cluster value: [14]
glyph [519]    glyph class: 1    x_offset: -100    y_offset: 0    x_advance: 506    cluster value: [14]
glyph [3]    glyph class: 1    x_offset: 0    y_offset: 0    x_advance: 413    cluster value: [13]
glyph [64]    glyph class: 1    x_offset: 0    y_offset: 0    x_advance: 604    cluster value: [6]
glyph [73]    glyph class: 1    x_offset: 0    y_offset: 0    x_advance: 778    cluster value: [7]
glyph [66]    glyph class: 1    x_offset: 0    y_offset: 0    x_advance: 682    cluster value: [8]
glyph [71]    glyph class: 1    x_offset: 0    y_offset: 0    x_advance: 367    cluster value: [9]
glyph [68]    glyph class: 1    x_offset: 0    y_offset: 0    x_advance: 375    cluster value: [10]
glyph [78]    glyph class: 1    x_offset: 0    y_offset: 0    x_advance: 522    cluster value: [11]
glyph [67]    glyph class: 1    x_offset: 0    y_offset: 0    x_advance: 769    cluster value: [12]
glyph [3]    glyph class: 1    x_offset: 0    y_offset: 0    x_advance: 413    cluster value: [5]
glyph [792]    glyph class: 1    x_offset: 0    y_offset: 0    x_advance: 1317    cluster value: [4]
glyph [527]    glyph class: 3    x_offset: 30    y_offset: 290    x_advance: 0    cluster value: [2]
glyph [804]    glyph class: 1    x_offset: 0    y_offset: 0    x_advance: 217    cluster value: [2]
glyph [525]    glyph class: 3    x_offset: -10    y_offset: 420    x_advance: 0    cluster value: [0]
glyph [486]    glyph class: 1    x_offset: 0    y_offset: 0    x_advance: 293    cluster value: [0]

UTF-32 clusters: 18 18 16 16 14 14 13 06 07 08 09 10 11 12 05 04 02 02 00 00
UTF-8 clusters:  27 27 23 23 19 19 18 11 12 13 14 15 16 17 10 08 04 04 00 00
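
The relationship between the two cluster lists above can be sketched in
Python as follows. `utf32_to_utf8_clusters` is a hypothetical helper,
not part of HarfBuzz or libraqm. The string is a placeholder with the
byte widths implied by the two lists (two-byte Arabic letters around a
run of one-byte ASCII), not the text that was actually shaped.

```python
def utf32_to_utf8_clusters(text, clusters):
    """Convert code-point cluster indices into UTF-8 byte offsets."""
    # offsets[i] is the byte offset at which code point i starts.
    offsets = [0]
    for ch in text:
        offsets.append(offsets[-1] + len(ch.encode("utf-8")))
    return [offsets[c] for c in clusters]


# Placeholder only: 5 two-byte characters, 9 one-byte characters,
# then 6 two-byte characters (20 code points in total).
placeholder = "\u0628" * 5 + " abcdefg " + "\u0628" * 6

utf32 = [18, 18, 16, 16, 14, 14, 13, 6, 7, 8, 9, 10, 11, 12, 5, 4, 2, 2, 0, 0]
utf8 = utf32_to_utf8_clusters(placeholder, utf32)
print(utf8)
# [27, 27, 23, 23, 19, 19, 18, 11, 12, 13, 14, 15, 16, 17, 10, 8, 4, 4, 0, 0]
```

Both lists identify the same clusters. Only the unit of the index
differs (code points versus bytes), so neither one carries any extra
information about which characters formed which glyph.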

Cheers
Graham


