[HarfBuzz] Beginner question: What are cluster levels?
Graham Douglas
graham.douglas at readytext.co.uk
Wed Jan 6 07:01:35 PST 2016
On 06/01/2016 14:37, Jonathan Kew wrote:
> On 6/1/16 14:17, Behdad Esfahbod wrote:
>> On 16-01-05 09:17 PM, Jamie Dale wrote:
>>> I actually just wrote something to give me very similar information
>>> since I realised that my basic "this is a ligature" flag wasn't enough
>>> data, so each of my glyphs now contains the number of characters that
>>> the glyph was composed from. This, along with the cluster index of the
>>> glyph from the source text, and the reading direction of the glyph,
>>> allow me to work out which characters formed the glyph.
>>
>> Correct. That's pretty much the only way to do it.
>>
>
>
> Don't forget the added complication that there may be multiple glyphs
> with the same cluster value. E.g. given the text
>
> <U+0915, U+094D, U+0915, U+093F, U+0915>
>
> you're very likely to get two glyphs with cluster index zero, as in
> something like
>
> [imatra=0 | kka=0 | ka=4]
>
> but it's not at all clear from this how you'd determine which
> characters formed each glyph.
>
> JK
>
Hi Jonathan

Yes, I've seen multiple glyphs with the same cluster value when shaping
mixed English and fully-vowelled Arabic text.

Would it be technically possible to extend HarfBuzz with an API that
returns the list of input characters used to shape a particular glyph?
I do not know enough about the internals of OpenType shaping to say
whether that is an impossible (or hugely complex) task.
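
As a rough illustration of what the cluster values alone can tell you, here is a minimal Python sketch. It is not HarfBuzz API: `cluster_ranges` and its arguments are hypothetical names. It assumes that cluster values are code point indices into the source text and that each cluster extends up to the next higher cluster value. Applied to Jonathan's example, it recovers a character range per cluster, but still not per glyph:

```python
def cluster_ranges(clusters, text_len):
    """Return (start, end, glyph_count) for each distinct cluster value.

    clusters: cluster value of each glyph, in any (visual) order.
    text_len: number of code points in the source text.
    Assumption: a cluster covers [start, next higher cluster value).
    """
    starts = sorted(set(clusters))      # works for LTR and RTL runs alike
    ends = starts[1:] + [text_len]      # the last cluster runs to end of text
    return [(s, e, clusters.count(s)) for s, e in zip(starts, ends)]


# Jonathan's example: <U+0915, U+094D, U+0915, U+093F, U+0915>
# shaped as [imatra=0 | kka=0 | ka=4]
print(cluster_ranges([0, 0, 4], 5))
# [(0, 4, 2), (4, 5, 1)]
```

So characters 0 to 3 produced two glyphs between them, and the cluster values give no way to split that range further.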

Here's a test/debug sample from a libraqm run (which of course uses
HarfBuzz + FriBidi). I modified libraqm to also report the HarfBuzz glyph
class of each glyph (GDEF classes: 1 = base, 2 = ligature, 3 = mark).

Glyph information:
glyph [525]  glyph class: 3  x_offset: 440   y_offset: 360   x_advance: 0     cluster value: [18]
glyph [2023] glyph class: 2  x_offset: 0     y_offset: 0     x_advance: 850   cluster value: [18]
glyph [529]  glyph class: 3  x_offset: 450   y_offset: -150  x_advance: 0     cluster value: [16]
glyph [765]  glyph class: 1  x_offset: 0     y_offset: 0     x_advance: 925   cluster value: [16]
glyph [525]  glyph class: 3  x_offset: 140   y_offset: -280  x_advance: 0     cluster value: [14]
glyph [519]  glyph class: 1  x_offset: -100  y_offset: 0     x_advance: 506   cluster value: [14]
glyph [3]    glyph class: 1  x_offset: 0     y_offset: 0     x_advance: 413   cluster value: [13]
glyph [64]   glyph class: 1  x_offset: 0     y_offset: 0     x_advance: 604   cluster value: [6]
glyph [73]   glyph class: 1  x_offset: 0     y_offset: 0     x_advance: 778   cluster value: [7]
glyph [66]   glyph class: 1  x_offset: 0     y_offset: 0     x_advance: 682   cluster value: [8]
glyph [71]   glyph class: 1  x_offset: 0     y_offset: 0     x_advance: 367   cluster value: [9]
glyph [68]   glyph class: 1  x_offset: 0     y_offset: 0     x_advance: 375   cluster value: [10]
glyph [78]   glyph class: 1  x_offset: 0     y_offset: 0     x_advance: 522   cluster value: [11]
glyph [67]   glyph class: 1  x_offset: 0     y_offset: 0     x_advance: 769   cluster value: [12]
glyph [3]    glyph class: 1  x_offset: 0     y_offset: 0     x_advance: 413   cluster value: [5]
glyph [792]  glyph class: 1  x_offset: 0     y_offset: 0     x_advance: 1317  cluster value: [4]
glyph [527]  glyph class: 3  x_offset: 30    y_offset: 290   x_advance: 0     cluster value: [2]
glyph [804]  glyph class: 1  x_offset: 0     y_offset: 0     x_advance: 217   cluster value: [2]
glyph [525]  glyph class: 3  x_offset: -10   y_offset: 420   x_advance: 0     cluster value: [0]
glyph [486]  glyph class: 1  x_offset: 0     y_offset: 0     x_advance: 293   cluster value: [0]

UTF-32 clusters: 18 18 16 16 14 14 13 06 07 08 09 10 11 12 05 04 02 02 00 00
UTF-8 clusters: 27 27 23 23 19 19 18 11 12 13 14 15 16 17 10 08 04 04 00 00
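
The two rows differ only in the unit the cluster values are counted in: code point indices for UTF-32 input, byte offsets for UTF-8 input. A minimal Python sketch of that mapping follows. The function name and the four-character sample string are made up for illustration and are not the text used in the run above.

```python
def utf8_cluster_values(text, clusters):
    """Convert code point cluster indices into UTF-8 byte offsets."""
    # offsets[i] is the byte offset at which code point i starts.
    offsets = [0]
    for ch in text:
        offsets.append(offsets[-1] + len(ch.encode("utf-8")))
    return [offsets[c] for c in clusters]


# Hypothetical sample: Arabic letter and vowel mark (2 bytes each in UTF-8),
# then a space and a Latin letter (1 byte each).
sample = "\u0628\u064E a"
print(utf8_cluster_values(sample, [0, 0, 2, 3]))
# [0, 0, 4, 5]
```

This matches the pattern in the rows above, where UTF-32 clusters 02 and 04 in the Arabic part become UTF-8 clusters 04 and 08.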
Cheers
Graham