[HarfBuzz] Question on converting UTF-8 codepoints to complex glyphs

Wed Apr 10 21:28:32 UTC 2019

On 10/04/2019 20:19, Paul Daughetee wrote:
> Let me give you a little more info. I just recently built and installed 
> vcpkg and used it to install HarfBuzz on Windows 10. It installed 
> version 2.3.1-3 of the static libraries for Windows x86. I linked my app 
> to the HarfBuzz library and its dependencies. I added code to my app to 
> capture single words that I could send to be processed by HarfBuzz as 
> they were typed by the user. I installed Google’s NotoSansTamil TrueType 
> font after verifying that it properly defined substitutions for the 
> ligature that is formed by the Tamil consonant “tta” when paired with a 
> vowel such as “u” or “i”. After processing a UTF-8 string containing the 
> consonant “tta” and the vowel “u” [0xE0, 0xAE, 0x9F, 0xE0, 0xAE, 
> 0x89], the hb_glyph_info_t object I get back has two glyph indices, the 
> same indices as “tta” and “u” (17, 10), rather than the index for the 
> “ttauvowelsign” (116) ligature I expected. My code is virtually 
> identical to the examples found in the HarfBuzz wiki and to several 
> examples found in git. Any help here would be greatly appreciated.

It sounds like you're not very familiar with Tamil script?

The UTF-8 sequence [0xE0, 0xAE, 0x9F, 0xE0, 0xAE, 0x89] corresponds to 
the two characters <U+0B9F TAMIL LETTER TTA, U+0B89 TAMIL LETTER U>. 
These are not expected to combine: U+0B89 is a "full" or standalone 
vowel letter.

The ligated syllable 'ttauvowelsign' would be formed by <U+0B9F TAMIL 
LETTER TTA, U+0BC1 TAMIL VOWEL SIGN U>.
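
To check which characters a byte sequence really contains before handing it to HarfBuzz, it can help to decode it and look up each character's Unicode name. The following is only a minimal sketch using Python's standard library, not HarfBuzz itself; the helper name `describe` and the variable names are made up for illustration.

```python
import unicodedata

def describe(utf8_bytes):
    """Decode UTF-8 bytes; return (code point, name, general category) per character."""
    text = bytes(utf8_bytes).decode("utf-8")
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
            for ch in text]

# Bytes from the original message: consonant + independent vowel letter.
sent = describe([0xE0, 0xAE, 0x9F, 0xE0, 0xAE, 0x89])

# Sequence that is expected to ligate: consonant + dependent vowel sign.
wanted_bytes = list("\u0B9F\u0BC1".encode("utf-8"))
wanted = describe(wanted_bytes)

for row in sent + wanted:
    print(*row)
print("bytes to send:", [f"0x{b:02X}" for b in wanted_bytes])
```

The independent vowel U+0B89 has a letter category, while the vowel sign U+0BC1 has a mark category, which is why only the latter attaches to the preceding consonant. The UTF-8 encoding of <U+0B9F, U+0BC1> is [0xE0, 0xAE, 0x9F, 0xE0, 0xAF, 0x81].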

JK

> 
> *From:* Behdad Esfahbod <behdad at behdad.org>
> *Sent:* April 8, 2019 1:47 PM
> *To:* Paul Daughetee <Daughetee at finaldraft.com>
> *Cc:* harfbuzz at lists.freedesktop.org
> *Subject:* Re: [HarfBuzz] Question on converting UTF-8 codepoints to 
> complex glyphs
> 
> On Mon, Apr 8, 2019 at 4:12 PM Paul Daughetee <Daughetee at finaldraft.com> 
> wrote:
> 
>     I’m new to HarfBuzz and attempting to use it for converting a UTF-8
>     string that contains one or more sets of codepoints that should
>     combine to form single complex glyphs to the correct string of
>     glyphs. I’ve followed numerous examples and they all lead me to the
>     point where I use hb_buffer_get_glyph_infos to get what I thought
>     would be a hb_glyph_info_t object that contains the codepoints for the
>     glyphs I seek. So my first question is as follows. Is that what I
>     should be getting? I ask because I’m not getting what I would expect
>     to get.
> 
> Yes.
> 
>     I can’t even successfully get a complex glyph to represent the
>     combination of the letter A and the grave accent. So if I’m just
>     confused as to how or what HarfBuzz does, please help me find a
>     better path. Thanks!
> 
> What do you get?  A + grave-accent only forms one glyph if the font was 
> designed so.  It may very well be represented by two glyphs.
> 
>     _______________________________________________
>     HarfBuzz mailing list
>     HarfBuzz at lists.freedesktop.org <mailto:HarfBuzz at lists.freedesktop.org>
>     https://lists.freedesktop.org/mailman/listinfo/harfbuzz
> 
> 
> -- 
> 
> behdad
> http://behdad.org/