[HarfBuzz] Question on converting UTF-8 codepoints to complex glyphs

Paul Daughetee Daughetee at finaldraft.com
Thu Apr 11 18:03:10 UTC 2019


I agree, the font question does seem to be irrelevant. I was just responding to Cody’s comments. However, I’m still lost on getting the correct ligature back from the HarfBuzz shaping engine when I give it a simple Tamil word composed of the Tamil characters ii, tta and u. According to Google, this word “ஈடு” translates to the verb “compensate.” “ஈடு” is the two glyphs ஈ and டு, the latter of which is the ligature formed by the codepoints corresponding to the glyphs ட and உ.

So is it a question of enabling the correct font features? Is there something beyond the basic examples that’s required to get the shaper to return the ligature for the tta and u consonant-vowel combination? Do I have a basic misunderstanding of what HarfBuzz does?

Here’s a bit of the code I’m using. It’s derived from the example found in git here: tangrams/harfbuzz-example (https://github.com/tangrams/harfbuzz-example/tree/a267a0032aa429b2f86959a9f083c607c506bed7).

In that last loop in FDHBShaper, I understand that the glyph IDs are NOT Unicode code points but are the IDs assigned in the font. What I’m getting back (the output) are the same IDs that correspond to my input. Should I be getting two glyph IDs back (ஈ and டு) for the three I input (ஈ, ட and உ) to the shaper?

#pragma once

#include <cassert>
#include <limits>
#include <string>
#include <vector>

#include "hb.h"
#include "hb-ft.h"

using namespace std;

typedef struct {
       unsigned char* buffer;
       unsigned int width;
       unsigned int height;
       float bearing_x;
       float bearing_y;
} Glyph;

typedef struct {
       std::string data;
       std::string language;
       hb_script_t script;
       hb_direction_t direction;
       const char* c_data() const { return data.c_str(); }
} HBText;

namespace HBFeature {
       const hb_tag_t KernTag = HB_TAG('k', 'e', 'r', 'n'); // kerning operations
       const hb_tag_t LigaTag = HB_TAG('l', 'i', 'g', 'a'); // standard ligature substitution
       const hb_tag_t CligTag = HB_TAG('c', 'l', 'i', 'g'); // contextual ligature substitution
       const hb_tag_t PstsTag = HB_TAG('p', 's', 't', 's'); // post-base substitutions

       static hb_feature_t LigatureOff = { LigaTag, 0, 0, std::numeric_limits<unsigned int>::max() };
       static hb_feature_t LigatureOn = { LigaTag, 1, 0, std::numeric_limits<unsigned int>::max() };
       static hb_feature_t KerningOff = { KernTag, 0, 0, std::numeric_limits<unsigned int>::max() };
       static hb_feature_t KerningOn = { KernTag, 1, 0, std::numeric_limits<unsigned int>::max() };
       static hb_feature_t CligOff = { CligTag, 0, 0, std::numeric_limits<unsigned int>::max() };
       static hb_feature_t CligOn = { CligTag, 1, 0, std::numeric_limits<unsigned int>::max() };
       static hb_feature_t PstsOff = { PstsTag, 0, 0, std::numeric_limits<unsigned int>::max() };
       static hb_feature_t PstsOn = { PstsTag, 1, 0, std::numeric_limits<unsigned int>::max() };
}

class FDHBShaper
{
public:
       FDHBShaper(const string& fontFile);
       virtual ~FDHBShaper();

       void init();
       void initText(HBText& text);
       void addFeature(hb_feature_t feature);

private:

       FT_Library lib;
       FT_Face* face;

       hb_font_t* font;
       hb_buffer_t* buffer;
       vector<hb_feature_t> features;
};

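FDHBShaper::FDHBShaper(const string& fontFile)
       : face(NULL), font(NULL), buffer(NULL)
{
       FT_Error error = FT_Init_FreeType(&lib);
       assert(!error);

       face = new FT_Face;

       error = FT_New_Face(lib, fontFile.c_str(), 0, face);
       assert(!error);

       // Set a nominal size of 50pt at 72 dpi (FreeType takes 26.6 fixed point).
       // The size affects glyph positions, not which glyph IDs the shaper returns.
       const float size = 50;
       error = FT_Set_Char_Size(*face, 0, static_cast<FT_F26Dot6>(size * 64), 72, 72);
       assert(!error);
}

FDHBShaper::~FDHBShaper()
{
       // font and buffer start out NULL and the HarfBuzz destroy functions
       // accept NULL, so this is safe even if init() was never called.
       hb_buffer_destroy(buffer);
       hb_font_destroy(font);

       FT_Done_Face(*face);
       delete face;
       FT_Done_FreeType(lib);
}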

void FDHBShaper::addFeature(hb_feature_t feature)
{
       features.push_back(feature);
}

void FDHBShaper::init()
{
       font = hb_ft_font_create(*face, NULL);
       buffer = hb_buffer_create();

       // hb_buffer_create() never returns NULL, so check the allocation explicitly.
       assert(hb_buffer_allocation_successful(buffer));
}

void FDHBShaper::initText(HBText& text)
{
       hb_buffer_reset(buffer);

       hb_buffer_set_direction(buffer, text.direction);
       hb_buffer_set_script(buffer, text.script);
       hb_buffer_set_language(buffer, hb_language_from_string(text.language.c_str(), text.language.size()));
       size_t length = text.data.size();

       hb_buffer_add_utf8(buffer, text.c_data(), length, 0, length);

       hb_shape(font, buffer, features.empty() ? NULL : &features[0], features.size());

       unsigned int glyphCount;
       hb_glyph_info_t *glyphInfo = hb_buffer_get_glyph_infos(buffer, &glyphCount);
       hb_glyph_position_t *glyphPos = hb_buffer_get_glyph_positions(buffer, &glyphCount);

       for (unsigned int i = 0; i < glyphCount; ++i)
       {
              // After hb_shape(), codepoint holds a glyph ID from the font,
              // not a Unicode value.
              hb_codepoint_t glyphid = glyphInfo[i].codepoint;
       }
}

From: Behdad Esfahbod <behdad at behdad.org>
Sent: April 11, 2019 8:58 AM
To: Bobby de Vos <bobby_devos at sil.org>
Cc: Paul Daughetee <Daughetee at finaldraft.com>; Cody Planteen <planteen at gmail.com>; harfbuzz at lists.freedesktop.org
Subject: Re: [HarfBuzz] Question on converting UTF-8 codepoints to complex glyphs

What you say seems irrelevant to me. Jonathan is correct.

On Thu, Apr 11, 2019, 11:34 AM Bobby de Vos <bobby_devos at sil.org> wrote:

Paul,

You don't need to convert the Google Tamil font to OpenType; Google has already done that at

https://github.com/googlei18n/noto-fonts/tree/master/phaseIII_only/unhinted/otf/NotoSansTamil

However, I don't think those fonts will solve your issue. The list of shapers that you mention are different technologies to specify the complex shaping (such as ligatures, positioning, sub forms, half forms, etc). Indeed, OpenType is one such technology. SIL Graphite and Apple Advanced Typography (AAT) are other technologies to do this.

TrueType fonts can contain OpenType shaping instructions. You do not have to have an OpenType font format to use OpenType shaping.

TrueType fonts have quadratic Bézier curves for their glyphs. Fonts in the OpenType font format can use the same quadratic Bézier curves, or cubic Bézier curves. The OTF files I mentioned above have cubic Bézier curves.

https://en.wikipedia.org/wiki/B%C3%A9zier_curve#Fonts

If I have misunderstood your situation, and/or made any errors in what I wrote, I apologize.

Bobby
On 2019-04-10 3:25 p.m., Paul Daughetee wrote:
Thanks for the quick response. I’m a licensed user of FontCreator Professional Edition from High-Logic and have the most recent update to version 11.5 installed.  The correct ligature is displayed when I type the tta and u Tamil characters into the test string edit box in the OpenType Designer dialog. In the box just below the test string the two characters are displayed unless I check either the _shaper or psts feature check box. If one of those is checked, then the correct ligature is displayed. So I guess Google did get the Tamil font right but I cannot seem to get HarfBuzz to return a single glyph id when presented with a buffer containing the tta and u Tamil characters. I’ve tried adding various features when calling hb_shape but that doesn’t seem to change anything.

I noticed that when I list shapers using a call to hb_shape_list_shapers, the only shaper listed is “ot”. So I guess my next try will be to convert the Google Tamil TrueType font to an OpenType font and see if that makes any difference. If it does, I guess I’ll be having a “duh” moment.

From: Cody Planteen <planteen at gmail.com>
Sent: April 10, 2019 12:38 PM
To: Paul Daughetee <Daughetee at finaldraft.com>
Cc: harfbuzz at lists.freedesktop.org
Subject: Re: [HarfBuzz] Question on converting UTF-8 codepoints to complex glyphs

It's possible your font isn't doing what you think it should be. You can test this theory with the tool High-Logic FontCreator for Windows. I believe there is a free evaluation. You can open up your font, then go to Font -> OpenType Designer. In this dialog, you can enter your test string and see what glyphs come out.

https://www.high-logic.com/font-editor/fontcreator


On Wed, Apr 10, 2019 at 1:19 PM Paul Daughetee <Daughetee at finaldraft.com> wrote:
Let me give you a little more info. I just recently built and installed vcpkg and used it to install HarfBuzz on Windows 10. It installed version 2.3.1-3 of the static libraries for Windows x86. I linked my app to the HarfBuzz library and its dependencies. I added code to my app to capture single words that I could send to be processed by HarfBuzz as they were typed by the user. I installed Google’s NotoSansTamil TrueType font after verifying that it properly defined substitutions for the ligature that is formed by the Tamil consonant “tta” when paired with a vowel such as “u” or “I”. After processing a UTF-8 string containing the consonant “tta” and the vowel “u” [0xE0, 0xAE, 0x9F, 0xE0, 0xAE, 0x89], the hb_glyph_info_t object I get back has two glyph indices, the same indices as the “tta” and “u” (17, 10), rather than the index for the “ttauvowelsign” (116) ligature I expected. My code is virtually identical to the examples found in the HarfBuzz wiki and to several examples found in git. Any help here would be greatly appreciated.
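
As a cross-check on the byte sequence quoted above, the following minimal sketch decodes UTF-8 into Unicode code points so the shaper's input can be inspected. `decode_utf8` is a hypothetical helper written for illustration and is not part of HarfBuzz; it assumes well-formed input and does no validation.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode a well-formed UTF-8 byte string into Unicode code points.
// Minimal sketch: malformed or truncated input is not validated.
std::vector<std::uint32_t> decode_utf8(const std::string& bytes)
{
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::uint32_t cp;
        std::size_t extra; // number of continuation bytes that follow
        if (lead < 0x80)              { cp = lead;        extra = 0; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; extra = 1; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; extra = 2; }
        else                          { cp = lead & 0x07; extra = 3; }
        ++i;
        // Each continuation byte contributes its low six bits.
        for (; extra > 0 && i < bytes.size(); --extra, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}
```

Calling `decode_utf8("\xE0\xAE\x9F\xE0\xAE\x89")` yields U+0B9F followed by U+0B89. In the Unicode Tamil block, U+0B9F is the letter TTA and U+0B89 is the independent vowel letter U, whereas the dependent vowel sign U is U+0BC1 (UTF-8 bytes E0 AF 81). One thing worth ruling out, then, is whether the input should contain the vowel sign rather than the independent vowel; whether that accounts for the result with this particular font is an assumption and is not confirmed anywhere in this thread.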

From: Behdad Esfahbod <behdad at behdad.org>
Sent: April 8, 2019 1:47 PM
To: Paul Daughetee <Daughetee at finaldraft.com>
Cc: harfbuzz at lists.freedesktop.org
Subject: Re: [HarfBuzz] Question on converting UTF-8 codepoints to complex glyphs

On Mon, Apr 8, 2019 at 4:12 PM Paul Daughetee <Daughetee at finaldraft.com> wrote:
I’m new to HarfBuzz and attempting to use it for converting a UTF-8 string that contains one or more sets of codepoints that should combine to form single complex glyphs to the correct string of glyphs. I’ve followed numerous examples and they all lead me to the point where I use hb_buffer_get_glyph_infos to get what I thought would be an hb_glyph_info_t object that contains the codepoints for the glyphs I seek. So my first question is as follows. Is that what I should be getting? I ask because I’m not getting what I would expect to get.

Yes.


I can’t even successfully get a complex glyph to represent the combination of the letter A and the grave accent. So if I’m just confused as to how or what HarfBuzz does, please help me find a better path. Thanks!

What do you get?  A + grave-accent only forms one glyph if the font was designed so.  It may very well be represented by two glyphs.
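
To make the two possible inputs for this case concrete, here is a small sketch that encodes code points as UTF-8: the sequence U+0041 (A) followed by U+0300 (combining grave accent), and the single precomposed code point U+00C0 (À). `encode_utf8` is a hypothetical helper for illustration only, not a HarfBuzz function, and it assumes valid code points.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Encode Unicode code points (assumed valid, at most U+10FFFF) as UTF-8.
std::string encode_utf8(const std::vector<std::uint32_t>& cps)
{
    std::string out;
    for (std::uint32_t cp : cps) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Here `encode_utf8({0x41, 0x300})` produces the three bytes 41 CC 80 and `encode_utf8({0xC0})` produces C3 80. Either string can be passed to hb_buffer_add_utf8; as noted above, whether the shaper then returns one glyph or two depends on how the font was designed.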

_______________________________________________
HarfBuzz mailing list
HarfBuzz at lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/harfbuzz


--
behdad
http://behdad.org/
--
Bobby de Vos
bobby_devos at sil.org