[HarfBuzz] Eager CS undergrad thirsty for knowledge about the inner bowels of typeface rendering. Specifically for supporting vertical ligature caret definitions.

Mon Jan 27 16:32:44 PST 2014

I have started to look into the matter and I find that there is very little information readily available. Microsoft has a very interesting document, which I have been trying to understand: http://www.microsoft.com/typography/OpenTypeDev/tibetan/intro.htm
In the section "Examples of Tibetan" (bottom of document), the first example shows how a sequence of eight code points is strung together to form a Tibetan "syllable". It's not really a syllable. It's called a tsheg-bar in Tibetan, and a Tibetan word can consist of several of these. Anyway, all the code points from the second one up to and including the fourth one are, as far as I can tell, formed into a ligature. I have opened the MS Himalaya font in FontForge and seen that this ligature is defined as "tibSa_Ga_Rata_Shapkyu" in a slot outside of the Unicode code space.

Now, a user who expects to work with Tibetan text the way users of Latin-script languages work with theirs will be very disappointed. The reason is that it is impossible to place the caret inside this ligature or to select individual characters in it. As of now, you can only select the entire stack as a whole. This is partly because the glyphs have been merged into a ligature, but perhaps also because there seem to be no definitions of vertical ligature carets anywhere.
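
To make concrete what I am hoping for, here is a minimal sketch in Python of the kind of fallback I imagine software could use when a font provides no caret positions: split the height of the stack evenly between its components. The function name and the numbers are hypothetical and only for illustration; I do not know whether HarfBuzz or any other library actually does this.

```python
def fallback_caret_offsets(extent, component_count):
    """Return the interior caret offsets for a ligature.

    extent          -- size of the ligature along the caret axis, in font
                       units (the height of the stack for a vertical stack)
    component_count -- number of characters merged into the ligature

    This is an assumed fallback, not something read from a font: it simply
    spaces component_count - 1 carets evenly along the extent.
    """
    if component_count < 1:
        raise ValueError("a ligature needs at least one component")
    return [extent * i // component_count for i in range(1, component_count)]


# Hypothetical numbers: a stack 2000 font units tall built from four characters.
print(fallback_caret_offsets(2000, 4))  # [500, 1000, 1500]
```

An even split is obviously crude for a stack whose parts have very different heights, which is why I would expect real caret positions to have to come from the font itself.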

How would one go about defining such an important feature? Should this be implemented in the font? Should it be implemented in the software that handles the typeface? Or perhaps both? While digging through the MS Himalaya font, I found that there is a value called Ligature Caret Count. What is this value supposed to be used for? For the ligatures that represent stacks of multiple glyphs, the Ligature Caret Count had values up to 4, which I hope means that the font itself contains the information I am looking for. Is my assumption correct?

Also, I am very keen to learn more about the inner workings of typeface rendering. I have taken a course on computer design and understand how everything is bits and how assembly language and C handle this. I also understand the general idea of Unicode and how it is defined at a low level. I've also understood that fonts are basically Bézier curves which are rasterized to the screen buffer. There is a lot of this process which I still find very murky, so if anybody knows of any in-depth reading material, I would be very happy to start reading it. I have read State of Text Rendering by Behdad Esfahbod, which was a great overview of the text rendering stack. But I would really like to get a more in-depth understanding of each layer in the stack.

I would also really like to learn more about HarfBuzz and how to work on it. I would love to spend some time working on it, if I am at a level where my code submissions would meet the standards of this project. Is there any documentation for HarfBuzz? I've taken a quick glance at the source code and run some scripts and make commands, but I honestly don't know what's going on. Why are the C files named .cc and some of the header files named .hh? I can recognize some font lingo and have a slight understanding of what might be going on, but it would be really helpful to have something like this for an OpenType font file: http://imgur.com/a/JEObT#0. I must also say that I have no idea what HarfBuzz is supposed to do or how I test or use it once it's compiled. I've tried running some of the shell scripts and the binaries that were compiled, but I really have no clue.

Is there somewhere I can learn the technicalities of the HarfBuzz project and what I need to know to start contributing?

Sincerely,
Robin Skahjem-Eriksen