[HarfBuzz] Font-independent shaping

Fri Jan 23 14:36:15 PST 2015

On Wed, 21 Jan 2015 07:07:44 -0600
Ken Schutte <kenschutte at gmail.com> wrote:

> Is it possible to do "shaping" (not sure if that's the correct term
> here) without given a font?

> I realize different fonts will support different features, but I want
> to input a unicode string and get information like,
> 
> - this 'ARABIC LETTER BEH' should use 'ARABIC LETTER BEH INITIAL FORM'
> - this 'ARABIC SHADDA' will be combined with previous character
> - mandatory ligatures (lam+alif)
> etc
> (of course will not get glyph coordinates)

There's not much to stop you encapsulating this information in a font
stored in the directory of your choice with dummy glyphs (not hard to
create) and examining the glyphs you get out.

Your font should have a glyph for every character you are interested
in, and for their transforms.  There is a limit of 64K glyphs in
OpenType, so you might need to handle CJK characters separately.  There
may be some issues with control-characters that aren't rendered.

To consider your examples:

For ARABIC LETTER BEH, you would set up the 'init' feature for
the Arabic script to convert it to the appropriate glyph, which may be
the one you would map ARABIC LETTER BEH INITIAL FORM to.  As I think
has been mentioned, Arabic script forms for letters not used in Arabic
itself tend to lack encoded presentation forms.
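
As a rough sketch of how such 'init' rules could be produced textually, the following Python uses only the standard `unicodedata` module to read the `<initial>` decompositions in the Arabic Presentation Forms-B block and emit feature-file text in the AFDKO syntax.  The function name and the `uniXXXX` glyph names are my assumptions; the dummy font would need glyphs with matching names, and the output would still have to be compiled into the font by a separate tool.

```python
import unicodedata


def init_feature_from_ucd(first=0xFE70, last=0xFEFF):
    """Return feature-file text for 'init', derived from UCD decompositions.

    Glyph names follow the 'uniXXXX' convention, which is an assumption
    about how the dummy font names its glyphs.
    """
    rules = []
    for cp in range(first, last + 1):
        decomp = unicodedata.decomposition(chr(cp)).split()
        # Keep only single-letter initial forms, e.g. "<initial> 0628".
        if len(decomp) == 2 and decomp[0] == "<initial>":
            base = int(decomp[1], 16)
            rules.append(f"    sub uni{base:04X} by uni{cp:04X};")
    lines = [
        "languagesystem DFLT dflt;",
        "languagesystem arab dflt;",
        "",
        "feature init {",
        *rules,
        "} init;",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(init_feature_from_ucd())
```

Letters without an encoded presentation form get no rule from this; for those you would have to invent the target glyph yourself.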

For ARABIC SHADDA, what HarfBuzz will tell you is that there is some
form of interaction between it and the preceding character.  They will be
in a single 'cluster' after shaping, and this is the indication that
you would get.  You would also get the same information for a vowel
written before the consonant but stored after it, as occurs in most
Indic scripts.
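
To illustrate reading that indication, here is a minimal sketch that groups shaped glyphs by their cluster value.  Each glyph that HarfBuzz returns carries a cluster number; the `(glyph_name, cluster)` pairs below are invented by hand for BEH + SHADDA + ALEF and are not real shaper output, and the glyph names and their order are purely illustrative.

```python
from itertools import groupby


def group_by_cluster(glyphs):
    """Group consecutive (glyph_name, cluster) pairs sharing a cluster value."""
    return [
        (cluster, [name for name, _ in run])
        for cluster, run in groupby(glyphs, key=lambda g: g[1])
    ]


# Hypothetical data: the shadda shares cluster 0 with the beh.
shaped = [("alef.fina", 2), ("shadda", 0), ("beh.init", 0)]

if __name__ == "__main__":
    for cluster, names in group_by_cluster(shaped):
        note = "interacting" if len(names) > 1 else "standalone"
        print(cluster, names, note)
```

A cluster holding more than one glyph says only that the characters interact, not how.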

It would not tell you that lam+alif was a mandatory ligature.  Of
course, the dummy font could record that this yielded a ligature.

There isn't much information in HarfBuzz that isn't already in the
Unicode Character Database.  The most significant extra information is
that it will tell you that THAI CHARACTER SARA AM decomposes.  It may
even tell you, after a fashion, that part of this character ends up
between the preceding consonant and a tonemark.  However, you would
have to trace the relationship between the characters and glyphs.
HarfBuzz itself won't tell you the same for the equivalent Lanna script
sequence <U+1A61 TAI THAM VOWEL SIGN A, U+1A74 TAI THAM SIGN MAI KANG>,
for the very good reason that this is a stylistic decision.
The information about this has to be stored in the font.  Similarly, it
won't tell you which consonant character U+1A58 TAI THAM SIGN MAI KANG
LAI ends up above (one gets different answers in Burma and Thailand).
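
For comparison, much of what was asked for can be read straight from the Unicode Character Database.  A small sketch using Python's standard `unicodedata` module follows; the helper name `ucd_facts` is mine, and the exact values depend on the Unicode version bundled with your Python, though these particular properties are stable.

```python
import unicodedata


def ucd_facts(ch):
    """Return a few UCD properties relevant to shaping for one character."""
    return {
        "name": unicodedata.name(ch),
        "combining_class": unicodedata.combining(ch),
        "decomposition": unicodedata.decomposition(ch),
    }


if __name__ == "__main__":
    # SHADDA, SARA AM, and the isolated lam-alef ligature presentation form.
    for ch in ("\u0651", "\u0e33", "\ufefb"):
        print(f"U+{ord(ch):04X}", ucd_facts(ch))
```

A non-zero combining class marks the shadda as attaching to a preceding character, and the decomposition fields record both the SARA AM split and the lam+alif ligature.  None of this covers the Tai Tham cases above.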

> Can I use harfbuzz for this or does it always require a font?

You will have to decide whether my answer is 'yes' or 'no'.  There are
tools for adding the GSUB rules to a font, and it isn't too difficult
to generate a formal font purely textually.  (There are compilers around
that will take transformation rules and add them to the glyph
definitions.)

I hope this helps.

Richard.