[HarfBuzz] HarfBuzz API design

Tue Aug 18 16:23:50 PDT 2009

[Warning: long email ahead]

Hi all,

With the rewritten HarfBuzz OpenType Layout engine merged in pango master now, 
I've been working on the public API for a few weeks.  I have it mostly 
designed, though there are a few open questions still and I appreciate any 
feedback.  The current code can be browsed here:

   http://git.gnome.org/cgit/pango/tree/pango/opentype

I will add a separate configure.ac to that directory in the coming days such 
that it can be built as a standalone library.  In a couple of weeks I may even 
move it back out to its own git repo and use git magic to pull it in pango 
until we start using it as a shared library (expect end of year).

When designing the HarfBuzz API my reference has been cairo.  That is,
usability has been the top priority.  The other goals are to hide technical
details from the user while keeping the API powerful enough to implement
advanced features internally.

In this mail I'll only discuss the backend-agnostic API, which is what I 
expect most users will use.  This is what will be available by including 
"hb.h".  For example, OpenType-specific APIs will be included in "hb-ot.h" 
only.  That includes querying list of supported OpenType scripts, language 
systems, features, etc.

Finally, the other strict goal of the API is to be fully thread-safe.  That
means I had to bite the bullet and add refcounting API already.  Object
lifecycle API is like cairo's, that is, each object has: _create(),
_reference(), _destroy(), and _get_reference_count().  At some point we may
want to add _[gs]et_user_data() also which is useful for language bindings.
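
To make the lifecycle convention concrete, here is a minimal sketch of the
cairo-style pattern on a made-up object.  The toy_object_t type and its
functions are hypothetical and not HarfBuzz API, and the counter is a plain
integer; a really thread-safe version would need atomic operations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object; not part of HarfBuzz. */
typedef struct {
    unsigned int ref_count;
    int          payload;
} toy_object_t;

/* _create(): new object holding one reference, or NULL on malloc failure. */
toy_object_t *
toy_object_create (int payload)
{
    toy_object_t *obj = malloc (sizeof *obj);
    if (!obj)
        return NULL;
    obj->ref_count = 1;
    obj->payload = payload;
    return obj;
}

/* _reference(): takes one more reference and returns the same pointer. */
toy_object_t *
toy_object_reference (toy_object_t *obj)
{
    assert (obj && obj->ref_count > 0);
    obj->ref_count++;   /* not atomic: a thread-safe version needs atomics */
    return obj;
}

/* _get_reference_count(): number of outstanding references. */
unsigned int
toy_object_get_reference_count (const toy_object_t *obj)
{
    return obj ? obj->ref_count : 0;
}

/* _destroy(): drops one reference; frees the object with the last one. */
void
toy_object_destroy (toy_object_t *obj)
{
    if (!obj)
        return;
    assert (obj->ref_count > 0);
    if (--obj->ref_count == 0)
        free (obj);
}
```

Every _create() or _reference() is paired with exactly one _destroy(), so
whoever drops the last reference frees the object.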

Error handling is also designed somewhat like cairo's, that is, objects keep 
track of failure internally (including malloc failures), but unlike cairo, 
there's no direct way to query objects for errors.  HarfBuzz simply does its 
best to get you the result you wanted.  In case of errors, the output may be 
wrong, but there's nothing you can do to improve it.  There's not much point 
in reporting that state anyway.  So, no error handling in the API.

Before jumping into the API, let me introduce a memory management construct I
added first:

Blobs
=====

hb_blob_t is a refcounted container for raw data and is introduced to make 
memory management between HarfBuzz and the user easy and flexible.  Blobs can 
be created by:

typedef enum {
   HB_MEMORY_MODE_DUPLICATE,
   HB_MEMORY_MODE_READONLY,
   HB_MEMORY_MODE_WRITEABLE,
   HB_MEMORY_MODE_READONLY_NEVER_DUPLICATE,
   HB_MEMORY_MODE_READONLY_MAY_MAKE_WRITEABLE,
} hb_memory_mode_t;

typedef struct _hb_blob_t hb_blob_t;

hb_blob_t *
hb_blob_create (const char        *data,
                 unsigned int       length,
                 hb_memory_mode_t   mode,
                 hb_destroy_func_t  destroy,
                 void              *user_data);

The various mode parameters mean:

   DUPLICATE: copy data right away and own it.

   READONLY: the data passed in can be kept for later use, but should not be 
modified.  If modification is needed, the blob will duplicate the data lazily.

   WRITEABLE: data is writeable, use it freely.

   READONLY_NEVER_DUPLICATE: data is readonly and should never be duplicated. 
  This disables operations needing write access to data.

   READONLY_MAY_MAKE_WRITEABLE: data is readonly but may be made writeable 
using mprotect() or equivalent win32 calls.  It's up to the user to make sure 
calling mprotect() or system-specific equivalents on the data is safe.  In 
practice, that's never an issue on Linux and (according to Tor) on win32.

One can also create a sub-blob of a blob:

hb_blob_t *
hb_blob_create_sub_blob (hb_blob_t    *parent,
                          unsigned int  offset,
                          unsigned int  length);

Blob's data can be used by locking it:

const char *
hb_blob_lock (hb_blob_t *blob);

One can query whether data is writeable:

hb_bool_t
hb_blob_is_writeable (hb_blob_t *blob);

One can request that it be made writeable in place:

hb_bool_t
hb_blob_try_writeable_inplace (hb_blob_t *blob);

Or can request making data writeable, making a copy if need be:

hb_bool_t
hb_blob_try_writeable (hb_blob_t *blob);

For the latter the blob must not be locked.  The lock is recursive.  The blob 
internal stuff is protected using a mutex and hence the structure is threadsafe.
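
The lazy duplication behind the "try writeable" call can be sketched roughly
as follows.  This is a toy model under my own assumptions: the toy_blob_t
type, toy_blob_unlock() and toy_blob_fini() are hypothetical names, only
three of the modes are modelled, and the mutex is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum {
    TOY_MODE_READONLY,                 /* may be duplicated lazily */
    TOY_MODE_WRITEABLE,                /* caller's memory may be modified */
    TOY_MODE_READONLY_NEVER_DUPLICATE  /* write access is never granted */
} toy_mode_t;

typedef struct {
    const char  *data;        /* bytes currently backing the blob */
    char        *owned;       /* private copy, or NULL if none was made */
    unsigned int length;
    unsigned int lock_count;  /* recursive lock depth */
    toy_mode_t   mode;
} toy_blob_t;

void
toy_blob_init (toy_blob_t *blob, const char *data,
               unsigned int length, toy_mode_t mode)
{
    assert (blob && (data || length == 0));
    blob->data = data;
    blob->owned = NULL;
    blob->length = length;
    blob->lock_count = 0;
    blob->mode = mode;
}

const char *
toy_blob_lock (toy_blob_t *blob)
{
    blob->lock_count++;
    return blob->data;
}

void
toy_blob_unlock (toy_blob_t *blob)
{
    assert (blob->lock_count > 0);
    blob->lock_count--;
}

/* Returns 1 if the data is writeable afterwards, 0 otherwise. */
int
toy_blob_try_writeable (toy_blob_t *blob)
{
    char *copy;

    if (blob->lock_count > 0)
        return 0;   /* someone may still hold the old pointer */
    if (blob->mode == TOY_MODE_WRITEABLE)
        return 1;
    if (blob->mode == TOY_MODE_READONLY_NEVER_DUPLICATE)
        return 0;

    /* READONLY: duplicate now, on first request for write access. */
    copy = malloc (blob->length ? blob->length : 1);
    if (!copy)
        return 0;
    if (blob->length)
        memcpy (copy, blob->data, blob->length);
    blob->data = blob->owned = copy;
    blob->mode = TOY_MODE_WRITEABLE;
    return 1;
}

void
toy_blob_fini (toy_blob_t *blob)
{
    free (blob->owned);
    blob->owned = NULL;
}
```

The lock check comes first because a locked blob has handed out a pointer
that the duplication would silently invalidate.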

The main use of the blob is to provide font data or table data to HarfBuzz. 
More about that later.

Text API
========

Perhaps the biggest difference between the API of the old Qt-based HarfBuzz
shaper API and the new one is that the new API reuses hb-buffer for its
shaping input+output.  So, this is how you will use HarfBuzz in four steps:

   - create buffer
   - add text to buffer    ---> buffer contains Unicode text now
   - call hb_shape() on it
   - use output glyphs     ---> buffer contains positioned glyphs now

Within that picture, there are three main objects in HarfBuzz:

   - hb_buffer_t: holds text/glyphs, and is not threadsafe

   - hb_face_t: represents a single SFNT face, fully threadsafe beyond 
construction.   Maps to cairo_font_face_t.

   - hb_font_t: represents a face at a certain size with certain hinting 
options, fully threadsafe beyond construction.  Maps to cairo_scaled_font_t.

Buffer
======

The buffer's output is two arrays: glyph infos and glyph positions. Eventually 
these two items will look like:

typedef struct _hb_glyph_info_t {
   hb_codepoint_t codepoint;
   hb_mask_t      mask;
   uint32_t       cluster;
   uint16_t       component;
   uint16_t       lig_id;
   uint32_t       internal;
} hb_glyph_info_t;

typedef struct _hb_glyph_position_t {
   hb_position_t  x_pos;
   hb_position_t  y_pos;
   hb_position_t  x_advance;
   hb_position_t  y_advance;
   uint32_t       internal;
} hb_glyph_position_t;

One nice thing about using hb-buffer for input is that we can now easily add 
UTF-8, UTF-16, and UTF-32 APIs to HarfBuzz by simply implementing:

void
hb_buffer_add_utf8 (hb_buffer_t  *buffer,
                     const char   *text,
                     unsigned int  text_length,
                     unsigned int  item_offset,
                     unsigned int  item_length);

void
hb_buffer_add_utf16 (hb_buffer_t    *buffer,
                      const uint16_t *text,
                      unsigned int    text_length,
                      unsigned int    item_offset,
                      unsigned int    item_length);

void
hb_buffer_add_utf32 (hb_buffer_t    *buffer,
                      const uint32_t *text,
                      unsigned int    text_length,
                      unsigned int    item_offset,
                      unsigned int    item_length);

These add individual Unicode characters to the buffer and set the cluster
values accordingly.
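
As a rough illustration of what an add_utf8() implementation has to do, here
is a self-contained sketch.  The toy_buffer_t type is hypothetical, and I am
assuming that the cluster of a character is the byte offset at which it
starts in the text.  Malformed bytes become U+FFFD; overlong forms and
surrogates are not rejected, to keep the sketch short.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_CAPACITY 64

/* Hypothetical fixed-size buffer; not the real hb_buffer_t. */
typedef struct {
    uint32_t     codepoint[TOY_CAPACITY];
    uint32_t     cluster[TOY_CAPACITY];
    unsigned int len;
} toy_buffer_t;

/* Decodes one character from p (avail >= 1 bytes); returns bytes consumed. */
static unsigned int
toy_decode_utf8 (const unsigned char *p, unsigned int avail, uint32_t *out)
{
    unsigned int len, i;
    uint32_t cp;

    if (p[0] < 0x80)                { *out = p[0]; return 1; }
    else if ((p[0] & 0xE0) == 0xC0) { len = 2; cp = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0) { len = 3; cp = p[0] & 0x0F; }
    else if ((p[0] & 0xF8) == 0xF0) { len = 4; cp = p[0] & 0x07; }
    else                            { *out = 0xFFFD; return 1; }

    if (len > avail)                { *out = 0xFFFD; return 1; }
    for (i = 1; i < len; i++) {
        if ((p[i] & 0xC0) != 0x80)  { *out = 0xFFFD; return 1; }
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    *out = cp;
    return len;
}

/* Appends the item [item_offset, item_offset + item_length) of text. */
void
toy_buffer_add_utf8 (toy_buffer_t *buf, const char *text,
                     unsigned int text_length,
                     unsigned int item_offset, unsigned int item_length)
{
    const unsigned char *bytes = (const unsigned char *) text;
    unsigned int pos = item_offset, end;

    assert (buf && text);
    if (item_offset > text_length)
        return;
    /* Clamp the item to the text without overflowing the addition. */
    end = item_length > text_length - item_offset
        ? text_length : item_offset + item_length;

    while (pos < end && buf->len < TOY_CAPACITY) {
        uint32_t cp;
        unsigned int n = toy_decode_utf8 (bytes + pos, end - pos, &cp);
        buf->codepoint[buf->len] = cp;
        buf->cluster[buf->len] = pos;   /* assumed: byte offset in text */
        buf->len++;
        pos += n;
    }
}
```

Passing the whole text plus an item offset and length lets the buffer record
positions relative to the full paragraph instead of to the item alone.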

Face
====

HarfBuzz is built around the SFNT font format.  A Face simply represents a
SFNT face, although this is all transparent to the user: you can pass junk to 
HarfBuzz as font data and it will simply ignore it.  There are two main face 
constructors:

hb_face_t *
hb_face_create_for_data (hb_blob_t    *blob,
                          unsigned int  index);

typedef hb_blob_t * (*hb_get_table_func_t)  (hb_tag_t tag, void *user_data);

/* calls destroy() when not needing user_data anymore */
hb_face_t *
hb_face_create_for_tables (hb_get_table_func_t  get_table,
                            hb_destroy_func_t    destroy,
                            void                *user_data);

The for_tables() version uses a callback to load SFNT tables, whereas the 
for_data() version takes a blob which contains the font file data, plus the 
face index for TTC collections.

The face is only responsible for the "complex" part of the shaping right now, 
that is, OpenType Layout features (GSUB/GPOS...).  In the future we may also 
access cmap directly.  Not implemented right now, but old-style 'kern' table 
will also be implemented in the same layer.

The reason for introducing the blob machinery is that the new OpenType Layout 
engine and any other table work we'll add use the font data directly, instead 
of parsing it into separate data structures.  For that reason, we need to 
"sanitize" the font data first.  When sanitizing, instead of pass/fail, upon 
finding irregularities (say, an offset that points outside the table), we
may modify the font data to make it clean enough to pass to the layout code.
In those cases, we first try to make the blob writeable in place, and if that
fails, to make a writeable dup of it.  That is, copy-on-write, either the
easy way or the hard way.  For sane fonts, this means zero per-process memory
is consumed.  In the
future, we'll cache sanitize() results in fontconfig such that not every 
process has to sanitize() clean fonts.
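
The "fix up instead of pass/fail" idea can be sketched on a made-up table
layout: a 16-bit big-endian offset at the start of the table, pointing to a
2-byte subtable.  The layout, the toy_sanitize() name and the choice of
neutralizing a bad offset by zeroing it are all my assumptions for
illustration, not the actual sanitize code.

```c
#include <assert.h>
#include <stddef.h>

static unsigned int
toy_read_u16 (const unsigned char *p)
{
    return ((unsigned int) p[0] << 8) | p[1];
}

/* Returns 1 if the table can be handed to layout code afterwards, 0 if
 * not.  *edited is set to 1 when the table bytes had to be modified. */
int
toy_sanitize (unsigned char *table, size_t length,
              int writeable, int *edited)
{
    unsigned int offset;

    assert (table && edited);
    *edited = 0;

    if (length < 2)
        return 0;               /* cannot even read the offset field */

    offset = toy_read_u16 (table);
    if (offset == 0)
        return 1;               /* null offset: no subtable to check */
    if (offset >= 2 && (size_t) offset + 2 <= length)
        return 1;               /* subtable lies fully inside the table */

    /* Bad offset.  Without write access all we can do is reject. */
    if (!writeable)
        return 0;

    table[0] = table[1] = 0;    /* neutralize: pretend there is no subtable */
    *edited = 1;
    return 1;
}
```

In the blob model, the writeable argument would be the result of first
trying to make the blob writeable in place and then falling back to a copy.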

Font
====

Normally I would have made the font constructor take a hb_face_t (like cairo's 
does indeed).  A font is a face at a certain size and with certain hinting /
other options after all.  However, FreeType's lack of refcounting makes this
really hard.  The reason being:  Pango caches hb_face_t on the FT_Face 
instance's generic slot.  Whereas a hb_font_t should be attached to a 
PangoFont or PangoFcFont.

As everyone knows, FT_Face is not threadsafe, is not refcounted, and is not 
just a face, but also includes sizing information for one font at a time.  For
this reason, whenever a font wants to access a FT_Face, it needs to "lock"
one.  When you lock it though, you don't necessarily get the same object that 
you got the last time.  It may be a totally different object, created for the 
same font data, depending on who manages your FT_Face pool (cairo in our 
case).  Anyway, for this reason, having hb_font_t have a ref to hb_face_t 
makes life hard: one either would have to create/destroy hb_font_t between 
FT_Face lock/unlock, or risk having a hb_face_t pointing to memory owned by a 
FT_Face that may have been freed since.

For the reasons above I opted for not refing a face from hb_font_t and instead 
passing both a face and a font around in the API.  Maybe I should use a 
different name (hb_font_scale_t?)  I'd rather keep names short, instead of 
cairo style hb_font_face_t and hb_scaled_font_t.

Anyway, a font is created easily:

hb_font_t *
hb_font_create (void);

One then needs to set various parameters on it, and after the last change, it 
can be used from multiple threads safely.

Shaping
=======

The main hb_shape() API I have written down right now (just a stub) is:

typedef struct _hb_feature_t {
   const char   *name;
   const char   *value;
   unsigned int  start;
   unsigned int  end;
} hb_feature_t;

void
hb_shape (hb_face_t    *face,
           hb_font_t    *font,
           hb_buffer_t  *buffer,
           hb_feature_t *features,
           unsigned int  num_features);

where features are normally empty, but can be used to pass things like:

   "kern"=>"0"         -------> no kerning
   "ot:aalt"=>"2"      -------> use 2nd OpenType glyph alternative
   "ot:mkmk"=>"0"      -------> never apply 'mkmk' OpenType feature

Perhaps:

   "ot:script"=>"math" ------> Force an OpenType script tag
   "ot:langsys"=>"FAR " -----> Force an OpenType language system

Maybe:

   "ot"=>"0"           ------> Disable OpenType engine (prefer AAT, SIL, etc)

Or perhaps even features marking visual edge of the text, etc.
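
Here is a small sketch of how a shaper might consult such a feature list.
The struct mirrors the stub above under a toy_ name; the lookup function is
hypothetical, and I am assuming that start/end delimit a half-open range of
character indices and that later entries override earlier ones.  None of
that is decided yet.

```c
#include <assert.h>
#include <string.h>

/* Same fields as the hb_feature_t stub, under a toy name. */
typedef struct {
    const char   *name;
    const char   *value;
    unsigned int  start;
    unsigned int  end;
} toy_feature_t;

/* Returns the value of feature `name` at character index `pos`, or
 * `fallback` if no entry covers that position. */
const char *
toy_feature_lookup (const toy_feature_t *features,
                    unsigned int         num_features,
                    const char          *name,
                    unsigned int         pos,
                    const char          *fallback)
{
    const char *result = fallback;
    unsigned int i;

    assert (name && (features || num_features == 0));
    for (i = 0; i < num_features; i++)
        if (strcmp (features[i].name, name) == 0 &&
            features[i].start <= pos && pos < features[i].end)
            result = features[i].value;   /* later entries win */
    return result;
}
```

With an empty list every lookup returns the fallback, which matches the
"features are normally empty" case.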

Discussion
==========

Script and language
===================

Normally the shape() call needs a few more pieces of information.  Namely: 
text direction, script, and language.  Note that none of those belong on the 
face or font objects.  For text direction, I'm convinced that it should be set 
on the buffer, and already have that in place.

For script and language, it's a bit more delicate.  I'm also convinced that 
they belong to the buffer.  With script it's fine, but with language it 
introduces a small implementation hassle: that I would have to deal with 
copying/interning language tags, something I was trying to avoid.  The other 
options are:

   - Extra parameters to hb_shape().  I'd rather not do this.  Keeping details
like this out of the main API and adding setters where appropriate makes the
API cleaner and more extensible.

   - Use the feature dict for them too.  I'm strictly against this one.  The 
feature dict is already too highlevel for my taste.

So, comments here are appreciated.

Unicode callbacks
=================

HarfBuzz itself does not include any Unicode character database tables, but 
needs access to a few properties, some of them for fallback shaping only. 
Currently I have identified the following properties as being useful at some 
point:

typedef hb_codepoint_t
(*hb_unicode_get_mirroring_func_t) (hb_codepoint_t unicode);

Needed to implement character-level mirroring.

typedef hb_category_t
(*hb_unicode_get_general_category_func_t) (hb_codepoint_t unicode);

Used for synthesizing GDEF glyph classes when the face doesn't have them.

typedef hb_script_t
(*hb_unicode_get_script_func_t) (hb_codepoint_t unicode);

Not needed unless we also implement script itemization (which we can do 
transparently, say, if user passed SCRIPT_COMMON to the shape() function).

typedef unsigned int
(*hb_unicode_get_combining_class_func_t) (hb_codepoint_t unicode);

Useful for all kinds of mark positioning when GPOS is not available.

typedef unsigned int
(*hb_unicode_get_eastasian_width_func_t) (hb_codepoint_t unicode);

Not sure it will be useful in HarfBuzz layer.  I recently needed to use it to
correctly set text in vertical direction in Pango.

I've added an object called hb_unicode_funcs_t that holds all these callbacks. 
  It can be ref'ed, as well as copied.  There's also a 
hb_unicode_funcs_make_immutable() call, useful for libraries who want to give 
out references to a hb_unicode_funcs_t object they own but want to make sure 
the user doesn't modify the object by mistake.
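
The make_immutable() idea, sketched on a toy version of the object with a
single callback.  All toy_ names are hypothetical, and that a copy of an
immutable object is mutable again is my assumption.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned int (*toy_combining_class_func_t) (uint32_t unicode);

/* Hypothetical, by-value stand-in for hb_unicode_funcs_t. */
typedef struct {
    toy_combining_class_func_t combining_class;
    int                        immutable;
} toy_unicode_funcs_t;

/* Default callback: claims every character has combining class 0. */
static unsigned int
toy_default_combining_class (uint32_t unicode)
{
    (void) unicode;
    return 0;
}

/* Example client callback: knows only U+0301 COMBINING ACUTE ACCENT. */
unsigned int
toy_example_combining_class (uint32_t unicode)
{
    return unicode == 0x0301 ? 230 : 0;
}

void
toy_unicode_funcs_init (toy_unicode_funcs_t *funcs)
{
    assert (funcs);
    funcs->combining_class = toy_default_combining_class;
    funcs->immutable = 0;
}

void
toy_unicode_funcs_make_immutable (toy_unicode_funcs_t *funcs)
{
    assert (funcs);
    funcs->immutable = 1;
}

/* Returns 1 if the callback was set, 0 if the object is immutable. */
int
toy_unicode_funcs_set_combining_class (toy_unicode_funcs_t        *funcs,
                                       toy_combining_class_func_t  func)
{
    assert (funcs);
    if (funcs->immutable)
        return 0;
    funcs->combining_class = func ? func : toy_default_combining_class;
    return 1;
}

/* Copies are private to the caller, so they start out mutable (assumed). */
toy_unicode_funcs_t
toy_unicode_funcs_copy (const toy_unicode_funcs_t *funcs)
{
    toy_unicode_funcs_t copy;
    assert (funcs);
    copy = *funcs;
    copy.immutable = 0;
    return copy;
}
```

A library can then hand out its immutable object freely, and a user who
wants different callbacks copies it first.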

The hb-glib.h layer then implements:

hb_unicode_funcs_t *
hb_glib_get_unicode_funcs (void);

The question then is where to pass the unicode funcs to the shape() machinery. 
  My current design has it on the face:

void
hb_face_set_unicode_funcs (hb_face_t *face,
                            hb_unicode_funcs_t *unicode_funcs);

However, that is quite arbitrary.  There is nothing in the face alone that 
requires Unicode functionality.  Moreover, I want to keep the face very 
objective.  Say, you should be able to get the hb_face_t from whoever provides 
you with one (pango, ...), and use it without worrying about what settings it 
has.  The Unicode funcs, while well-defined, can still come from a variety of
sources: glib, Qt, Python's, your own experiments, ...

I started thinking about moving that to the buffer instead.  That's the only 
other place that Unicode comes in (add_utf8/...), and the buffer is the only 
object that is not shared by HarfBuzz, so user has full control over it.

One may ask why have the callbacks settable to begin with.  We can hardcode 
them at build time: if glib is available, use it, otherwise use our own copy 
or something.  While I may make it fall back to whatever has been available
at compile time, I like being able to let user set the callbacks.  At least 
until I write one UCD library to rule them all... /me grins

So that's another question I need feedback about.

Font callbacks
==============

These are the font callbacks (font class, font funcs, ...) that I've 
prototyped.  Note that the font, the face, and a user_data parameter are
passed to every one of them.  Some of these callbacks technically just need a face,
not font, but since many systems implement these functions on actual fonts not 
faces, we implement it this way.  Right now one can set the 
hb_font_callbacks_t object on the hb-font and set user_data there 
(hb_font_set_funcs()).

typedef hb_codepoint_t
(*hb_font_get_glyph_func_t) (hb_font_t *font, hb_face_t
                              *face, const void *user_data,
                              hb_codepoint_t unicode,
                              hb_codepoint_t variant_selector);

This is the cmap callback.  Note the variant_selector: it supports new cmap14
tables.  Older clients can simply ignore that argument and do the mapping.
  We probably will implement support for Unicode cmaps internally, but chain 
to this function for missing-glyphs or if no suitable cmap was found.  That 
has three advantages:

   - Pango etc can pass whatever code they want for missing glyphs, to use 
later to draw hexbox,

   - Pango, through fontconfig, knows how to handle non-Unicode cmaps, so that 
will continue to work,

   - For non-SFNT fonts, HarfBuzz should happily sit back and still make
things work; chaining to this callback is how that will happen.
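
The chaining described above could look roughly like this.  It is a sketch
with hypothetical toy_ names and a simplified callback signature (no
font/face arguments); the internal "cmap" is a tiny linear table, and the
flag the client uses to mark missing glyphs is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t toy_codepoint_t;

/* Made-up marker a client might OR into the code for a missing glyph. */
#define TOY_UNKNOWN_FLAG ((toy_codepoint_t) 0x10000000u)

typedef toy_codepoint_t
(*toy_get_glyph_func_t) (const void      *user_data,
                         toy_codepoint_t  unicode,
                         toy_codepoint_t  variant_selector);

typedef struct {
    toy_codepoint_t unicode;
    toy_codepoint_t glyph;
} toy_cmap_entry_t;

typedef struct {
    const toy_cmap_entry_t *cmap;       /* internally handled Unicode cmap */
    unsigned int            cmap_len;
    toy_get_glyph_func_t    get_glyph;  /* client callback, may be NULL */
    const void             *user_data;
} toy_font_t;

/* Tries the internal cmap first, then chains to the client callback. */
toy_codepoint_t
toy_font_get_glyph (const toy_font_t *font,
                    toy_codepoint_t   unicode,
                    toy_codepoint_t   variant_selector)
{
    unsigned int i;

    assert (font && (font->cmap || font->cmap_len == 0));
    for (i = 0; i < font->cmap_len; i++)
        if (font->cmap[i].unicode == unicode)
            return font->cmap[i].glyph;

    if (font->get_glyph)
        return font->get_glyph (font->user_data, unicode, variant_selector);

    return 0;   /* glyph 0 is the missing-glyph slot in SFNT fonts */
}

/* Example client callback: tags the character so that the caller can
 * draw a hexbox for it later.  Ignores the variant selector. */
toy_codepoint_t
toy_hexbox_get_glyph (const void      *user_data,
                      toy_codepoint_t  unicode,
                      toy_codepoint_t  variant_selector)
{
    (void) user_data;
    (void) variant_selector;
    return TOY_UNKNOWN_FLAG | unicode;
}
```

A non-SFNT font would simply have an empty internal cmap, so every lookup
ends up in the client callback.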

typedef hb_bool_t
(*hb_font_get_contour_point_func_t) (hb_font_t *font, hb_face_t *face,
                                      const void *user_data,
                                      hb_codepoint_t glyph,
                                      hb_position_t *x, hb_position_t *y);

Needed for complex GPOS positioning.  Pango never did this before.  Pretty 
straightforward, just need to make it clear the space that the positions are 
returned in.  I'll discuss that in the next section.

typedef void
(*hb_font_get_glyph_metrics_func_t) (hb_font_t *font, hb_face_t *face, const
                                      void *user_data, hb_codepoint_t glyph,
                                      hb_glyph_metrics_t *metrics);

This one is a bit more tricky.  Technically we just need the advance width. 
The rest of the metrics are only used for fallback mark positioning.  So maybe 
I should split this in a get_glyph_advance and a full get_glyph_metrics one. 
Current HarfBuzz has a single call to get advance width of multiple glyphs. 
If that kind of optimization is deemed necessary in the future, we can add a
callback to take an entire buffer and set the advances.

There are more issues here though:

   1) The metrics struct most probably should be public.  However, in the 
future I'd like to use bearing-deltas to improve positioning.  A transparent
struct doesn't help in those situations.  Not sure what the alternatives are.

   2) It's not exactly clear how to deal with vertical fonts.  One way would 
be to assume that if buffer direction is vertical, then the font already knows 
that and returns the vertical metrics.  That's not a completely off 
assumption, though that may not be how win32 fonts work?

typedef hb_position_t
(*hb_font_get_kerning_func_t) (hb_font_t *font, hb_face_t *face,
                                const void *user_data,
                                hb_codepoint_t first_glyph,
                                hb_codepoint_t second_glyph);

Again, most probably we will read 'kern' table internally anyway, but this can 
be used for fallback with non-SFNT fonts.  You can even pass, say, SVG fonts 
through HarfBuzz such that the higher level just deals with one API.

Another call that may be useful is a get_font_metrics one.  Again, only useful 
in fallback positioning.  In that case, ascent/descent as well as slope come
in handy.

Font scale, etc
===============

Currently, based on the old code, the font object has the following setters:

void
hb_font_set_scale (hb_font_t *font,
                    hb_16dot16_t x_scale,
                    hb_16dot16_t y_scale);

void
hb_font_set_ppem (hb_font_t *font,
                   unsigned int x_ppem,
                   unsigned int y_ppem);

The ppem API is well-defined: that's the ppem to use for hinting and 
device-dependent positioning.  Old HarfBuzz also had a "device-independent" 
setting, which essentially turned hinting off.  I've removed that setting in 
favor of passing zero as ppem.  That allows hinting in one direction and not 
the other.  Unlike old HarfBuzz, we will do metrics hinting ourselves.

The set_scale() API is modeled after FreeType, but otherwise very awkward to 
use.  There are four different spaces relevant in HarfBuzz:

   - Font design space: typically a 1024x1024 box per glyph.  The GPOS and 
'kern' values are in this space.  This maps to the EM space by a per-face 
value called upem (units per em).

   - EM space: 1em = 1em.

   - Device space: actual pixels.  The ppem maps EM space to this space, if 
such a mapping exists.

   - User space: the user expects glyph positions in this space.  This can be 
different from device space (it is, for example if you use cairo_scale()). 
Current/old pango ignore this distinction and hence kerning doesn't scale 
correctly [1].

Now, what the hb_font_set_scale() call accepts right now is a 16.16 pair of 
scales mapping from font design space to device space.  I'm not sure, but 
getting that number from font systems other than FreeType may actually be 
quite hard.  The problem is, upem is an implementation detail of the face, and 
the user shouldn't care about it.

So my proposal is to separate upem and make it a face property.  In fact, we
can read upem from the 'head' SFNT table and assume a 1000 upem for non-SFNT
fonts (that's what Type1 did IIRC).  Not that we would directly use upem for
non-SFNT fonts right now.

Then the scale would simply need to map EM space to device space.  But notice 
how that's the same as the ppem.  But then again, we really just care about 
user space for positioning (device space comes in only when hinting).  So, 
set_scale should be changed to accept em-to-user-space scale.  Not 
surprisingly, that's the same as the font size in the user-space.
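
Spelled out as arithmetic: a value in font design units becomes ems when
divided by upem, and ems become user-space units when multiplied by the font
size (or pixels when multiplied by ppem).  A small sketch with hypothetical
function names, using doubles for clarity plus a helper for the 16.16
representation:

```c
#include <assert.h>
#include <stdint.h>

/* Design units -> user space: design / upem gives ems, times font size. */
double
toy_design_to_user (int design_units, unsigned int upem, double font_size)
{
    assert (upem > 0);
    return (double) design_units * font_size / (double) upem;
}

/* Design units -> device space (pixels), for a nonzero ppem. */
double
toy_design_to_device (int design_units, unsigned int upem, unsigned int ppem)
{
    assert (upem > 0);
    return (double) design_units * (double) ppem / (double) upem;
}

/* Converts to 16.16 fixed point, rounding to the nearest representable
 * value.  The input must fit in the 16.16 range. */
int32_t
toy_to_16dot16 (double value)
{
    return (int32_t) (value * 65536.0 + (value >= 0 ? 0.5 : -0.5));
}
```

For example, a kerning value of -128 design units in a 2048-upem face set at
size 16 comes out as -1 user-space unit, whatever the device resolution is.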

Another problem I would need to solve here is, cairo allows a full matrix for 
device-to-user space.  That is, glyphs can be rotated in-place for example. 
That's what we use to implement vertical text.  I'm inclined to also add a
full-matrix setter.  The behavior would be:

   - If (1,0) maps to (x,y) with nonzero y, then many kinds of positioning 
should be completely disabled,

   - Somehow figure out what to do with vertical.  Not sure right now, but it 
should be ok detecting if the font is 90-degree rotated and compensate for that.
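
A sketch of the two checks above, using cairo's matrix convention in which
(x, y) maps to (xx*x + xy*y, yx*x + yy*y).  The type and function names are
hypothetical, translation is left out, and singular matrices are not
detected.

```c
#include <assert.h>

/* Linear part of a cairo-style matrix; hypothetical toy type. */
typedef struct {
    double xx, yx;   /* image of (1,0) is (xx, yx) */
    double xy, yy;   /* image of (0,1) is (xy, yy) */
} toy_matrix_t;

typedef enum {
    TOY_BASELINE_HORIZONTAL,   /* (1,0) stays on the x axis */
    TOY_BASELINE_ROTATED_90,   /* quarter turn, possibly scaled or flipped */
    TOY_BASELINE_OTHER         /* anything else: disable most positioning */
} toy_baseline_t;

static int
toy_near_zero (double v)
{
    return v < 1e-9 && v > -1e-9;
}

toy_baseline_t
toy_classify_baseline (const toy_matrix_t *m)
{
    assert (m);

    /* (1,0) maps to (xx, yx): a zero yx keeps the baseline horizontal. */
    if (toy_near_zero (m->yx))
        return TOY_BASELINE_HORIZONTAL;

    /* The x axis lands on the y axis and the y axis on the x axis. */
    if (toy_near_zero (m->xx) && toy_near_zero (m->yy) &&
        !toy_near_zero (m->xy))
        return TOY_BASELINE_ROTATED_90;

    return TOY_BASELINE_OTHER;
}
```

The rotated case is the one that would get the vertical-text compensation;
the "other" case is where positioning would be switched off.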

In that model however, I wonder how easy/hard it would be for callbacks to
provide requested values (contour point, glyph metrics, etc) in the user 
space.  For cairo/pango I know that's actually the easiest thing to do, 
anything else would need conversion, but I'm not sure about other systems.  An 
alternative would be to let the callbacks choose which space the returned 
value is in, so we can map appropriately.

I guess that's it for now.  Let the discussion begin.  Thanks for reading!

behdad

[1] http://bugzilla.gnome.org/show_bug.cgi?id=341481