[HarfBuzz] HarfBuzz API design
Behdad Esfahbod
behdad at behdad.org
Tue Aug 18 16:23:50 PDT 2009
[Warning: long email ahead]
Hi all,
With the rewritten HarfBuzz OpenType Layout engine merged in pango master now,
I've been working on the public API for a few weeks. I have it mostly
designed, though there are a few open questions still and I appreciate any
feedback. The current code can be browsed here:
http://git.gnome.org/cgit/pango/tree/pango/opentype
I will add a separate configure.ac to that directory in the coming days such
that it can be built as a standalone library. In a couple of weeks I may even
move it back out to its own git repo and use git magic to pull it in pango
until we start using it as a shared library (expect end of year).
When designing the HarfBuzz API my reference has been cairo. That is, usability
has been the top priority. Beyond that, the API aims to hide technical details
while still being powerful enough to implement advanced features internally.
In this mail I'll only discuss the backend-agnostic API, which is what I
expect most users will use. This is what will be available by including
"hb.h". For example, OpenType-specific APIs will be included in "hb-ot.h"
only. That includes querying list of supported OpenType scripts, language
systems, features, etc.
Finally, the other strict goal of the API is to be fully thread-safe. That
means, I had to bite the bullet and add refcounting API already. Object
lifecycle API is like cairo's, that is, each object has: _create(),
_reference(), _destroy(), and _get_reference_count(). At some point we may
want to add _[gs]et_user_data() also which is useful for language bindings.
Error handling is also designed somewhat like cairo's, that is, objects keep
track of failure internally (including malloc failures), but unlike cairo,
there's no direct way to query objects for errors. HarfBuzz simply does its
best to get you the result you wanted. In case of errors, the output may be
wrong, but there's nothing you can do to improve it. There's not much point
in reporting that state anyway. So, no error handling in the API.
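As a rough sketch of the lifecycle and error conventions described above (this is not HarfBuzz code; every demo_* name is hypothetical), an object type could look like the following. It assumes, in the cairo tradition, that a failed allocation hands back a shared inert object instead of NULL, so callers never have to check for errors. The plain int counter keeps the sketch short; a thread-safe version would need atomic operations.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object type used only for illustration. */
typedef struct demo_obj_t {
  int ref_count;   /* -1 marks the shared inert object */
  int value;
} demo_obj_t;

/* Shared inert instance handed out when allocation fails. */
static demo_obj_t demo_obj_nil = { -1, 0 };

static int
demo_obj_is_inert (const demo_obj_t *obj)
{
  return obj->ref_count == -1;
}

demo_obj_t *
demo_obj_create (int value)
{
  demo_obj_t *obj = malloc (sizeof *obj);
  if (!obj)
    return &demo_obj_nil;   /* never NULL: the failure is tracked internally */
  obj->ref_count = 1;
  obj->value = value;
  return obj;
}

demo_obj_t *
demo_obj_reference (demo_obj_t *obj)
{
  if (!demo_obj_is_inert (obj))
    obj->ref_count++;
  return obj;
}

void
demo_obj_destroy (demo_obj_t *obj)
{
  if (demo_obj_is_inert (obj))
    return;                 /* the inert object is never freed */
  assert (obj->ref_count > 0);
  if (--obj->ref_count == 0)
    free (obj);
}

unsigned int
demo_obj_get_reference_count (const demo_obj_t *obj)
{
  return demo_obj_is_inert (obj) ? 0 : (unsigned int) obj->ref_count;
}
```

Every call is safe on the inert object, which is what lets the public API go without any error-query functions.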
Before jumping into the API, lemme introduce a memory management construct I
added first:
Blobs
=====
hb_blob_t is a refcounted container for raw data and is introduced to make
memory management between HarfBuzz and the user easy and flexible. Blobs can
be created by:
typedef enum {
  HB_MEMORY_MODE_DUPLICATE,
  HB_MEMORY_MODE_READONLY,
  HB_MEMORY_MODE_WRITEABLE,
  HB_MEMORY_MODE_READONLY_NEVER_DUPLICATE,
  HB_MEMORY_MODE_READONLY_MAY_MAKE_WRITEABLE,
} hb_memory_mode_t;
typedef struct _hb_blob_t hb_blob_t;
hb_blob_t *
hb_blob_create (const char        *data,
                unsigned int       length,
                hb_memory_mode_t   mode,
                hb_destroy_func_t  destroy,
                void              *user_data);
The various mode parameters mean:
DUPLICATE: copy data right away and own it.
READONLY: the data passed in can be kept for later use, but should not be
modified. If modification is needed, the blob will duplicate the data lazily.
WRITEABLE: data is writeable, use it freely.
READONLY_NEVER_DUPLICATE: data is readonly and should never be duplicated.
This disables operations needing write access to data.
READONLY_MAY_MAKE_WRITEABLE: data is readonly but may be made writeable
using mprotect() or equivalent win32 calls. It's up to the user to make sure
calling mprotect() or system-specific equivalents on the data is safe. In
practice, that's never an issue on Linux and (according to Tor) on win32.
One can also create a sub-blob of a blob:
hb_blob_t *
hb_blob_create_sub_blob (hb_blob_t    *parent,
                         unsigned int  offset,
                         unsigned int  length);
Blob's data can be used by locking it:
const char *
hb_blob_lock (hb_blob_t *blob);
One can query whether data is writeable:
hb_bool_t
hb_blob_is_writeable (hb_blob_t *blob);
One can request that it be made writeable in place:
hb_bool_t
hb_blob_try_writeable_inplace (hb_blob_t *blob);
Or can request making data writeable, making a copy if need be:
hb_bool_t
hb_blob_try_writeable (hb_blob_t *blob);
For the latter the blob must not be locked. The lock is recursive. The blob
internal stuff is protected using a mutex and hence the structure is threadsafe.
The main use of the blob is to provide font data or table data to HarfBuzz.
More about that later.
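To make the memory modes more concrete, here is a minimal sketch of how a blob might react to a request for write access. It is not the HarfBuzz implementation: the demo_* names are hypothetical, refcounting, locking and the destroy callback are left out, and the READONLY_MAY_MAKE_WRITEABLE mode (which would involve mprotect() or an equivalent) is omitted to keep the example portable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for four of the modes described above. */
typedef enum {
  DEMO_MODE_DUPLICATE,
  DEMO_MODE_READONLY,
  DEMO_MODE_WRITEABLE,
  DEMO_MODE_READONLY_NEVER_DUPLICATE
} demo_mode_t;

typedef struct {
  char        *data;
  unsigned int length;
  demo_mode_t  mode;
  int          owned;   /* nonzero once the blob holds its own malloc'ed copy */
} demo_blob_t;

/* Swap the blob's data for a private, writeable copy.  Returns 1 on success. */
static int
demo_blob_duplicate (demo_blob_t *blob)
{
  char *copy = malloc (blob->length ? blob->length : 1);
  if (!copy)
    return 0;
  memcpy (copy, blob->data, blob->length);
  blob->data = copy;
  blob->owned = 1;
  blob->mode = DEMO_MODE_WRITEABLE;
  return 1;
}

void
demo_blob_init (demo_blob_t *blob, char *data, unsigned int length,
                demo_mode_t mode)
{
  assert (data != NULL);
  blob->data = data;
  blob->length = length;
  blob->mode = mode;
  blob->owned = 0;
  if (mode == DEMO_MODE_DUPLICATE && !demo_blob_duplicate (blob))
    {
      /* Copy failed: degrade to an empty blob that can never be written. */
      blob->length = 0;
      blob->mode = DEMO_MODE_READONLY_NEVER_DUPLICATE;
    }
}

/* Succeeds only if the current data can be written without copying. */
int
demo_blob_try_writeable_inplace (demo_blob_t *blob)
{
  return blob->mode == DEMO_MODE_WRITEABLE;
}

/* Succeeds if the data is writeable already or a lazy copy can be made. */
int
demo_blob_try_writeable (demo_blob_t *blob)
{
  if (blob->mode == DEMO_MODE_WRITEABLE)
    return 1;
  if (blob->mode == DEMO_MODE_READONLY_NEVER_DUPLICATE)
    return 0;
  return demo_blob_duplicate (blob);   /* READONLY: copy on first write */
}

void
demo_blob_fini (demo_blob_t *blob)
{
  if (blob->owned)
    free (blob->data);
  blob->data = NULL;
  blob->length = 0;
  blob->owned = 0;
}
```

A READONLY blob keeps pointing at the caller's memory until someone actually asks for write access, which is the lazy duplication described above.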
Text API
========
Perhaps the biggest difference between the API of the old Qt-based HarfBuzz
shaper API and the new one is that the new API reuses hb-buffer for its
shaping input+output. So, this is how you will use harfbuzz in three lines:
- create buffer
- add text to buffer ---> buffer contains Unicode text now
- call hb_shape() on it
- use output glyphs ---> buffer contains positioned glyphs now
Within that picture, there are three main objects in HarfBuzz:
- hb_buffer_t: holds text/glyphs, and is not threadsafe
- hb_face_t: represents a single SFNT face, fully threadsafe beyond
construction. Maps to cairo_font_face_t.
- hb_font_t: represents a face at a certain size with certain hinting
options, fully threadsafe beyond construction. Maps to cairo_scaled_font_t.
Buffer
======
The buffer's output is two arrays: glyph infos and glyph positions. Eventually
these two items will look like:
typedef struct _hb_glyph_info_t {
  hb_codepoint_t codepoint;
  hb_mask_t      mask;
  uint32_t       cluster;
  uint16_t       component;
  uint16_t       lig_id;
  uint32_t       internal;
} hb_glyph_info_t;
typedef struct _hb_glyph_position_t {
  hb_position_t  x_pos;
  hb_position_t  y_pos;
  hb_position_t  x_advance;
  hb_position_t  y_advance;
  uint32_t       internal;
} hb_glyph_position_t;
One nice thing about using hb-buffer for input is that we can now easily add
UTF-8, UTF-16, and UTF-32 APIs to HarfBuzz by simply implementing:
void
hb_buffer_add_utf8 (hb_buffer_t  *buffer,
                    const char   *text,
                    unsigned int  text_length,
                    unsigned int  item_offset,
                    unsigned int  item_length);
void
hb_buffer_add_utf16 (hb_buffer_t    *buffer,
                     const uint16_t *text,
                     unsigned int    text_length,
                     unsigned int    item_offset,
                     unsigned int    item_length);
void
hb_buffer_add_utf32 (hb_buffer_t    *buffer,
                     const uint32_t *text,
                     unsigned int    text_length,
                     unsigned int    item_offset,
                     unsigned int    item_length);
These add individual Unicode characters to the buffer and set the cluster
values accordingly.
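The following is a small sketch of what the UTF-8 variant has to do: decode each character of the item and record a cluster value for it. It is only an illustration with hypothetical demo_* names. It assumes the cluster value is the byte offset at which the character starts in the text, which the description above does not spell out, and its decoder is simplified (malformed bytes become U+FFFD, while overlong forms and surrogates are not rejected).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down version of a buffer item. */
typedef struct {
  uint32_t codepoint;
  uint32_t cluster;
} demo_item_t;

/* Decode one UTF-8 sequence starting at text[i], with i < end.
 * Stores the code point (U+FFFD for malformed input) in *cp and
 * returns the number of bytes consumed, which is always at least 1. */
static unsigned int
demo_utf8_next (const unsigned char *text, unsigned int i, unsigned int end,
                uint32_t *cp)
{
  unsigned char c = text[i];
  unsigned int len = 0, k;
  uint32_t u = 0;

  if (c < 0x80) { *cp = c; return 1; }
  else if ((c & 0xE0) == 0xC0) { len = 2; u = c & 0x1F; }
  else if ((c & 0xF0) == 0xE0) { len = 3; u = c & 0x0F; }
  else if ((c & 0xF8) == 0xF0) { len = 4; u = c & 0x07; }
  else { *cp = 0xFFFD; return 1; }

  if (len > end - i) { *cp = 0xFFFD; return 1; }   /* truncated sequence */
  for (k = 1; k < len; k++)
    {
      if ((text[i + k] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
      u = (u << 6) | (uint32_t) (text[i + k] & 0x3F);
    }
  *cp = u;
  return len;
}

/* Append the characters of text[item_offset .. item_offset + item_length)
 * to items[] and return how many were written (at most capacity). */
unsigned int
demo_add_utf8 (demo_item_t *items, unsigned int capacity,
               const char *text, unsigned int text_length,
               unsigned int item_offset, unsigned int item_length)
{
  const unsigned char *bytes = (const unsigned char *) text;
  unsigned int i = item_offset, end, n = 0;

  assert (item_offset <= text_length);
  assert (item_length <= text_length - item_offset);
  end = item_offset + item_length;

  while (i < end && n < capacity)
    {
      uint32_t cp;
      unsigned int len = demo_utf8_next (bytes, i, end, &cp);
      items[n].codepoint = cp;
      items[n].cluster = i;   /* assumption: cluster = starting byte offset */
      n++;
      i += len;
    }
  return n;
}
```

Passing the whole text plus an item offset and length, rather than just the item, leaves room for the shaper to look at surrounding context later.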
Face
====
HarfBuzz is built around the SFNT font format. A Face simply represents a
SFNT face, although this is all transparent to the user: you can pass junk to
HarfBuzz as font data and it will simply ignore it. There are two main face
constructors:
hb_face_t *
hb_face_create_for_data (hb_blob_t    *blob,
                         unsigned int  index);
typedef hb_blob_t * (*hb_get_table_func_t) (hb_tag_t tag, void *user_data);
/* calls destroy() when not needing user_data anymore */
hb_face_t *
hb_face_create_for_tables (hb_get_table_func_t  get_table,
                           hb_destroy_func_t    destroy,
                           void                *user_data);
The for_tables() version uses a callback to load SFNT tables, whereas the
for_data() version takes a blob which contains the font file data, plus the
face index for TTC collections.
The face is only responsible for the "complex" part of the shaping right now,
that is, OpenType Layout features (GSUB/GPOS...). In the future we may also
access cmap directly. Not implemented right now, but old-style 'kern' table
will also be implemented in the same layer.
The reason for introducing the blob machinery is that the new OpenType Layout
engine and any other table work we'll add use the font data directly, instead
of parsing it into separate data structures. For that reason, we need to
"sanitize" the font data first. When sanitizing, instead of pass/fail, upon
finding irregularities (say, an offset that points outside the table), we
may modify the font data to make it clean enough to pass to the layout code.
In those cases, we first try to make the blob writeable in place, and if that
fails, to make a writeable dup of it. That is, copy-on-write, the easy or the hard
way. For sane fonts, this means zero per-process memory is consumed. In the
future, we'll cache sanitize() results in fontconfig such that not every
process has to sanitize() clean fonts.
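To illustrate the "repair instead of reject" idea, here is a toy sanitizer over an invented table layout: a big-endian uint16 count followed by that many uint16 offsets measured from the start of the table. The layout and the demo_* names are assumptions made up for this sketch and do not correspond to any real OpenType table or to the actual sanitize() code. Offsets that point outside the table are overwritten with 0, which this sketch treats as "no subtable", so the data has to be writeable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint16_t
demo_get_u16 (const unsigned char *p)
{
  return (uint16_t) ((p[0] << 8) | p[1]);
}

static void
demo_put_u16 (unsigned char *p, uint16_t v)
{
  p[0] = (unsigned char) (v >> 8);
  p[1] = (unsigned char) (v & 0xFF);
}

/* Returns -1 if the table is too broken to use at all, otherwise the
 * number of offsets that had to be neutralized (0 for a clean table). */
int
demo_sanitize_offsets (unsigned char *table, size_t length)
{
  size_t count, i;
  int fixed = 0;

  assert (table != NULL);
  if (length < 2)
    return -1;
  count = demo_get_u16 (table);
  if (count > (length - 2) / 2)
    return -1;                      /* the offset array itself runs off the end */

  for (i = 0; i < count; i++)
    {
      unsigned char *slot = table + 2 + 2 * i;
      if (demo_get_u16 (slot) >= length)
        {
          demo_put_u16 (slot, 0);   /* edit the font data in place */
          fixed++;
        }
    }
  return fixed;
}
```

A clean table returns 0 without a single write, which is why sane fonts would never trigger a copy of the blob.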
Font
====
Normally I would have made the font constructor take a hb_face_t (as cairo's
does). A font is a face at a certain size and with certain hinting /
other options after all. However, FreeType's lack of refcounting makes this
really hard. The reason is that Pango caches hb_face_t on the FT_Face
instance's generic slot, whereas a hb_font_t should be attached to a
PangoFont or PangoFcFont.
As everyone knows, FT_Face is not threadsafe, is not refcounted, and is not
just a face, but also includes sizing information for one font at a time. For
these reasons, whenever a font wants to access a FT_Face, it needs to "lock"
one. When you lock it though, you don't necessarily get the same object that
you got the last time. It may be a totally different object, created for the
same font data, depending on who manages your FT_Face pool (cairo in our
case). Anyway, for this reason, having hb_font_t have a ref to hb_face_t
makes life hard: one either would have to create/destroy hb_font_t between
FT_Face lock/unlock, or risk having a hb_face_t pointing to memory owned by a
FT_Face that may have been freed since.
For the reasons above I opted for not refing a face from hb_font_t and instead
passing both a face and a font around in the API. Maybe I should use a
different name (hb_font_scale_t?). I'd rather keep names short, instead of
the cairo-style hb_font_face_t and hb_scaled_font_t.
Anyway, a font is created easily:
hb_font_t *
hb_font_create (void);
One then needs to set various parameters on it, and after the last change, it
can be used from multiple threads safely.
Shaping
=======
The main hb_shape() API I have written down right now (just a stub) is:
typedef struct _hb_feature_t {
  const char   *name;
  const char   *value;
  unsigned int  start;
  unsigned int  end;
} hb_feature_t;
void
hb_shape (hb_face_t    *face,
          hb_font_t    *font,
          hb_buffer_t  *buffer,
          hb_feature_t *features,
          unsigned int  num_features);
where features are normally empty, but can be used to pass things like:
"kern"=>"0" -------> no kerning
"ot:aalt"=>"2" -------> use 2nd OpenType glyph alternative
"ot:mkmk"=>"0" -------> never apply 'mkmk' OpenType feature
Perhaps:
"ot:script"=>"math" ------> Force an OpenType script tag
"ot:langsys"=>"FAR " -----> Force an OpenType language system
Maybe:
"ot"=>"0" ------> Disable OpenType engine (prefer AAT, SIL, etc)
Or perhaps even features marking visual edge of the text, etc.
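As an illustration of how a shaper could consult such a feature list, the sketch below looks up the value of a named feature at a given buffer index. The struct mirrors the one above under a hypothetical demo_ name. Two things are assumptions of the sketch and not stated in the design: that start/end describe a half-open index range [start, end), and that later entries override earlier ones.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as the feature struct above, under a hypothetical name. */
typedef struct {
  const char   *name;
  const char   *value;
  unsigned int  start;
  unsigned int  end;
} demo_feature_t;

/* Return the value set for `name` at buffer position `index`, or NULL if
 * no entry covers that position (meaning: use the default behavior). */
const char *
demo_feature_value_at (const demo_feature_t *features,
                       unsigned int num_features,
                       const char *name, unsigned int index)
{
  const char *value = NULL;
  unsigned int i;

  assert (features != NULL || num_features == 0);
  for (i = 0; i < num_features; i++)
    if (index >= features[i].start && index < features[i].end &&
        strcmp (features[i].name, name) == 0)
      value = features[i].value;   /* later entries override earlier ones */
  return value;
}
```

With an empty list every lookup returns NULL, matching the expectation that features are normally empty.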
Discussion
==========
Script and language
===================
Normally the shape() call needs a few more pieces of information. Namely:
text direction, script, and language. Note that none of those belong on the
face or font objects. For text direction, I'm convinced that it should be set
on the buffer, and already have that in place.
For script and language, it's a bit more delicate. I'm also convinced that
they belong to the buffer. With script it's fine, but with language it
introduces a small implementation hassle: that I would have to deal with
copying/interning language tags, something I was trying to avoid. The other
options are:
- Extra parameters to hb_shape(). I'd rather not do this. Keeping details
like this out of the main API and adding setters where appropriate makes the
API cleaner and more extensible.
- Use the feature dict for them too. I'm strictly against this one. The
feature dict is already too highlevel for my taste.
So, comments here are appreciated.
Unicode callbacks
=================
HarfBuzz itself does not include any Unicode character database tables, but
needs access to a few properties, some of them for fallback shaping only.
Currently I have identified the following properties as being useful at some
point:
typedef hb_codepoint_t
(*hb_unicode_get_mirroring_func_t) (hb_codepoint_t unicode);
Needed to implement character-level mirroring.
typedef hb_category_t
(*hb_unicode_get_general_category_func_t) (hb_codepoint_t unicode);
Used for synthesizing GDEF glyph classes when the face doesn't have them.
typedef hb_script_t
(*hb_unicode_get_script_func_t) (hb_codepoint_t unicode);
Not needed unless we also implement script itemization (which we can do
transparently, say, if user passed SCRIPT_COMMON to the shape() function).
typedef unsigned int
(*hb_unicode_get_combining_class_func_t) (hb_codepoint_t unicode);
Useful for all kinds of mark positioning when GPOS is not available.
typedef unsigned int
(*hb_unicode_get_eastasian_width_func_t) (hb_codepoint_t unicode);
Not sure it will be useful in the HarfBuzz layer. I recently needed it to
correctly set text in the vertical direction in Pango.
I've added an object called hb_unicode_funcs_t that holds all these callbacks.
It can be ref'ed, as well as copied. There's also a
hb_unicode_funcs_make_immutable() call, useful for libraries that want to give
out references to a hb_unicode_funcs_t object they own but want to make sure
the user doesn't modify the object by mistake.
The hb-glib.h layer then implements:
hb_unicode_funcs_t *
hb_glib_get_unicode_funcs (void);
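The immutability idea can be sketched with a small callback table. This is not the real hb_unicode_funcs_t: the demo_* names are hypothetical, only two of the callbacks are modeled, refcounting and copying are left out, and having the setter report success with an int is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t demo_codepoint_t;
typedef demo_codepoint_t (*demo_mirroring_func_t) (demo_codepoint_t unicode);
typedef unsigned int (*demo_combining_class_func_t) (demo_codepoint_t unicode);

typedef struct {
  int                         immutable;
  demo_mirroring_func_t       mirroring;
  demo_combining_class_func_t combining_class;
} demo_unicode_funcs_t;

/* Defaults that make every lookup harmless when no callback is set. */
static demo_codepoint_t
demo_mirroring_nil (demo_codepoint_t unicode)
{
  return unicode;
}

static unsigned int
demo_combining_class_nil (demo_codepoint_t unicode)
{
  (void) unicode;
  return 0;
}

void
demo_unicode_funcs_init (demo_unicode_funcs_t *funcs)
{
  assert (funcs != NULL);
  funcs->immutable = 0;
  funcs->mirroring = demo_mirroring_nil;
  funcs->combining_class = demo_combining_class_nil;
}

/* Returns 1 if the callback was installed, 0 if the object is immutable. */
int
demo_unicode_funcs_set_mirroring (demo_unicode_funcs_t *funcs,
                                  demo_mirroring_func_t func)
{
  if (funcs->immutable)
    return 0;
  funcs->mirroring = func ? func : demo_mirroring_nil;
  return 1;
}

void
demo_unicode_funcs_make_immutable (demo_unicode_funcs_t *funcs)
{
  funcs->immutable = 1;
}

/* Toy mirroring callback that only knows about parentheses. */
demo_codepoint_t
demo_paren_mirroring (demo_codepoint_t unicode)
{
  if (unicode == '(') return ')';
  if (unicode == ')') return '(';
  return unicode;
}
```

A library can fill in its callbacks, call make_immutable, and then hand the object out without worrying that a user will swap the callbacks by mistake.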
The question then is where to pass the unicode funcs to the shape() machinery.
My current design has it on the face:
void
hb_face_set_unicode_funcs (hb_face_t *face,
hb_unicode_funcs_t *unicode_funcs);
However, that is quite arbitrary. There is nothing in the face alone that
requires Unicode functionality. Moreover, I want to keep the face very
objective. Say, you should be able to get the hb_face_t from whoever provides
you with one (pango, ...), and use it without worrying about what settings it
has. The Unicode funcs, while well-defined, can still come from a variety of
sources: glib, Qt, Python's, your own experiments, ...
I started thinking about moving that to the buffer instead. That's the only
other place that Unicode comes in (add_utf8/...), and the buffer is the only
object that is not shared by HarfBuzz, so user has full control over it.
One may ask why have the callbacks settable to begin with. We can hardcode
them at build time: if glib is available, use it, otherwise use our own copy
or something. While I may make it fall back to whatever was available
at compile time, I like being able to let user set the callbacks. At least
until I write one UCD library to rule them all... /me grins
So that's another question I need feedback about.
Font callbacks
==============
These are the font callbacks (font class, font funcs, ...) that I've
prototyped. Note that the font, the face, and a user_data parameter are
passed to all of them. Some of these callbacks technically just need a face,
not font, but since many systems implement these functions on actual fonts not
faces, we implement it this way. Right now one can set the
hb_font_callbacks_t object on the hb-font and set user_data there
(hb_font_set_funcs()).
typedef hb_codepoint_t
(*hb_font_get_glyph_func_t) (hb_font_t *font, hb_face_t *face,
                             const void *user_data,
                             hb_codepoint_t unicode,
                             hb_codepoint_t variant_selector);
This is the cmap callback. Note the variant_selector: it supports the new
format-14 cmap subtables. Older clients can ignore that argument and do the mapping.
We probably will implement support for Unicode cmaps internally, but chain
to this function for missing-glyphs or if no suitable cmap was found. That
has three advantages:
- Pango etc can pass whatever code they want for missing glyphs, to use
later to draw hexbox,
- Pango, through fontconfig, knows how to handle non-Unicode cmaps, so that
will continue to work,
- For non-SFNT fonts, HarfBuzz should happily sit back and still make things
work; this is how that will work.
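The chaining described in the list above might look roughly like this. Everything here is a hypothetical sketch: the demo_* types stand in for the real font and face objects, the "internal cmap" is a tiny linear table, and the flag value used to mark missing glyphs is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t demo_codepoint_t;

/* Simplified callback: the font and face arguments are left out. */
typedef demo_codepoint_t (*demo_get_glyph_func_t) (const void *user_data,
                                                   demo_codepoint_t unicode,
                                                   demo_codepoint_t variant_selector);

typedef struct {
  demo_codepoint_t unicode;
  demo_codepoint_t glyph;
} demo_cmap_entry_t;

typedef struct {
  const demo_cmap_entry_t *cmap;       /* stand-in for an internal Unicode cmap */
  unsigned int             cmap_len;
  demo_get_glyph_func_t    get_glyph;  /* user callback, may be NULL */
  const void              *user_data;
} demo_font_t;

/* Try the internal mapping first, then chain to the user callback. */
demo_codepoint_t
demo_font_get_glyph (const demo_font_t *font,
                     demo_codepoint_t unicode,
                     demo_codepoint_t variant_selector)
{
  unsigned int i;

  assert (font != NULL);
  for (i = 0; i < font->cmap_len; i++)
    if (font->cmap[i].unicode == unicode)
      return font->cmap[i].glyph;

  if (font->get_glyph)
    return font->get_glyph (font->user_data, unicode, variant_selector);
  return 0;   /* glyph 0 conventionally means "missing glyph" */
}

/* Invented marker bit: lets the caller recover the character later,
 * for example to draw a hexbox for it. */
#define DEMO_MISSING_FLAG 0x10000000u

demo_codepoint_t
demo_missing_glyph (const void *user_data,
                    demo_codepoint_t unicode,
                    demo_codepoint_t variant_selector)
{
  (void) user_data;
  (void) variant_selector;
  return DEMO_MISSING_FLAG | unicode;
}
```

Because the callback only sees characters the internal lookup could not map, the client decides what a missing glyph looks like without HarfBuzz needing to know.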
typedef hb_bool_t
(*hb_font_get_contour_point_func_t) (hb_font_t *font, hb_face_t *face,
const void *user_data,
hb_codepoint_t glyph,
hb_position_t *x, hb_position_t *y);
Needed for complex GPOS positioning. Pango never did this before. Pretty
straightforward, just need to make it clear the space that the positions are
returned in. I'll discuss that in the next section.
typedef void
(*hb_font_get_glyph_metrics_func_t) (hb_font_t *font, hb_face_t *face,
                                     const void *user_data,
                                     hb_codepoint_t glyph,
                                     hb_glyph_metrics_t *metrics);
This one is a bit more tricky. Technically we just need the advance width.
The rest of the metrics are only used for fallback mark positioning. So maybe
I should split this in a get_glyph_advance and a full get_glyph_metrics one.
Current HarfBuzz has a single call to get advance width of multiple glyphs.
If that kind of optimization is deemed necessary in the future, we can add a
callback to take an entire buffer and set the advances.
There are more issues here though:
1) The metrics struct most probably should be public. However, in the
future I'd like to use bearing-deltas to improve positioning. A transparent
struct doesn't help in those situations. Not sure what the alternatives are.
2) It's not exactly clear how to deal with vertical fonts. One way would
be to assume that if buffer direction is vertical, then the font already knows
that and returns the vertical metrics. That's not a completely off
assumption, though that may not be how win32 fonts work?
typedef hb_position_t
(*hb_font_get_kerning_func_t) (hb_font_t *font, hb_face_t *face,
const void *user_data,
hb_codepoint_t first_glyph,
hb_codepoint_t second_glyph);
Again, most probably we will read 'kern' table internally anyway, but this can
be used for fallback with non-SFNT fonts. You can even pass, say, SVG fonts
through HarfBuzz such that the higher level just deals with one API.
Another call that may be useful is a get_font_metrics one. Again, only useful
in fallback positioning. In that case, ascent/descent as well as slope come
in handy.
Font scale, etc
===============
Currently, based on the old code, the font object has the following setters:
void
hb_font_set_scale (hb_font_t    *font,
                   hb_16dot16_t  x_scale,
                   hb_16dot16_t  y_scale);
void
hb_font_set_ppem (hb_font_t    *font,
                  unsigned int  x_ppem,
                  unsigned int  y_ppem);
The ppem API is well-defined: that's the ppem to use for hinting and
device-dependent positioning. Old HarfBuzz also had a "device-independent"
setting, which essentially turned hinting off. I've removed that setting in
favor of passing zero as ppem. That allows hinting in one direction and not
the other. Unlike old HarfBuzz, we will do metrics hinting ourselves.
The set_scale() API is modeled after FreeType, but otherwise very awkward to
use. There are four different spaces relevant in HarfBuzz:
- Font design space: typically a 1024x1024 box per glyph. The GPOS and
'kern' values are in this space. This maps to the EM space by a per-face
value called upem (units per em).
- EM space: 1em = 1em.
- Device space: actual pixels. The ppem maps EM space to this space, if
such a mapping exists.
- User space: the user expects glyph positions in this space. This can be
different from device space (it is, for example if you use cairo_scale()).
Current/old pango ignores this distinction and hence kerning doesn't scale
correctly [1].
Now, what the hb_font_set_scale() call accepts right now is a 16.16 pair of
scales mapping from font design space to device space. I'm not sure, but
getting that number from font systems other than FreeType may actually be
quite hard. The problem is, upem is an implementation detail of the face, and
the user shouldn't care about it.
So my proposal is to separate upem and make it a face property. In fact, we
can read upem from the 'head' SFNT table and assume a 1000 upem for non-SFNT fonts
(that's what Type1 uses IIRC). In fact, we wouldn't directly use upem for
non-SFNT fonts right now.
Then the scale would simply need to map EM space to device space. But notice
how that's the same as the ppem. But then again, we really just care about
user space for positioning (device space comes in only when hinting). So,
set_scale should be changed to accept em-to-user-space scale. Not
surprisingly, that's the same as the font size in the user-space.
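A worked example of the proposed mapping, as a sketch: a value in font design units is taken to EM space by dividing by upem, then to user space by multiplying with the em-to-user scale (the font size). The demo_* names are hypothetical, the 16.16 type mirrors hb_16dot16_t, and the figures in the comments (upem 2048, size 12) are example values only. The division truncates toward zero; real code would probably want rounding.

```c
#include <assert.h>
#include <stdint.h>

/* 16.16 fixed point: the low 16 bits hold the fraction. */
typedef int32_t demo_16dot16_t;

demo_16dot16_t
demo_16dot16_from_int (int value)
{
  return (demo_16dot16_t) (value * 65536);
}

/* design units -> EM space (divide by upem) -> user space (times scale).
 * The result is a 16.16 value in user-space units. */
demo_16dot16_t
demo_design_to_user (int32_t design_units, unsigned int upem,
                     demo_16dot16_t em_to_user_scale)
{
  int64_t scaled;

  assert (upem != 0);
  scaled = (int64_t) design_units * em_to_user_scale;
  return (demo_16dot16_t) (scaled / (int64_t) upem);
}

/* Example: a kerning value of -100 design units, upem = 2048, size 12:
 *   -100 * 12 / 2048 = -0.5859375 user units
 *   in 16.16: -100 * (12 * 65536) / 2048 = -38400 */
```

Note that the ppem never enters this computation; under the proposal it would only matter for hinting and device-dependent adjustments.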
Another problem I would need to solve here is, cairo allows a full matrix for
device-to-user space. That is, glyphs can be rotated in-place for example.
That's what we use to implement vertical text. I'm inclined to also add a
full-matrix setter. The behavior would be:
- If (1,0) maps to (x,y) with nonzero y, then many kinds of positioning
should be completely disabled,
- Somehow figure out what to do with vertical. Not sure right now, but it
should be OK to detect whether the font is rotated by 90 degrees and compensate for that.
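The two checks in the list above can be sketched as a small classifier over the linear part of a matrix. The field names follow cairo_matrix_t, but the demo_* types are hypothetical and nothing here is settled API. The exact comparisons with 0.0 are good enough for a sketch; real code would want a tolerance.

```c
#include <assert.h>
#include <stddef.h>

/* Linear part of a cairo-style matrix:
 *   x' = xx * x + xy * y
 *   y' = yx * x + yy * y
 * so the unit vector (1,0) maps to (xx, yx). */
typedef struct {
  double xx, yx, xy, yy;
} demo_matrix_t;

typedef enum {
  DEMO_BASELINE_HORIZONTAL,   /* (1,0) maps to (x,0): positioning works as usual */
  DEMO_BASELINE_VERTICAL,     /* (1,0) maps to (0,y): font rotated by 90 degrees */
  DEMO_BASELINE_OTHER         /* anything else: disable most positioning */
} demo_baseline_t;

demo_baseline_t
demo_classify_matrix (const demo_matrix_t *m)
{
  assert (m != NULL);
  if (m->yx == 0.0)
    return m->xx != 0.0 ? DEMO_BASELINE_HORIZONTAL : DEMO_BASELINE_OTHER;
  if (m->xx == 0.0)
    return DEMO_BASELINE_VERTICAL;
  return DEMO_BASELINE_OTHER;
}
```

Only the image of (1,0) is inspected, so a sheared matrix whose baseline stays horizontal still counts as horizontal in this sketch.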
In that model however, I wonder how easy/hard it would be for callbacks to
provide requested values (contour point, glyph metrics, etc) in the user
space. For cairo/pango I know that's actually the easiest thing to do,
anything else would need conversion, but I'm not sure about other systems. An
alternative would be to let the callbacks choose which space the returned
value is in, so we can map appropriately.
I guess that's it for now. Let the discussion begin. Thanks for reading!
behdad
[1] http://bugzilla.gnome.org/show_bug.cgi?id=341481