[HarfBuzz] HarfBuzz 1.0 API; the message you were hoping would never come

Jonathan Kew jfkthame at googlemail.com
Mon Dec 30 04:26:08 PST 2013


Some comments on a few of the items from your list:


> API ITEM: unsigned int vs uintptr_t / size_t:
>
> Use uintptr_t / size_t instead of unsigned int throughout the API?  Which one?
>   This has ABI implications on 64-bit architectures.  My current thinking is
> that we should do this.

Agree (although people shouldn't normally be attempting to shape buffers 
so vast that it matters!). Buffer length, etc., should be size_t.
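To illustrate the 64-bit concern: on common 64-bit platforms size_t is 64 bits while unsigned int is 32, so a length passed through an unsigned int parameter is silently reduced modulo 2^32. A minimal sketch; both helper names are hypothetical and are not HarfBuzz functions.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical check: can a buffer length travel through an API
 * parameter typed as unsigned int without losing information? */
static int length_fits_unsigned_int(size_t len)
{
    return len <= (size_t) UINT_MAX;
}

/* What an unsigned int parameter actually receives: the value is
 * reduced modulo UINT_MAX + 1 when it does not fit. */
static unsigned int narrow_length(size_t len)
{
    return (unsigned int) len;
}
```

In practice this only bites for buffers of more than about four billion code units, which is why it matters for the ABI rather than for typical text.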

>  I'm unsure to what extent to do this though.  Should
> the, eg, "number of lookups" type change?  I'm leaning towards no.

Agree, no reason to change this. I doubt there's room within the 
OpenType spec to have more than 2^31 lookups, is there?
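For reference, the OpenType LookupList table (shared by GSUB and GPOS) begins with a 16-bit unsigned lookupCount, so a font can declare at most 65535 lookups and an unsigned int count is ample. A minimal sketch of reading that field from raw big-endian table bytes; the function name is illustrative and no validation beyond a length check is done.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads lookupCount, the first field of an OpenType LookupList table.
 * OpenType data is big-endian and the field is a uint16, so the result
 * can never exceed 65535. Returns 0 if the table is too short. */
static unsigned int read_lookup_count(const uint8_t *lookup_list, size_t len)
{
    if (len < 2)
        return 0;
    return ((unsigned int) lookup_list[0] << 8) | lookup_list[1];
}
```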



> API ITEM: Glyph Variants:
>
> get_glyph() currently takes unicode codepoint as well as variation selector.
> The current semantics is that if variation selector is not 0, you are supposed
> to load the correct glyph for the variation selector, and return FALSE if that
> fails, at which point we call get_glyph() again with variation selector set to
> 0.  It has been suggested that we move to two separate callbacks: get_glyph()
> and get_glyph_variant().  It would be slightly faster to do so, but that would
> spread the get_glyph logic into two callbacks instead of one, which would be
> more error-prone in implementations.  So I'm not sure if it's worth it.
> Client code update is small.

I like the separate get_glyph() and get_glyph_variant() callbacks mostly 
because I think it's a clearer API; the current semantics are slightly 
obscure and too easy to get wrong.

(IMO, the "obvious" implementation of a bool get_glyph(unicode, 
varselector) function is to return the exact variant if available, and 
the default glyph for the unicode value otherwise, and only return false 
if the primary unicode char is not supported at all.)
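A minimal sketch of the split callbacks, plus a wrapper with the "obvious" combined semantics described above. The toy table stands in for a font's cmap (including variation sequences); its glyph IDs are invented, U+845B with VS17 (U+E0100) is just an illustrative variation sequence, and the simplified signatures are hypothetical, not HarfBuzz's actual font-funcs callbacks.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t codepoint_t;

/* Hypothetical stand-in for a font's character map. A variation
 * selector of 0 marks the default mapping for the code point. */
typedef struct {
    codepoint_t unicode;
    codepoint_t variation_selector;
    codepoint_t glyph;
} cmap_entry_t;

static const cmap_entry_t toy_cmap[] = {
    { 0x845B, 0,       10 },   /* default glyph */
    { 0x845B, 0xE0100, 11 },   /* variant selected by VS17 */
    { 0x0041, 0,       20 },
};
static const size_t toy_cmap_len = sizeof toy_cmap / sizeof toy_cmap[0];

/* Exact match on (unicode, variation_selector); *glyph is only written
 * on success. */
static int lookup(codepoint_t u, codepoint_t vs, codepoint_t *glyph)
{
    for (size_t i = 0; i < toy_cmap_len; i++) {
        if (toy_cmap[i].unicode == u && toy_cmap[i].variation_selector == vs) {
            *glyph = toy_cmap[i].glyph;
            return 1;
        }
    }
    return 0;
}

/* Separate callback 1: default glyph for the code point only. */
static int get_glyph(codepoint_t u, codepoint_t *glyph)
{
    return lookup(u, 0, glyph);
}

/* Separate callback 2: the exact variation sequence only, no fallback. */
static int get_glyph_variant(codepoint_t u, codepoint_t vs, codepoint_t *glyph)
{
    return vs != 0 && lookup(u, vs, glyph);
}

/* Combined semantics: exact variant if present, otherwise the default
 * glyph; fail only if the base character is not supported at all. */
static int get_glyph_with_fallback(codepoint_t u, codepoint_t vs, codepoint_t *glyph)
{
    if (get_glyph_variant(u, vs, glyph))
        return 1;
    return get_glyph(u, glyph);
}
```

With the split, the "try the variant, then retry with 0" logic lives in one caller instead of being re-implemented (and possibly gotten wrong) in every font-functions provider.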


>
> API ITEM: Compatibility type in decomposition:
>
> This one is a new one I thought about.  James recently brought up the fact
> that the new automatic-fractions feature doesn't work nicely with
> compatibility decompositions of VULGAR FRACTIONS.  At the root of the issue is
> that those characters have a compatibility decomposition type of <fraction>.
> We currently ignore the compatibility type in decompositions.  Should we add a
> compat_type enum to decompose_compatibility callback?  I understand that many
> clients don't like that callback to begin with, and most providers of it don't
> have the type data currently.  That said, this comes also handy in Arabic
> compatibility decomposition characters, so we can wrap the <initial> /
> <medial> / <isolated> / <final> decompositions with correct ZWJ/ZWNJ pairs.
> Definitely not high priority, but something to consider.  We can definitely
> add DECOMPOSITION_TYPE_UNKNOWN...  Client code update is trivial if not adding
> support for new functionality.
>

I'm not convinced that attempting to render via compatibility 
decompositions really belongs in harfbuzz at all. ISTM that it might be 
better for this to be cleanly separated into a higher-level library like 
Pango or handled by other client code.

Compatibility decomps come in many kinds, and some of them clearly 
cannot be "properly" rendered by simply using the decomposition mapping; 
they also require some kind of additional styling (scaling, positioning, 
font change, etc.) to be applied if the apparent meaning of the text is 
not to be corrupted/lost. Many of these effects cannot be handled within 
harfbuzz alone.
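To make that concrete: UnicodeData.txt prefixes each compatibility decomposition with a formatting tag such as <font>, <super> or <fraction>. A rough sketch of a client-side classification follows; which tags count as needing extra styling is an illustrative judgment, not something HarfBuzz or Unicode defines, and the names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    COMPAT_PLAIN,          /* the mapping alone is a reasonable rendering */
    COMPAT_NEEDS_JOINERS,  /* Arabic positional forms: wrap with ZWJ/ZWNJ */
    COMPAT_NEEDS_STYLING,  /* scaling, positioning, enclosure, font change */
    COMPAT_UNKNOWN
} compat_handling_t;

#define COUNT(a) (sizeof (a) / sizeof ((a)[0]))

static int tag_in(const char *tag, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tag, list[i]) == 0)
            return 1;
    return 0;
}

/* Illustrative classification of UnicodeData.txt decomposition tags. */
static compat_handling_t classify_decomposition_tag(const char *tag)
{
    static const char *const plain[] = { "<compat>", "<noBreak>" };
    static const char *const joiners[] = {
        "<initial>", "<medial>", "<final>", "<isolated>"
    };
    static const char *const styling[] = {
        "<font>", "<circle>", "<super>", "<sub>", "<vertical>",
        "<wide>", "<narrow>", "<small>", "<square>", "<fraction>"
    };

    if (tag_in(tag, plain, COUNT(plain)))
        return COMPAT_PLAIN;
    if (tag_in(tag, joiners, COUNT(joiners)))
        return COMPAT_NEEDS_JOINERS;
    if (tag_in(tag, styling, COUNT(styling)))
        return COMPAT_NEEDS_STYLING;
    return COMPAT_UNKNOWN;
}
```

Most of the tags land in the styling bucket, which is the point: a plain substitution via the mapping would lose information for them.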

There's overlap here with the process of font-matching (choosing the 
font(s) to be used for a given text sequence), which is clearly out of 
scope for harfbuzz. If a given Unicode character is not supported 
(exactly, or via a *canonical* [de]composition) by a given font, there 
are several possible outcomes: just render the font's .notdef glyph; 
render some synthetic representation of the codepoint (hexbox); render a 
compatibility-equivalent character/sequence, if such exists; choose a 
different font.

Rendering the compatibility equivalent may be a good choice for -some- 
characters in the context of -some- clients, but deciding when to do so, 
and how to handle any added styling that may be required, belongs to a 
higher level than the harfbuzz shaping library, IMO.
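A sketch of how a client, rather than HarfBuzz, might order the fallbacks listed above for a code point the current font lacks. The hooks, policy flags and the particular ordering are hypothetical placeholders for whatever the client's font-matching layer provides.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t codepoint_t;

typedef enum {
    FALLBACK_OTHER_FONT,    /* font matching found a font that supports it */
    FALLBACK_COMPAT_EQUIV,  /* render the compatibility equivalent */
    FALLBACK_HEXBOX,        /* synthetic representation of the code point */
    FALLBACK_NOTDEF         /* the font's .notdef glyph */
} fallback_t;

/* Hypothetical client-provided hooks and policy; NULL hooks are skipped. */
typedef struct {
    int (*other_font_supports)(codepoint_t u, void *user_data);
    int (*has_acceptable_compat_equivalent)(codepoint_t u, void *user_data);
    int allow_compat_substitution;
    int draw_hexboxes;
    void *user_data;
} fallback_policy_t;

/* Example hook: only NO-BREAK SPACE (U+00A0, <noBreak> U+0020) is treated
 * as having an acceptable plain compatibility equivalent. */
static int example_compat_hook(codepoint_t u, void *user_data)
{
    (void) user_data;
    return u == 0x00A0;
}

static fallback_t choose_fallback(codepoint_t u, const fallback_policy_t *p)
{
    if (p->other_font_supports && p->other_font_supports(u, p->user_data))
        return FALLBACK_OTHER_FONT;
    if (p->allow_compat_substitution && p->has_acceptable_compat_equivalent &&
        p->has_acceptable_compat_equivalent(u, p->user_data))
        return FALLBACK_COMPAT_EQUIV;
    return p->draw_hexboxes ? FALLBACK_HEXBOX : FALLBACK_NOTDEF;
}
```

Every decision here depends on client policy and on font matching, neither of which the shaper sees.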


> API ITEM: hb_feature_t breakdown:
>
> This was discussed a couple of months ago.  Currently hb_feature_t is defined to:
>
> typedef struct hb_feature_t {
>    hb_tag_t      tag;
>    uint32_t      value;
>    unsigned int  start;
>    unsigned int  end;
> } hb_feature_t;
>
> Ideally I'd like to break that down to:
>
> typedef struct hb_feature_t {
>    hb_tag_t      tag;
>    uint32_t      value;
> } hb_feature_t;
>
> typedef struct hb_range_t {
>    unsigned int  start;
>    unsigned int  end;
> } hb_range_t;
>
> And either define hb_feature_range_t that has a hb_feature_t and a hb_range_t
> inside, or change hb_shape() (and variants) to take an array of hb_feature_t
> as well as hb_range_t.  Both approaches have their benefits, though I'm more
> interested to know whether such a big change is considered possible or too
> much of a change.  Updating client code is trivial for the most part,
> especially with hb_feature_range_t, though, harder to #ifdef.

I'd quite like clients that only use "global" features (i.e., they 
always call harfbuzz with text runs that have uniform styling) to be 
able to avoid creating ranges at all. (This would currently apply to 
both Firefox and XeTeX, I believe, though I have some thoughts about 
changing it in FF some day.)

I think my preferred way to do this would be to pass the features and 
ranges as two separate arrays, and define that if the ranges param is 
NULL, then all features are applied to the entire buffer.
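A minimal sketch of those semantics, assuming features[i] pairs with ranges[i] when ranges are given, that end is exclusive, and that a later feature with the same tag overrides an earlier one. The type and function names are hypothetical; they mirror the split proposed in the quoted message but are not HarfBuzz API.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t tag_t;

typedef struct { tag_t tag; uint32_t value; } feature_t;
typedef struct { unsigned int start, end; } range_t;   /* [start, end) */

#define MAKE_TAG(a, b, c, d) \
    ((tag_t) (((uint32_t) (a) << 24) | ((uint32_t) (b) << 16) | \
              ((uint32_t) (c) << 8) | (uint32_t) (d)))

/* Value of feature `tag` at cluster position pos: the last matching
 * feature wins; default_value if none applies. A NULL ranges pointer
 * means every feature applies to the entire buffer. */
static uint32_t feature_value_at(const feature_t *features,
                                 const range_t *ranges, size_t num_features,
                                 tag_t tag, unsigned int pos,
                                 uint32_t default_value)
{
    uint32_t value = default_value;
    for (size_t i = 0; i < num_features; i++) {
        if (features[i].tag != tag)
            continue;
        if (ranges == NULL ||
            (ranges[i].start <= pos && pos < ranges[i].end))
            value = features[i].value;
    }
    return value;
}
```

A global-only client just passes its feature array and NULL, never constructing a range.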

Client code update for clients that only pass global features would be a 
trivial simplification (just remove the start/end values, and pass NULL 
for ranges); for any clients that -do- support per-feature ranges, it'd 
be a bit more churn (managing two parallel arrays instead of one), but 
nothing difficult.
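For clients that currently fill the combined struct, the conversion to parallel arrays is mechanical. A sketch, assuming the usual convention that a whole-buffer feature has start = 0 and end = (unsigned int) -1; the split types are hypothetical, and the function reports whether the ranges array can be dropped in favour of NULL.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t tag_t;

/* Current combined layout, as quoted above. */
typedef struct {
    tag_t tag;
    uint32_t value;
    unsigned int start, end;
} old_feature_t;

/* Hypothetical split layout from the proposal. */
typedef struct { tag_t tag; uint32_t value; } feature_t;
typedef struct { unsigned int start, end; } range_t;

#define GLOBAL_START 0u
#define GLOBAL_END ((unsigned int) -1)

/* Copies n old-style features into parallel features/ranges arrays.
 * Returns 1 if every feature is global (the caller may then pass NULL
 * instead of ranges), 0 otherwise. */
static int split_features(const old_feature_t *in, size_t n,
                          feature_t *features, range_t *ranges)
{
    int all_global = 1;
    for (size_t i = 0; i < n; i++) {
        features[i].tag = in[i].tag;
        features[i].value = in[i].value;
        ranges[i].start = in[i].start;
        ranges[i].end = in[i].end;
        if (in[i].start != GLOBAL_START || in[i].end != GLOBAL_END)
            all_global = 0;
    }
    return all_global;
}
```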



That's all for right now; will try to give thought to some of the other 
items as well.

JK
