[HarfBuzz] HarfBuzz 1.0 API; the message you were hoping would never come

Behdad Esfahbod behdad at behdad.org
Wed Jan 1 21:04:32 PST 2014


On 13-12-30 08:26 PM, Jonathan Kew wrote:
> Some comments on a few of the items from your list:
> 
>> API ITEM: unsigned int vs uintptr_t / size_t:
>>
>> Use uintptr_t / size_t instead of unsigned int throughout the API?  Which one?
>>   This has ABI implications on 64-bit architectures.  My current thinking is
>> that we should do this.
> 
> Agree (although people shouldn't normally be attempting to shape buffers so
> vast that it matters!). Buffer length, etc., should be size_t.

Good.  Will do that.
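
As a rough illustration (sketch only; the names below mirror existing entry
points, but the size_t signatures are the proposed change, not what ships
today), the buffer-length calls would end up looking something like:

  /* Sketch: length getter/setter taking size_t instead of unsigned int. */
  size_t    hb_buffer_get_length (hb_buffer_t *buffer);
  hb_bool_t hb_buffer_set_length (hb_buffer_t *buffer, size_t length);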


>>  I'm unsure to what extent to do this, though.  Should the type of, e.g.,
>> "number of lookups" change?  I'm leaning towards no.
> 
> Agree, no reason to change this. I doubt there's room within the OpenType spec
> to have more than 2^31 lookups, is there?

No, of course not.  It's limited to 16 bits.


>> API ITEM: Glyph Variants:
>>
>> get_glyph() currently takes a Unicode codepoint as well as a variation
>> selector.  The current semantics are that if the variation selector is not 0,
>> you are supposed to load the correct glyph for that variation selector and
>> return FALSE if that fails, at which point we call get_glyph() again with the
>> variation selector set to 0.  It has been suggested that we move to two
>> separate callbacks: get_glyph() and get_glyph_variant().  It would be slightly
>> faster to do so, but it would spread the get_glyph logic across two callbacks
>> instead of one, which would be more error-prone in implementations.  So I'm
>> not sure if it's worth it.  Client code update is small.
> 
> I like the separate get_glyph() and get_glyph_variant() callbacks mostly
> because I think it's a clearer API; the current semantics are slightly obscure
> and too easy to get wrong.

Ok, will do that.
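
For concreteness, a rough sketch of what the split could look like (names and
exact signatures are placeholders, not final API):

  /* Sketch only.  get_glyph() does the plain cmap lookup;
   * get_glyph_variant() returns FALSE when the exact (unicode, VS)
   * variant isn't in the font, so we can still pass the VS down to GSUB. */
  typedef hb_bool_t (*hb_font_get_glyph_func_t)
      (hb_font_t *font, void *font_data,
       hb_codepoint_t unicode, hb_codepoint_t *glyph,
       void *user_data);

  typedef hb_bool_t (*hb_font_get_glyph_variant_func_t)
      (hb_font_t *font, void *font_data,
       hb_codepoint_t unicode, hb_codepoint_t variation_selector,
       hb_codepoint_t *glyph, void *user_data);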


> (IMO, the "obvious" implementation of a bool get_glyph(unicode, varselector)
> function is to return the exact variant if available, and the default glyph
> for the unicode value otherwise, and only return false if the primary unicode
> char is not supported at all.)

Right.  And that's what I had originally, until we figured out that we want to
pass the VS down to the GSUB layer if it wasn't "handled" during cmap mapping.


>> API ITEM: Compatibility type in decomposition:
>>
>> This one is a new one I thought about.  James recently brought up the fact
>> that the new automatic-fractions feature doesn't work nicely with
>> compatibility decompositions of VULGAR FRACTIONS.  At the root of the issue is
>> that those characters have a compatibility decomposition type of <fraction>.
>> We currently ignore the compatibility type in decompositions.  Should we add a
>> compat_type enum to the decompose_compatibility callback?  I understand that
>> many clients don't like that callback to begin with, and most providers of it
>> don't have the type data currently.  That said, this also comes in handy for
>> Arabic compatibility-decomposition characters, so we can wrap the <initial> /
>> <medial> / <isolated> / <final> decompositions with the correct ZWJ/ZWNJ pairs.
>> Definitely not high priority, but something to consider.  We can definitely
>> add DECOMPOSITION_TYPE_UNKNOWN...  Client code update is trivial if not adding
>> support for new functionality.
> 
> I'm not convinced that attempting to render via compatibility decompositions
> really belongs in harfbuzz at all. ISTM that it might be better for this to be
> cleanly separated into a higher-level library like Pango or handled by other
> client code.
> 
> Compatibility decomps come in many kinds, and some of them clearly cannot be
> "properly" rendered by simply using the decomposition mapping; they also
> require some kind of additional styling (scaling, positioning, font change,
> etc) to be applied if the apparent meaning of the text is not to be
> corrupted/lost. Many of these effects cannot be handled within harfbuzz alone.
> 
> There's overlap here with the process of font-matching (choosing the font(s)
> to be used for a given text sequence), which is clearly out of scope for
> harfbuzz. If a given Unicode character is not supported (exactly, or via a
> *canonical* [de]composition) by a given font, there are several possible
> outcomes: just render the font's .notdef glyph; render some synthetic
> representation of the codepoint (hexbox); render a compatibility-equivalent
> character/sequence, if such exists; choose a different font.
> 
> Rendering the compatibility equivalent may be a good choice for -some-
> characters in the context of -some- clients, but deciding when to do so, and
> how to handle any added styling that may be required, belongs to a higher
> level than the harfbuzz shaping library, IMO.

Ok.  There's definitely some resistance to having this enabled by default in
HarfBuzz.  That said, I think pushing them higher up the stack is not ideal
either.  So I'm going to suggest we add a shape flag for these.

Which reminds me: when we added buffer flags for BOT/EOT, we also added
preserve-default-ignorables alongside them, but that one isn't really a buffer
flag; it's a shape flag.  So I'd like to add a new hb_shape_flags_t, move
preserve-default-ignorables there, add compatibility decomposition there, and
perhaps some other high-level shaping options.  Disabling fallback shaping /
positioning, for example?  That can be useful for testing fonts if nothing else.
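
Very roughly, and purely as a strawman (none of these names or values are
final), I'm imagining something like:

  /* Strawman sketch of hb_shape_flags_t; names and values are placeholders. */
  typedef enum {
    HB_SHAPE_FLAG_DEFAULT                     = 0x00000000u,
    /* Moved over from the current buffer flags: */
    HB_SHAPE_FLAG_PRESERVE_DEFAULT_IGNORABLES = 0x00000001u,
    /* Opt-in compatibility decompositions: */
    HB_SHAPE_FLAG_COMPAT_DECOMPOSE            = 0x00000002u,
    /* Disable fallback shaping / positioning, eg for testing fonts: */
    HB_SHAPE_FLAG_NO_FALLBACK_SHAPING         = 0x00000004u
  } hb_shape_flags_t;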


>> API ITEM: hb_feature_t breakdown:
>>
>> This was discussed a couple of months ago.  Currently hb_feature_t is defined as:
>>
>> typedef struct hb_feature_t {
>>    hb_tag_t      tag;
>>    uint32_t      value;
>>    unsigned int  start;
>>    unsigned int  end;
>> } hb_feature_t;
>>
>> Ideally I'd like to break that down into:
>>
>> typedef struct hb_feature_t {
>>    hb_tag_t      tag;
>>    uint32_t      value;
>> } hb_feature_t;
>>
>> typedef struct hb_range_t {
>>    unsigned int  start;
>>    unsigned int  end;
>> } hb_range_t;
>>
>> And then either define an hb_feature_range_t that has an hb_feature_t and an
>> hb_range_t inside, or change hb_shape() (and variants) to take an array of
>> hb_feature_t as well as an array of hb_range_t.  Both approaches have their
>> benefits, though I'm more interested in knowing whether such a big change is
>> considered possible at all or too much of a change.  Updating client code is
>> trivial for the most part, especially with hb_feature_range_t, though it's
>> harder to #ifdef.
> 
> I'd quite like clients that only use "global" features (i.e., they always call
> harfbuzz with text runs that have uniform styling) to be able to avoid
> creating ranges at all. (This would currently apply to both firefox and xetex,
> I believe, though I have some thoughts about changing it in FF some day.)

Currently true, but in an ideal world no client really should work that way...


> I think my preferred way to do this would be to pass the features and ranges
> as two separate arrays, and define that if the ranges param is NULL, then all
> features are applied to the entire buffer.
> 
> Client code update for clients that only pass global features would be a
> trivial simplification (just remove the start/end values, and pass NULL for
> ranges); for any clients that -do- support per-feature ranges, it'd be a bit
> more churn (managing two parallel arrays instead of one), but nothing difficult.

I like that.  It's a bit painful for language bindings, since we'd have two
arrays sharing a single length argument, but that's not a big deal.
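
Just to make that concrete, the signature would end up looking roughly like
this (sketch only; the exact shape of it is obviously up for discussion):

  /* Sketch: features and ranges as parallel arrays of the same length;
   * a NULL ranges argument means every feature applies to the whole buffer. */
  void hb_shape (hb_font_t          *font,
                 hb_buffer_t        *buffer,
                 const hb_feature_t *features,
                 const hb_range_t   *ranges,
                 unsigned int        num_features);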


> That's all for right now; will try to give thought to some of the other items
> as well.

Thanks.  I'll start a branch soon.  I'd really appreciate your feedback on all
the items though, even if it's a simple "SGTM".  And others' feedback too, if
they have any strong feelings about particular items.

-- 
behdad
http://behdad.org/

