[HarfBuzz] HarfBuzz 1.0 API; the message you were hoping would never come
Behdad Esfahbod
behdad at behdad.org
Wed Jan 1 21:04:32 PST 2014
On 13-12-30 08:26 PM, Jonathan Kew wrote:
> Some comments on a few of the items from your list:
>
>> API ITEM: unsigned int vs uintptr_t / size_t:
>>
>> Use uintptr_t / size_t instead of unsigned int throughout the API? Which one?
>> This has ABI implications on 64-bit architectures. My current thinking is
>> that we should do this.
>
> Agree (although people shouldn't normally be attempting to shape buffers so
> vast that it matters!). Buffer length, etc., should be size_t.
Good. Will do that.
>> I'm unsure to what extent to do this though. Should
>> the, eg, "number of lookups" type change? I'm leaning towards no.
>
> Agree, no reason to change this. I doubt there's room within the OpenType spec
> to have more than 2^31 lookups, is there?
No, of course not. It's limited to 16 bits.
>> API ITEM: Glyph Variants:
>>
>> get_glyph() currently takes unicode codepoint as well as variation selector.
>> The current semantics is that if variation selector is not 0, you are supposed
>> to load the correct glyph for the variation selector, and return FALSE if that
>> fails, at which point we call get_glyph() again with variation selector set to
>> 0. It has been suggested that we move to two separate callbacks: get_glyph()
>> and get_glyph_variant(). It would be slightly faster to do so, but that would
>> spread the get_glyph logic into two callbacks instead of one, which would be
>> more error-prone in implementations. So I'm not sure if it's worth it.
>> Client code update is small.
>
> I like the separate get_glyph() and get_glyph_variant() callbacks mostly
> because I think it's a clearer API; the current semantics are slightly obscure
> and too easy to get wrong.
Ok, will do that.
> (IMO, the "obvious" implementation of a bool get_glyph(unicode, varselector)
> function is to return the exact variant if available, and the default glyph
> for the unicode value otherwise, and only return false if the primary unicode
> char is not supported at all.)
Right. And that's what I had originally, until we figured out that we want to
pass the VS down to the GSUB layer if it wasn't "handled" during cmap lookup.
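
A minimal sketch of the two-callback arrangement discussed above, in C. All names here (`lookup_glyph`, `toy_get_glyph`, `toy_get_glyph_variant`, `codepoint_t`) are hypothetical and are not the HarfBuzz callback signatures. The fallback order, and the `vs_handled` output used to tell the caller that the variation selector still needs to reach GSUB, are assumptions based on the description in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t codepoint_t;

/* Hypothetical callback types; NOT the real HarfBuzz signatures. */
typedef bool (*get_glyph_fn) (codepoint_t unicode, codepoint_t *glyph);
typedef bool (*get_glyph_variant_fn) (codepoint_t unicode, codepoint_t vs,
                                      codepoint_t *glyph);

/* Toy font: U+0041 maps to glyph 1. */
static bool toy_get_glyph (codepoint_t unicode, codepoint_t *glyph)
{
  if (unicode != 0x0041u) return false;
  *glyph = 1;
  return true;
}

/* Toy font: the sequence <U+0041, U+FE00> maps to glyph 2. */
static bool toy_get_glyph_variant (codepoint_t unicode, codepoint_t vs,
                                   codepoint_t *glyph)
{
  if (unicode != 0x0041u || vs != 0xFE00u) return false;
  *glyph = 2;
  return true;
}

/* Shaper side: try the exact variant first, then fall back to the nominal
 * glyph. *vs_handled stays false on fallback, so the caller knows the
 * variation selector was not consumed during cmap lookup. */
static bool lookup_glyph (get_glyph_fn nominal, get_glyph_variant_fn variant,
                          codepoint_t unicode, codepoint_t vs,
                          codepoint_t *glyph, bool *vs_handled)
{
  assert (nominal && variant && glyph && vs_handled);
  *vs_handled = false;
  if (vs != 0 && variant (unicode, vs, glyph))
  {
    *vs_handled = true;
    return true;
  }
  return nominal (unicode, glyph);
}
```

With this split, each callback answers exactly one question, and the "call again with the selector set to 0" step lives in one place on the shaper side rather than in every client.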
>> API ITEM: Compatibility type in decomposition:
>>
>> This one is a new one I thought about. James recently brought up the fact
>> that the new automatic-fractions feature doesn't work nicely with
>> compatibility decompositions of VULGAR FRACTIONS. At the root of the issue is
>> that those characters have a compatibility decomposition type of <fraction>.
>> We currently ignore the compatibility type in decompositions. Should we add a
>> compat_type enum to decompose_compatibility callback? I understand that many
>> clients don't like that callback to begin with, and most providers of it don't
>> have the type data currently. That said, this also comes in handy for Arabic
>> compatibility decomposition characters, so we can wrap the <initial> /
>> <medial> / <isolated> / <final> decompositions with correct ZWJ/ZWNJ pairs.
>> Definitely not high priority, but something to consider. We can definitely
>> add DECOMPOSITION_TYPE_UNKNOWN... Client code update is trivial if not adding
>> support for new functionality.
>
> I'm not convinced that attempting to render via compatibility decompositions
> really belongs in harfbuzz at all. ISTM that it might be better for this to be
> cleanly separated into a higher-level library like Pango or handled by other
> client code.
>
> Compatibility decomps come in many kinds, and some of them clearly cannot be
> "properly" rendered by simply using the decomposition mapping; they also
> require some kind of additional styling (scaling, positioning, font change,
> etc) to be applied if the apparent meaning of the text is not to be
> corrupted/lost. Many of these effects cannot be handled within harfbuzz alone.
>
> There's overlap here with the process of font-matching (choosing the font(s)
> to be used for a given text sequence), which is clearly out of scope for
> harfbuzz. If a given Unicode character is not supported (exactly, or via a
> *canonical* [de]composition) by a given font, there are several possible
> outcomes: just render the font's .notdef glyph; render some synthetic
> representation of the codepoint (hexbox); render a compatibility-equivalent
> character/sequence, if such exists; choose a different font.
>
> Rendering the compatibility equivalent may be a good choice for -some-
> characters in the context of -some- clients, but deciding when to do so, and
> how to handle any added styling that may be required, belongs to a higher
> level than the harfbuzz shaping library, IMO.
Ok. There's definitely some resistance to having this enabled by default in
HarfBuzz. That said, I think pushing them higher up the stack is not ideal
either. So I'm going to suggest we add a shape flag for these.
Which reminds me that when we added buffer flags for BOT/EOT, we also tagged
preserve-default-ignorables along with those, but that one isn't really a
buffer flag. It's a shape flag. So I'd like to add a new hb_shape_flags_t, move
preserve-default-ignorables there, and add compatibility decomposition there,
along with perhaps some other high-level shaping options. Disabling fallback
shaping / positioning, for example? That can be useful for testing fonts if
nothing else.
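
As an illustration only, a shape-flags bit set along these lines might look like the sketch below. Every identifier is hypothetical (hence the `sketch_` / `SKETCH_` prefixes). No hb_shape_flags_t is defined anywhere in this thread, and the flag names and bit values are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-shape-call options; names and values are assumptions. */
typedef enum {
  SKETCH_SHAPE_FLAG_DEFAULT                     = 0x0u,
  SKETCH_SHAPE_FLAG_PRESERVE_DEFAULT_IGNORABLES = 0x1u,
  SKETCH_SHAPE_FLAG_COMPAT_DECOMPOSITION        = 0x2u,
  SKETCH_SHAPE_FLAG_NO_FALLBACK                 = 0x4u
} sketch_shape_flags_t;

/* Test whether a single non-zero flag bit is set in a combined mask. */
static bool sketch_flag_is_set (unsigned int flags, sketch_shape_flags_t flag)
{
  assert (flag != SKETCH_SHAPE_FLAG_DEFAULT);
  return (flags & (unsigned int) flag) != 0;
}
```

Keeping these on the shape call rather than on the buffer would separate "what the text is" (buffer flags such as BOT/EOT) from "how to shape it".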
>> API ITEM: hb_feature_t breakdown:
>>
>> This was discussed a couple of months ago. Currently hb_feature_t is defined as:
>>
>> typedef struct hb_feature_t {
>>   hb_tag_t      tag;
>>   uint32_t      value;
>>   unsigned int  start;
>>   unsigned int  end;
>> } hb_feature_t;
>>
>> Ideally I'd like to break that down to:
>>
>> typedef struct hb_feature_t {
>>   hb_tag_t      tag;
>>   uint32_t      value;
>> } hb_feature_t;
>>
>> typedef struct hb_range_t {
>>   unsigned int  start;
>>   unsigned int  end;
>> } hb_range_t;
>>
>> And either define hb_feature_range_t that has a hb_feature_t and a hb_range_t
>> inside, or change hb_shape() (and variants) to take an array of hb_feature_t
>> as well as hb_range_t. Both approaches have their benefits, though I'm more
>> interested to know whether such a big change is considered possible or too
>> much of a change. Updating client code is trivial for the most part,
>> especially with hb_feature_range_t, though it is harder to #ifdef.
>
> I'd quite like clients that only use "global" features (i.e., they always call
> harfbuzz with text runs that have uniform styling) to be able to avoid
> creating ranges at all. (This would currently apply to both firefox and xetex,
> I believe, though I have some thoughts about changing it in FF some day.)
Currently true, but no client really should do that in an ideal world...
> I think my preferred way to do this would be to pass the features and ranges
> as two separate arrays, and define that if the ranges param is NULL, then all
> features are applied to the entire buffer.
>
> Client code update for clients that only pass global features would be a
> trivial simplification (just remove the start/end values, and pass NULL for
> ranges); for any clients that -do- support per-feature ranges, it'd be a bit
> more churn (managing two parallel arrays instead of one), but nothing difficult.
I like that. It's a bit painful for language bindings, since the two arrays
share a single length argument, but that's not a big deal.
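
A minimal sketch of the parallel-array convention proposed above, with a NULL ranges pointer meaning "every feature applies to the whole buffer". The names (`feature_t`, `range_t`, `feature_value_at`, `MAKE_TAG`) are hypothetical stand-ins, not HarfBuzz API. Treating ranges as half-open, [start, end), and letting later entries override earlier ones are both assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the proposed hb_feature_t / hb_range_t split. */
typedef struct { uint32_t tag; uint32_t value; } feature_t;
typedef struct { unsigned int start; unsigned int end; } range_t;

#define MAKE_TAG(a, b, c, d) \
  (((uint32_t) (a) << 24) | ((uint32_t) (b) << 16) | \
   ((uint32_t) (c) << 8) | (uint32_t) (d))

/* Value of feature `tag` at position `pos`, or 0 if no entry covers it.
 * features[i] pairs with ranges[i]; both arrays share num_features.
 * ranges == NULL means all features are global. */
static uint32_t feature_value_at (const feature_t *features,
                                  const range_t   *ranges,
                                  size_t           num_features,
                                  uint32_t         tag,
                                  unsigned int     pos)
{
  uint32_t value = 0;
  assert (features != NULL || num_features == 0);
  for (size_t i = 0; i < num_features; i++)
  {
    if (features[i].tag != tag)
      continue;
    if (ranges == NULL || (ranges[i].start <= pos && pos < ranges[i].end))
      value = features[i].value; /* later entries override earlier ones */
  }
  return value;
}
```

A client with uniformly styled runs would pass NULL for `ranges` and never build a range at all, which is the simplification described above. The single `num_features` length covering both arrays is the wrinkle for language bindings.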
> That's all for right now; will try to give thought to some of the other items
> as well.
Thanks. I'll start a branch soon. I really appreciate your feedback on all
the items though, even if it's a simple "SGTM". And others' feedback if they
have any strong feelings about particular items.
--
behdad
http://behdad.org/