[HarfBuzz] HarfBuzz 1.0 API; the message you were hoping would never come
Jonathan Kew
jfkthame at googlemail.com
Mon Dec 30 04:26:08 PST 2013
Some comments on a few of the items from your list:
> API ITEM: unsigned int vs uintptr_t / size_t:
>
> Use uintptr_t / size_t instead of unsigned int throughout the API? Which one?
> This has ABI implications on 64-bit architectures. My current thinking is
> that we should do this.
Agreed (although people shouldn't normally be attempting to shape
buffers so vast that it matters!). Buffer length, etc., should be size_t.
> I'm unsure to what extent to do this though. Should
> the, eg, "number of lookups" type change? I'm leaning towards no.
Agree, no reason to change this. I doubt there's room within the
OpenType spec to have more than 2^31 lookups, is there?
> API ITEM: Glyph Variants:
>
> get_glyph() currently takes unicode codepoint as well as variation selector.
> The current semantics is that if variation selector is not 0, you are supposed
> to load the correct glyph for the variation selector, and return FALSE if that
> fails, at which point we call get_glyph() again with variation selector set to
> 0. It has been suggested that we move to two separate callbacks: get_glyph()
> and get_glyph_variant(). It would be slightly faster to do so, but that would
> spread the get_glyph logic into two callbacks instead of one, which would be
> more error-prone in implementations. So I'm not sure if it's worth it.
> Client code update is small.
I like the separate get_glyph() and get_glyph_variant() callbacks mostly
because I think it's a clearer API; the current semantics are slightly
obscure and too easy to get wrong.
(IMO, the "obvious" implementation of a bool get_glyph(unicode,
varselector) function is to return the exact variant if available, and
the default glyph for the unicode value otherwise, and only return false
if the primary unicode char is not supported at all.)
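To make the two readings concrete, here is a minimal sketch in C. Everything in it is hypothetical (the toy_ names, the table contents, the glyph ids); it is not the HarfBuzz callback signature, only an illustration of the "obvious" single-function semantics next to what a separate variant callback would do.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy cmap: one entry per (unicode, variation selector) pair.
 * A selector of 0 marks the default glyph for that character. */
typedef struct {
  uint32_t unicode;
  uint32_t selector;
  uint32_t glyph;
} toy_cmap_entry_t;

static const toy_cmap_entry_t toy_cmap[] = {
  { 0x8FBB, 0,       10 },  /* default glyph (made-up glyph id) */
  { 0x8FBB, 0xE0100, 11 },  /* variant glyph (made-up glyph id) */
  { 0x0041, 0,       20 },
};

/* Exact-match lookup of a (unicode, selector) pair. */
static bool
toy_lookup (uint32_t unicode, uint32_t selector, uint32_t *glyph)
{
  for (size_t i = 0; i < sizeof (toy_cmap) / sizeof (toy_cmap[0]); i++)
    if (toy_cmap[i].unicode == unicode && toy_cmap[i].selector == selector)
    {
      *glyph = toy_cmap[i].glyph;
      return true;
    }
  return false;
}

/* The "obvious" single-function reading: exact variant if available,
 * otherwise the default glyph; false only if the character itself is
 * not supported at all. */
static bool
toy_get_glyph (uint32_t unicode, uint32_t selector, uint32_t *glyph)
{
  if (selector != 0 && toy_lookup (unicode, selector, glyph))
    return true;
  return toy_lookup (unicode, 0, glyph);
}

/* With a separate variant callback there is nothing to misread:
 * it succeeds only on an exact match and never falls back. */
static bool
toy_get_glyph_variant (uint32_t unicode, uint32_t selector, uint32_t *glyph)
{
  return toy_lookup (unicode, selector, glyph);
}
```

Note that toy_get_glyph() silently falls back for an unknown selector, which is exactly what the current semantics do not want from the callback; splitting the callback removes that ambiguity.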
>
> API ITEM: Compatibility type in decomposition:
>
> This one is a new one I thought about. James recently brought up the fact
> that the new automatic-fractions feature doesn't work nicely with
> compatibility decompositions of VULGAR FRACTIONS. At the root of the issue is
> that those characters have a compatibility decomposition type of <fraction>.
> We currently ignore the compatibility type in decompositions. Should we add a
> compat_type enum to decompose_compatibility callback? I understand that many
> clients don't like that callback to begin with, and most providers of it don't
> have the type data currently. That said, this also comes in handy for Arabic
> compatibility decomposition characters, so we can wrap the <initial> /
> <medial> / <isolated> / <final> decompositions with correct ZWJ/ZWNJ pairs.
> Definitely not high priority, but something to consider. We can definitely
> add DECOMPOSITION_TYPE_UNKNOWN... Client code update is trivial if not adding
> support for new functionality.
>
I'm not convinced that attempting to render via compatibility
decompositions really belongs in harfbuzz at all. ISTM that it might be
better for this to be cleanly separated into a higher-level library like
Pango or handled by other client code.
Compatibility decomps come in many kinds, and some of them clearly
cannot be "properly" rendered by simply using the decomposition mapping;
they also require some kind of additional styling (scaling, positioning,
font change, etc) to be applied if the apparent meaning of the text is
not to be corrupted/lost. Many of these effects cannot be handled within
harfbuzz alone.
There's overlap here with the process of font-matching (choosing the
font(s) to be used for a given text sequence), which is clearly out of
scope for harfbuzz. If a given Unicode character is not supported
(exactly, or via a *canonical* [de]composition) by a given font, there
are several possible outcomes: just render the font's .notdef glyph;
render some synthetic representation of the codepoint (hexbox); render a
compatibility-equivalent character/sequence, if such exists; choose a
different font.
Rendering the compatibility equivalent may be a good choice for -some-
characters in the context of -some- clients, but deciding when to do so,
and how to handle any added styling that may be required, belongs to a
higher level than the harfbuzz shaping library, IMO.
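As a rough illustration of what that higher-level decision might look like, here is a sketch in C. All names are hypothetical, and the priority order is only one policy a client could choose, not a recommendation; the point is that every input to the decision is something the client knows and the shaper does not.

```c
#include <stdbool.h>

/* The possible outcomes for a character the chosen font does not
 * support (exactly, or via canonical [de]composition). */
typedef enum {
  TOY_FALLBACK_NOTDEF,            /* render the font's .notdef glyph */
  TOY_FALLBACK_HEXBOX,            /* render a synthetic hexbox */
  TOY_FALLBACK_COMPAT_EQUIVALENT, /* render the compatibility equivalent */
  TOY_FALLBACK_OTHER_FONT         /* pick a different font */
} toy_fallback_t;

/* Hypothetical client-side knowledge that a shaper does not have. */
typedef struct {
  bool other_font_covers;  /* font matching found a font with the character */
  bool compat_acceptable;  /* client can apply whatever extra styling the
                              compatibility equivalent needs */
  bool can_draw_hexbox;    /* client is able to draw hexboxes */
} toy_fallback_context_t;

/* One possible client policy (an assumption, not a rule): prefer a real
 * glyph from another font, then the compatibility equivalent, then a
 * hexbox, and only then .notdef. */
static toy_fallback_t
toy_choose_fallback (const toy_fallback_context_t *ctx)
{
  if (ctx->other_font_covers)
    return TOY_FALLBACK_OTHER_FONT;
  if (ctx->compat_acceptable)
    return TOY_FALLBACK_COMPAT_EQUIVALENT;
  if (ctx->can_draw_hexbox)
    return TOY_FALLBACK_HEXBOX;
  return TOY_FALLBACK_NOTDEF;
}
```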
> API ITEM: hb_feature_t breakdown:
>
> This was discussed a couple of months ago. Currently hb_feature_t is defined as:
>
> typedef struct hb_feature_t {
>   hb_tag_t      tag;
>   uint32_t      value;
>   unsigned int  start;
>   unsigned int  end;
> } hb_feature_t;
>
> Ideally I'd like to break that down to:
>
> typedef struct hb_feature_t {
>   hb_tag_t      tag;
>   uint32_t      value;
> } hb_feature_t;
>
> typedef struct hb_range_t {
>   unsigned int  start;
>   unsigned int  end;
> } hb_range_t;
>
> And either define hb_feature_range_t that has a hb_feature_t and a hb_range_t
> inside, or change hb_shape() (and variants) to take an array of hb_feature_t
> as well as hb_range_t. Both approaches have their benefits, though I'm more
> interested to know whether such a big change is considered possible or too
> much of a change. Updating client code is trivial for the most part,
> especially with hb_feature_range_t, though, harder to #ifdef.
I'd quite like clients that only use "global" features (i.e., they
always call harfbuzz with text runs that have uniform styling) to be
able to avoid creating ranges at all. (This would currently apply to
both firefox and xetex, I believe, though I have some thoughts about
changing it in FF some day.)
I think my preferred way to do this would be to pass the features and
ranges as two separate arrays, and define that if the ranges param is
NULL, then all features are applied to the entire buffer.
Client code update for clients that only pass global features would be a
trivial simplification (just remove the start/end values, and pass NULL
for ranges); for any clients that -do- support per-feature ranges, it'd
be a bit more churn (managing two parallel arrays instead of one), but
nothing difficult.
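A minimal sketch of the parallel-array idea, in C. The toy_ types and the helper function are hypothetical stand-ins, not proposed API; I'm also assuming half-open [start, end) ranges and that a later entry overrides an earlier one for the same tag.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the split hb_feature_t / hb_range_t. */
typedef uint32_t toy_tag_t;
#define TOY_TAG(a, b, c, d) \
  ((toy_tag_t) (((uint32_t) (a) << 24) | ((uint32_t) (b) << 16) | \
                ((uint32_t) (c) << 8) | (uint32_t) (d)))

typedef struct {
  toy_tag_t tag;
  uint32_t  value;
} toy_feature_t;

typedef struct {
  unsigned int start;
  unsigned int end;   /* assumed exclusive */
} toy_range_t;

/* Returns the value of feature `tag` in effect at buffer position `pos`,
 * or 0 if no entry applies.  `ranges` may be NULL, in which case every
 * feature covers the entire buffer; otherwise ranges[i] belongs to
 * features[i]. */
static uint32_t
toy_feature_value_at (const toy_feature_t *features,
                      const toy_range_t   *ranges,
                      size_t               num_features,
                      toy_tag_t            tag,
                      unsigned int         pos)
{
  uint32_t value = 0;
  for (size_t i = 0; i < num_features; i++)
  {
    if (features[i].tag != tag)
      continue;
    if (ranges == NULL ||
        (ranges[i].start <= pos && pos < ranges[i].end))
      value = features[i].value; /* later entries override earlier ones */
  }
  return value;
}
```

A client with only global features would build the features array and pass NULL for ranges; a client with per-range features keeps ranges[i] in step with features[i].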
That's all for right now; will try to give thought to some of the other
items as well.
JK