[HarfBuzz] harfbuzz review
jonathan at jfkew.plus.com
Fri Apr 16 05:59:39 PDT 2010
> No, I'm not suggesting a different serialization format, I'm saying
> that instead of caching the full GSUB/GPOS tables it might be better
> to cache some data structure that's been distilled for a given purpose.
> For a lot of documents, hb_shape will be called repeatedly with the
> same font/script/lang combination so it would make sense to cache the
> lookups for that font/script/lang combination rather than
> reconstructing it on each call to hb_shape. Basically the lookup
> arrays produced by calling setup_lookups in _hb_ot_substitute_complex
> and _hb_ot_position_complex.
To be worth caching, that would actually be a font/script/lang/features combination. Perhaps there could be a shaping API using a new object representing a "featured font", which encapsulates an hb_face_t and hb_font_t, along with a specific set of script/lang/features. This object could be prepared by a separate harfbuzz API that would build the necessary lookup arrays, and retained by the client for re-use in lots of shaping operations using the same settings.
Note that such a shaping API would NOT allow custom features to be set on various subranges of the buffer, as the current hb_shape is intended to (although that's not actually implemented, last I checked).
I don't think that would make any difference to the need (for the client) to cache complete GSUB and GPOS tables; the "hb_featured_font" would not extract and save copies of the various lookups, etc., it would merely record the arrays of lookup indices to be applied. Actually doing shaping would still require the original GSUB and GPOS tables.
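To make the idea concrete, here is a rough sketch in C of what such a "featured font" object might hold. Every name in it (featured_font_t, featured_font_create, featured_font_matches, and so on) is hypothetical and not part of any existing harfbuzz API; the face and font are plain opaque pointers standing in for hb_face_t and hb_font_t, and the lookup index arrays are simply passed in by the caller, where a real implementation would presumably compute them with something like setup_lookups.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* All names here are hypothetical; none of this is existing harfbuzz API. */
typedef uint32_t tag_t;
#define MAKE_TAG(a, b, c, d) \
    ((tag_t)(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
             ((uint32_t)(c) << 8) | (uint32_t)(d)))

typedef struct {
    tag_t    tag;    /* feature tag, e.g. 'liga' */
    uint32_t value;  /* 0 = off, 1 = on, or an alternate index */
} feature_setting_t;

typedef struct {
    const void        *face;         /* stands in for hb_face_t *; not owned */
    const void        *font;         /* stands in for hb_font_t *; not owned */
    tag_t              script;
    tag_t              language;
    feature_setting_t *features;     /* owned copy of the requested features */
    size_t             n_features;
    uint16_t          *gsub_lookups; /* lookup indices only, not lookup data */
    size_t             n_gsub;
    uint16_t          *gpos_lookups;
    size_t             n_gpos;
} featured_font_t;

/* Copy n elements; returns NULL for an empty array or on allocation failure. */
static void *copy_array(const void *src, size_t n, size_t size)
{
    void *dst;
    if (n == 0) return NULL;
    dst = malloc(n * size);
    if (dst) memcpy(dst, src, n * size);
    return dst;
}

static void featured_font_destroy(featured_font_t *ff)
{
    if (!ff) return;
    free(ff->features);
    free(ff->gsub_lookups);
    free(ff->gpos_lookups);
    free(ff);
}

/* Build the object once; the caller supplies the precomputed lookup indices. */
static featured_font_t *featured_font_create(
    const void *face, const void *font, tag_t script, tag_t language,
    const feature_setting_t *features, size_t n_features,
    const uint16_t *gsub, size_t n_gsub,
    const uint16_t *gpos, size_t n_gpos)
{
    featured_font_t *ff = calloc(1, sizeof *ff);
    if (!ff) return NULL;
    ff->face = face;
    ff->font = font;
    ff->script = script;
    ff->language = language;
    ff->features = copy_array(features, n_features, sizeof *features);
    ff->gsub_lookups = copy_array(gsub, n_gsub, sizeof *gsub);
    ff->gpos_lookups = copy_array(gpos, n_gpos, sizeof *gpos);
    if ((n_features && !ff->features) || (n_gsub && !ff->gsub_lookups) ||
        (n_gpos && !ff->gpos_lookups)) {
        featured_font_destroy(ff);
        return NULL;
    }
    ff->n_features = n_features;
    ff->n_gsub = n_gsub;
    ff->n_gpos = n_gpos;
    return ff;
}

/* Returns 1 if the object was built for exactly these settings.
 * The feature comparison is order-sensitive in this sketch. */
static int featured_font_matches(
    const featured_font_t *ff, const void *face, const void *font,
    tag_t script, tag_t language,
    const feature_setting_t *features, size_t n_features)
{
    size_t i;
    if (ff->face != face || ff->font != font || ff->script != script ||
        ff->language != language || ff->n_features != n_features)
        return 0;
    for (i = 0; i < n_features; i++)
        if (ff->features[i].tag != features[i].tag ||
            ff->features[i].value != features[i].value)
            return 0;
    return 1;
}
```

The object stores only indices, so it stays small, and the client still has to keep the GSUB and GPOS table data alive for as long as it shapes with it.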
>> Other than endianness, this chunk of code also defines int types
>> that have no alignment requirements.
> But table data in TrueType fonts is typically 4-byte aligned. Is
> there a situation where long/short data fields within a TrueType table
> are odd-byte aligned? Or you just want the code to handle the rare
> case where a poorly made font has non-4-byte aligned tables?
"Typically", yes, but unfortunately the spec does NOT actually require proper alignment in all cases (besides the possibility of poorly-made fonts that don't respect the alignment requirements that are in the spec). It does require the tables themselves to be aligned on 4-byte boundaries, IIRC, but it does not require this for every structure within them.
So either the code that accesses fields in the table data has to be independent of alignment, or we have to audit every place where a field is accessed to determine whether there's any possibility (in the spec) of that field being misaligned, and use special-case code or types in those cases, AND we have to check every computation involving an offset, etc., to ensure that the result is indeed aligned as per spec.
Unless/until profiling proves that the "safe" approach of always using alignment-independent accessors is in fact a performance issue (which I doubt), I think that's by far the simplest way to handle this, and the least likely to have lurking bugs that will some day cause a mysterious crash.
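For illustration, an alignment-independent accessor can be as simple as the following C sketch. The type and function names (be_u16, be_u32, be_u16_get, be_u32_get) are made up for this example and are not the ones used in the harfbuzz source; the point is only that a big-endian field declared as an array of bytes has no alignment requirement beyond 1, and assembling the value from those bytes handles endianness at the same time.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only.
 * Each field is stored as raw bytes, so a pointer to it may be
 * at any offset within the table data. */
typedef struct { uint8_t b[2]; } be_u16;
typedef struct { uint8_t b[4]; } be_u32;

/* Read a big-endian 16-bit value one byte at a time. */
static inline uint16_t be_u16_get(const be_u16 *p)
{
    return (uint16_t)(((unsigned)p->b[0] << 8) | (unsigned)p->b[1]);
}

/* Read a big-endian 32-bit value one byte at a time. */
static inline uint32_t be_u32_get(const be_u32 *p)
{
    return ((uint32_t)p->b[0] << 24) |
           ((uint32_t)p->b[1] << 16) |
           ((uint32_t)p->b[2] << 8)  |
            (uint32_t)p->b[3];
}
```

Because the reads are byte-wise, the same code works whether or not the field happens to be aligned, so no per-field audit against the spec is needed.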