[HarfBuzz] harfbuzz review

Behdad Esfahbod behdad at behdad.org
Wed Apr 21 02:01:09 PDT 2010


Ok, based on your feedback and some help from Ehsan, I've pushed out a bunch
of fixes and cleanup to master now.  Some macros are replaced by templates.
Notably, the int type definitions.  More below.

On 04/16/2010 02:19 AM, John Daggett wrote:

> Hmm, if by decoupling you mean that it's possible to provide the
> shaping routines with pre-sanitized font tables or some other
> sanitized intermediate form, that's great. Then apps can use a
> harfbuzz utility routine to sanitize font tables when necessary.

Yes, that's what I meant.


> For a lot of documents, hb_shape will be called repeatedly with the
> same font/script/lang combination so it would make sense to cache the
> lookups for that font/script/lang combination rather than
> reconstructing it on each call to hb_shape.  Basically the lookup
> arrays produced by calling setup_lookups in _hb_ot_substitute_complex
> and _hb_ot_position_complex.

I see what you mean.  That's also planned.  In Pango-speak, we call that a
ruleset. I'll either internally cache a few rulesets, or let user manage them
directly.  At any rate, as Jonathan pointed out, we still need the full table
data to use the ruleset and perform the actual shaping.  It just make things
"faster".


> I'm sure you're right, converting from RFC-3066 / BCP-49 language tags
> is more expensive in comparison.  But by using strings to identify
> language, script and feature tags you're forcing the conversion from
> these strings to OpenType tags to occur *every* time a buffer is
> passed to hb_shape.  For a given document using a single language that
> conversion really only needs to happen once for all buffers passed to
> hb_shape.  Ditto for script and feature tags.

The ruleset will also take care of that.


> I agree, using structs for endian swaps is a good thing but you're
> also "sanitizing" int's there!  Unfolding all the macros there it
> looks like it's checking to see that the int field is within a given
> bound; couldn't this be inferred with a bounds check on the containing
> struct?

You unfolded the macros correctly.  However, we rarely actually call that
inline-sanitize function.  There are two reasons it exists though:


1) Sometimes it's handy to call it directly on the int.  For example, in the
following pattern:

struct Anchor
{
  inline bool sanitize (SANITIZE_ARG_DEF) {
    TRACE_SANITIZE ();
    if (!SANITIZE (u.format)) return false;
    switch (u.format) {
    case 1: return u.format1->sanitize (SANITIZE_ARG);
    case 2: return u.format2->sanitize (SANITIZE_ARG);
    case 3: return u.format3->sanitize (SANITIZE_ARG);
    default:return true;
    }
  }

  private:
  union {
  USHORT                format;         /* Format identifier */
  AnchorFormat1         format1[VAR];
  AnchorFormat2         format2[VAR];
  AnchorFormat3         format3[VAR];
  } u;
};


2) We use the mere existence of a sanitize() method with no extra parameters
to verify that the type being sanitized does not reference other structs by
offset.  This is used in the implementation of the Array template:

  inline bool sanitize (SANITIZE_ARG_DEF) {
    TRACE_SANITIZE ();
    if (!SANITIZE_GET_SIZE()) return false;
    /* Note: for structs that do not reference other structs,
     * we do not need to call their sanitize() as we already did
     * a bound check on the aggregate array size, hence the return.
     */
    return true;
    /* We do keep this code though to make sure the structs pointed
     * to do have a simple sanitize(), ie. they do not reference
     * other structs. */
    unsigned int count = len;
    for (unsigned int i = 0; i < count; i++)
      if (!SANITIZE (array()[i]))
        return false;
    return true;
  }


>> Other than endianness, this chunk of code also defines int types
>> that have no alignment requirements.
> 
> But table data in TrueType fonts is typically 4-byte aligned.  Is
> there a situation where long/short data fields within a TrueType table
> are odd-byte aligned?  Or you just want the code to handle the rare
> case where a poorly made font has non-4-byte aligned tables?

Again, as Jonathan pointed out, we can't rely on any alignment.  And no, the
spec doesn't require any such alignment either.


Cheers,

behdad

> Cheers,
> 
> John



More information about the HarfBuzz mailing list