[HarfBuzz] hb-shape to output applied features (optionally also an ultra-verbose XML log)?

Adam Twardoch (List) list.adam at twardoch.com
Tue Aug 28 03:55:42 PDT 2012


Behdad,

when testing shaping, it would be extremely useful if HarfBuzz (and
hb-shape) could also output the list of features that were applied, in
the exact order in which they were applied.

What I mean is: when I process some text such as "بعض من знает"
and run something like
$ hb-shape --verbose --font-file="HelveticaWorld-Regular.ttf" \
  --features="swsh,dlig" --shapers="ot" "بعض من знает"

I get this:
1: (بعض من знает)
1:
<U+0628,U+0639,U+0636,U+0020,U+0645,U+0646,U+0020,U+0437,U+043D,U+0430,U+0435,U+0442>
1:
[uni0442=11+963|uni0435=10+1090|uni0430=9+1125|uni043D=8+1143|uni0437=7+1047|space.Arabic=6+410|
uniFEE6=5+977|uniFEE3=4+885|space.Arabic=3+410|uniFEBE=2+2017|uniFECC=1+869|uniFE91=0+649]

which is nice and useful. However, while I know that I requested the
features "swsh,dlig", I don't know which features the shaper applied in
addition. Depending on the shaper, of course,
different features would be applied, and it would be useful to know
which ones they were, so I could potentially override them at a later
step (i.e. I would *know* what to override).

Since some features are applied sequentially while others are applied
at once, it would be useful to differentiate them. You could either use
the "|" sign to separate the steps and "," to join the features that are
applied at once, or you could use "," to denote order and "+" to denote
simultaneous application (the second option seems more intuitive). If
you could get the "Pythonic" notation in there, that would also be
wonderful. Only features that are actually on would be worth reporting.

So a simple report (without Pythonic) for the example cited above might
look something like this:
ccmp,locl,isol,fina,medi,init,calt+cswh+dlig+liga+mset+swsh,curs,kern+mark+mkmk
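To make the proposed notation concrete, here is a minimal Python sketch that renders a shaping plan as such a report. This is not something HarfBuzz or hb-shape provides. It assumes a hypothetical representation in which the plan is an ordered list of stages and each stage is a list of feature tags applied at once. The stage list below is transcribed from the expected report above, not read from HarfBuzz.

```python
from typing import Callable, List

# Hypothetical plan model: an ordered list of stages; the features inside one
# stage are applied at once, and the stages run one after another.
Plan = List[List[str]]

def format_plan(plan: Plan, is_on: Callable[[str], bool] = lambda tag: True) -> str:
    """Join features applied at once with '+', and successive stages with ','."""
    parts = []
    for stage in plan:
        active = [tag for tag in stage if is_on(tag)]
        if active:  # report only features that were actually on
            parts.append("+".join(active))
    return ",".join(parts)

# Stages transcribed from the example report; not queried from HarfBuzz.
EXAMPLE_PLAN: Plan = [
    ["ccmp"], ["locl"], ["isol"], ["fina"], ["medi"], ["init"],
    ["calt", "cswh", "dlig", "liga", "mset", "swsh"],
    ["curs"],
    ["kern", "mark", "mkmk"],
]

print(format_plan(EXAMPLE_PLAN))
# ccmp,locl,isol,fina,medi,init,calt+cswh+dlig+liga+mset+swsh,curs,kern+mark+mkmk
```

Dropping a stage when none of its features is on implements the "only those which are actually on" rule. The "|"/"," alternative would only change the two join strings.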

But if Pythonic is possible, then you could use Pythonic both for the
features specified Pythonically by the user and for the features that
the shapers apply at their own discretion. The notation would then be
different:
ccmp,locl,-isol,fina[2]+fina[5],medi[1],init[0]+init[4],calt+cswh+dlig+liga+mset+swsh,curs,kern+mark+mkmk

Or, if you modify the Pythonic syntax a bit:
ccmp,locl,-isol,fina[2,5],medi[1],init[0,4],calt+cswh+dlig+liga+mset+swsh,curs,kern+mark+mkmk
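The per-position variant can be sketched the same way. Again, this is a hypothetical Python model, not HarfBuzz output. For each feature, the stage records `None` if it applied to the whole run, an empty list if it applied nowhere, or otherwise the cluster indices where it applied. The indices used below (init at 0 and 4, medi at 1, fina at 2 and 5) are the cluster values of the Arabic glyphs in the hb-shape output above. The `merged` flag switches between the two spellings.

```python
from typing import List, Optional, Tuple

# Hypothetical per-feature record: None = applied to the whole run,
# [] = not applied at all, [i, j, ...] = applied at these cluster indices.
Stage = List[Tuple[str, Optional[List[int]]]]

def format_feature(tag: str, where: Optional[List[int]], merged: bool) -> str:
    if where is None:
        return tag
    if not where:
        return "-" + tag  # feature was off everywhere
    if merged:
        return f"{tag}[{','.join(str(i) for i in where)}]"  # e.g. fina[2,5]
    return "+".join(f"{tag}[{i}]" for i in where)           # e.g. fina[2]+fina[5]

def format_pythonic(plan: List[Stage], merged: bool = False) -> str:
    return ",".join(
        "+".join(format_feature(tag, where, merged) for tag, where in stage)
        for stage in plan
    )

EXAMPLE: List[Stage] = [
    [("ccmp", None)], [("locl", None)], [("isol", [])],
    [("fina", [2, 5])], [("medi", [1])], [("init", [0, 4])],
    [(tag, None) for tag in ("calt", "cswh", "dlig", "liga", "mset", "swsh")],
    [("curs", None)],
    [(tag, None) for tag in ("kern", "mark", "mkmk")],
]

print(format_pythonic(EXAMPLE))
# ccmp,locl,-isol,fina[2]+fina[5],medi[1],init[0]+init[4],calt+cswh+dlig+liga+mset+swsh,curs,kern+mark+mkmk
print(format_pythonic(EXAMPLE, merged=True))
# ccmp,locl,-isol,fina[2,5],medi[1],init[0,4],calt+cswh+dlig+liga+mset+swsh,curs,kern+mark+mkmk
```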

I realize that "Pythonic" output for shaper-induced features may not
make sense after all, because once you've applied ccmp etc., you no
longer necessarily know the mapping back to the original codepoints. In
that case, Pythonic output should only be shown if Pythonic input was
used for the externally specified features.
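hb-shape already prints a cluster value after each glyph name (the number after `=`). These cluster values are what any backwards mapping would be built on. The Python sketch below relies only on the `name=cluster+advance` layout of the output shown earlier. It pairs every glyph with the input codepoint that its cluster points to. When ccmp, liga and similar features merge several characters into one glyph, the merged glyph typically keeps the first cluster of the group. In that case the mapping only tells you where the glyph's source text starts, which is the limitation described above.

```python
import re
from typing import List, Tuple

# Glyph line from the hb-shape example above, rejoined into one string.
HB_OUTPUT = (
    "[uni0442=11+963|uni0435=10+1090|uni0430=9+1125|uni043D=8+1143|"
    "uni0437=7+1047|space.Arabic=6+410|uniFEE6=5+977|uniFEE3=4+885|"
    "space.Arabic=3+410|uniFEBE=2+2017|uniFECC=1+869|uniFE91=0+649]"
)
# Input text rebuilt from the <U+...> line above, in logical order.
CODEPOINTS = "0628 0639 0636 0020 0645 0646 0020 0437 043D 0430 0435 0442"
TEXT = "".join(chr(int(cp, 16)) for cp in CODEPOINTS.split())

# Each entry begins with "glyphname=cluster"; anything after that is ignored.
ENTRY = re.compile(r"(?P<name>[^=]+)=(?P<cluster>\d+)")

def map_back(output: str, text: str) -> List[Tuple[str, str]]:
    """Pair each glyph name with the codepoint at the start of its cluster."""
    pairs = []
    for entry in output.strip("[]").split("|"):
        m = ENTRY.match(entry)
        if m:
            cluster = int(m.group("cluster"))
            pairs.append((m.group("name"), f"U+{ord(text[cluster]):04X}"))
    return pairs

for glyph, codepoint in map_back(HB_OUTPUT, TEXT):
    print(glyph, codepoint)
```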

But this ability would be really great and useful.

If you keep a backwards mapping from the final glyphs to the original
Unicode codepoints, then I'd even ask for an ultra-verbose XML output
option. I'm including a handcrafted sample XML document that roughly
shows what I'd like to see. It's essentially a detailed log of the
entire layout process in HarfBuzz, with some nods towards human
readability. Such a tool would be indispensable for font developers,
who could use it to debug their fonts in minute detail. In my hand-made
document I haven't really made room for text runs, but I guess they
should be somewhere in the structure.
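The sample XML attachment was scrubbed from the archive, so its schema isn't available here. The following Python sketch shows one hypothetical shape such a log could take, using the standard library's `xml.etree.ElementTree`. Each text run becomes a `<run>` element, which also gives runs a place in the structure. Inside each run, every shaping step becomes a `<stage>` element holding the glyph buffer after that step. All element and attribute names, and the toy "fi" ligature data, are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical stage record: (features applied at once, glyph buffer afterwards
# as (glyph name, cluster) pairs). Not an actual HarfBuzz data structure.
StageLog = Tuple[List[str], List[Tuple[str, int]]]

def build_log(runs: List[Tuple[str, str, List[StageLog]]]) -> str:
    """Render runs of (text, script, stages) as an indented XML document."""
    root = ET.Element("shaping-log")
    for text, script, stages in runs:
        run_el = ET.SubElement(root, "run", text=text, script=script)
        for index, (features, glyphs) in enumerate(stages):
            stage_el = ET.SubElement(run_el, "stage", index=str(index),
                                     features="+".join(features))
            for name, cluster in glyphs:
                ET.SubElement(stage_el, "glyph", name=name, cluster=str(cluster))
    ET.indent(root)  # Python 3.9+: one element per line, easier to read and diff
    return ET.tostring(root, encoding="unicode")

# Toy data: an "fi" run where liga merges two glyphs into one.
print(build_log([
    ("fi", "Latn", [
        (["ccmp", "locl"], [("f", 0), ("i", 1)]),
        (["liga"], [("f_i", 0)]),
    ]),
]))
```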

Basically, what *some* people need is *really verbose* output. On its
own it may not be very effective or readable (because it'll be very
long), but when you develop and fine-tune your features, this kind of
document could be used to make comprehensive diffs of what changed in
the font.
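For the diffing use case, a log with one element per line lets ordinary text-diff tools do the work. Below is a minimal Python sketch using the standard `difflib` module. It compares two hypothetical log excerpts in which a font change turned separate "f" and "i" glyphs into an "f_i" ligature. The file names and XML layout are illustrative only.

```python
import difflib
from typing import List

def diff_logs(old: str, new: str) -> List[str]:
    """Unified diff of two shaping logs, compared line by line."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="old-font.xml", tofile="new-font.xml", lineterm=""))

# Hypothetical log excerpts for the same input shaped with two font versions.
OLD_LOG = """<stage index="1" features="liga">
  <glyph name="f" cluster="0" />
  <glyph name="i" cluster="1" />
</stage>"""
NEW_LOG = """<stage index="1" features="liga">
  <glyph name="f_i" cluster="0" />
</stage>"""

print("\n".join(diff_logs(OLD_LOG, NEW_LOG)))
```

Identical logs produce an empty diff. If stages are written in the order they ran, the first hunk points at the earliest stage whose glyph buffer changed.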

Best regards,
Adam




-------------- next part --------------
A non-text attachment was scrubbed...
Name: hb-shape.xml
Type: text/xml
Size: 14666 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/harfbuzz/attachments/20120828/1e298a0a/attachment.xml>

