[HarfBuzz] Fwd: harfbuzz work

Mon Aug 3 20:16:46 PDT 2009

Dear Jonathan,

> > * A script-run itemizer based on ICU's, but adapted to support text  
> > in any of UTF-8, 16, or 32 (not actually tested with them all yet,  
> > though).

I have a few struggles with script itemization. My primary struggle is the length of time it takes for a block allocation to get from an ISO meeting (or even earlier) into a release of HarfBuzz or whatever application is using it. I'm not sure what can be done about that, but perhaps a solution to my other struggle might help.

PUA characters are currently defined, very sensibly, as unknown script. But they can turn up in all sorts of places, for example as Arabic characters. I am assuming we don't want to return unknown script if we can help it. I therefore wonder whether changing the unknown script code to be < SCRIPT_INHERITED, so that the itemizer merges unknown characters into the surrounding run as it does common and inherited ones, might resolve many PUA issues, and also the issues of new character allocations within a block or even new block allocations.
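
As a rough sketch of that idea (not the actual itemizer code): the enum values below are hypothetical, and same_script() and itemize() are illustrative names. The only assumption carried over is that the itemizer merges any script code <= INHERITED into the neighbouring run, so placing UNKNOWN below INHERITED makes it merge too.

```python
from enum import IntEnum


class Script(IntEnum):
    # Hypothetical numbering: UNKNOWN is deliberately placed below INHERITED.
    UNKNOWN = -1
    COMMON = 0
    INHERITED = 1
    LATIN = 2
    ARABIC = 3


def same_script(a, b):
    """Two codes are compatible if either is <= INHERITED or they match."""
    return a <= Script.INHERITED or b <= Script.INHERITED or a == b


def itemize(scripts):
    """Split a sequence of per-character script codes into runs.

    Returns a list of (start, end, script) tuples with end exclusive.
    """
    runs = []
    start = 0
    current = Script.COMMON
    for i, s in enumerate(scripts):
        if not same_script(current, s):
            # Incompatible script: close the current run and start a new one.
            runs.append((start, i, current))
            start = i
            current = s
        elif current <= Script.INHERITED and s > Script.INHERITED:
            # First "real" script seen in this run decides the run's script.
            current = s
    if scripts:
        runs.append((start, len(scripts), current))
    return runs
```

With this ordering, a PUA character sitting between two Arabic characters stays inside the Arabic run instead of forcing a break.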

An alternative is to have special handling for unknown characters: unknown characters inherit the script of the block they are in.
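
A minimal sketch of this alternative, assuming a small hand-written block table and ISO 15924 script tags ("Zzzz" for unknown). BLOCK_SCRIPTS and resolve_script() are hypothetical names, and a real table would need to cover every block that has a single obvious script.

```python
# Hypothetical block table: (first code point, last code point, script tag).
BLOCK_SCRIPTS = [
    (0x0600, 0x06FF, "Arab"),  # Arabic block
    (0x0900, 0x097F, "Deva"),  # Devanagari block
    (0x1000, 0x109F, "Mymr"),  # Myanmar block
]

UNKNOWN = "Zzzz"


def resolve_script(cp, char_script):
    """Return char_script, or the script of cp's block if it is unknown.

    cp is the code point; char_script is whatever the per-character
    script lookup returned for it.
    """
    if char_script != UNKNOWN:
        return char_script
    for first, last, script in BLOCK_SCRIPTS:
        if first <= cp <= last:
            return script
    # No block entry: nothing better to offer than unknown.
    return UNKNOWN
```

A code point that the script data does not yet know about, but which falls in the Arabic block, would then be itemized as Arabic. A code point in a block with no table entry is still reported as unknown.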

Knowledge of the font, where available, can help in itemization too. Odds are that if an unknown character is in the same font as its neighbours, it is in the same script. But perhaps you have already done font-based run breaking before the itemization occurs here.

In the case of bidi for PUA, I would suggest that PUA (and therefore unknown) characters be given a neutral bidi property rather than the default L that the UTC throws in there.
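
One way to picture this is an override layered on top of the standard property lookup. The sketch below uses Python's unicodedata module only for illustration; bidi_class() is a hypothetical wrapper, and choosing "ON" (Other Neutral) as the neutral class is an assumption.

```python
import unicodedata


def bidi_class(ch):
    """Bidi class for ch, with private-use characters forced to neutral."""
    # General category "Co" is private use.
    if unicodedata.category(ch) == "Co":
        return "ON"
    return unicodedata.bidirectional(ch)
```

A PUA character between Arabic letters would then take its direction from its context rather than being treated as a strong left-to-right character.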

Yours,
Martin