[HarfBuzz] Fwd: harfbuzz work

Thu Aug 6 12:25:00 PDT 2009

On 5 Aug 2009, at 22:27, Behdad Esfahbod wrote:

> On 08/03/2009 11:16 PM, Martin Hosken wrote:
>> Dear Jonathan,
>>
>>>> * A script-run itemizer based on ICU's, but adapted to support text
>>>> in any of UTF-8, 16, or 32 (not actually tested with them all yet,
>>>> though).
>
> Hi Martin,
>
>> I have a few struggles with script itemization. My primary struggle  
>> is the length of time it takes for a block allocation to get from  
>> an ISO meeting (or even earlier) into a release of harfbuzz or  
>> whatever application is using it. I'm not sure what can be done  
>> about that, but perhaps a solution to my other struggle might help.
>
> As of now, I'm not quite sure whether we want to have script  
> itemization in harfbuzz to begin with.  *If* we do, however, it will  
> use a callback to get the script for a character, so a higher level  
> can control what script is returned for unencoded characters.

I don't think I mind whether script itemization is labelled as being  
part of harfbuzz or not; either way, I need to do it before calling  
the harfbuzz shapers. A callback may be nice, though it's also an  
extra cost compared to a direct table lookup. I'm not totally  
convinced of the value of it yet, at least in our current use case.
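
To make the trade-off concrete, here is a minimal sketch of an itemizer that takes the script lookup as a parameter. It is an illustration only, not harfbuzz or Gecko code: the names `SCRIPT_RANGES`, `table_script`, `app_script` and `itemize` are hypothetical, the range table is deliberately tiny and coarse, and a Private Use Area range stands in for a block the built-in table does not know about yet.

```python
from typing import Callable, List, Tuple

# Hypothetical, coarse range table: (first, last, ISO 15924 script tag).
# A real table would be generated from the Unicode Scripts.txt data.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latn"),
    (0x0061, 0x007A, "Latn"),
    (0x0370, 0x03FF, "Grek"),
    (0x0400, 0x04FF, "Cyrl"),
]


def table_script(ch: str) -> str:
    """Direct table lookup: the cheap default with no indirection."""
    cp = ord(ch)
    for first, last, script in SCRIPT_RANGES:
        if first <= cp <= last:
            return script
    # Remaining ASCII (digits, punctuation, space) is Common; anything else
    # is unknown to this small table.
    return "Zyyy" if cp < 0x80 else "Zzzz"


def app_script(ch: str) -> str:
    """Callback supplied by a higher level that knows more than the table.

    Assumption for this sketch: the application uses the Private Use Area
    for Latin-script characters the built-in table cannot classify.
    """
    if 0xE000 <= ord(ch) <= 0xF8FF:
        return "Latn"
    return table_script(ch)


def itemize(
    text: str, get_script: Callable[[str], str] = table_script
) -> List[Tuple[int, int, str]]:
    """Split text into (start, end, script) runs.

    Common characters never start a new run; they stay with the run
    that precedes them.
    """
    runs: List[Tuple[int, int, str]] = []
    start, current = 0, None
    for i, ch in enumerate(text):
        script = get_script(ch)
        if script == "Zyyy":
            continue
        if current is None:
            current = script
        elif script != current:
            runs.append((start, i, current))
            start, current = i, script
    if text:
        runs.append((start, len(text), current or "Zyyy"))
    return runs
```

With the default table lookup, `itemize("ab\ue000c")` breaks the text into three runs around the unknown character; passing `get_script=app_script` keeps it as a single Latin run. The cost is one extra call per character through the callback.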

>> Knowledge of the font (if known) can help in itemization too. Odds  
>> are, if an unknown character is in the same font then it is in the  
>> same script. But perhaps you have already done font based run  
>> breaking before the itemization occurs here.
>
> Script itemization is always done before font selection.

That may be true in Pango, but it's not (currently, at least) true in  
Gecko: we do font selection, based on the CSS, user preferences,  
system fallbacks, etc., first, and script itemization happens at the  
last moment, just before shaping. I suppose that may change, though; I  
can think of potential advantages to doing it earlier.
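
As a rough illustration of the font-first ordering described above, the sketch below splits text into font runs and only then splits each font run by script. It is not Gecko's implementation: `pick_font` is a hypothetical stand-in for the real font selection (CSS, preferences, fallbacks), `script_of` is a toy classifier, and Common characters are left as their own runs for simplicity.

```python
from itertools import groupby
from typing import Callable, List, Tuple


def split_by(text: str, key: Callable[[str], str]) -> List[Tuple[int, int, str]]:
    """Split text into (start, end, value) runs of consecutive equal key(ch)."""
    runs, pos = [], 0
    for value, group in groupby(text, key=key):
        length = len(list(group))
        runs.append((pos, pos + length, value))
        pos += length
    return runs


def pick_font(ch: str) -> str:
    """Hypothetical font selection: Greek block gets its own font."""
    return "GreekFont" if 0x0370 <= ord(ch) <= 0x03FF else "LatinFont"


def script_of(ch: str) -> str:
    """Toy script classifier covering only what the example needs."""
    if 0x0370 <= ord(ch) <= 0x03FF:
        return "Grek"
    if ch.isascii() and ch.isalpha():
        return "Latn"
    return "Zyyy"


def font_then_script(text: str) -> List[Tuple[int, int, str, str]]:
    """Font runs first, then script runs inside each font run.

    Returns (start, end, font, script) tuples with offsets into text.
    """
    out = []
    for f_start, f_end, font in split_by(text, pick_font):
        for s_start, s_end, script in split_by(text[f_start:f_end], script_of):
            out.append((f_start + s_start, f_start + s_end, font, script))
    return out
```

Because the script pass runs inside a font run, the font is already known when a character's script is decided, which is where a font-based hint for unknown characters could be applied.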

Jonathan