[HarfBuzz] Fwd: harfbuzz work

Jonathan Kew jonathan at jfkew.plus.com
Wed Jul 15 10:30:00 PDT 2009


This was originally written to Behdad, but copying the HB mailing list  
as it may be of interest to others. Feedback welcome. :)

JK

Begin forwarded message:

> From: Jonathan Kew <jonathan at jfkew.plus.com>
> Date: 24 June 2009 19:18:33 BST
> To: Behdad Esfahbod <behdad at behdad.org>
> Subject: harfbuzz work
>
> Hi Behdad,
>
> FYI, I'm attaching some experimentation I've been doing with  
> HarfBuzz. This is based on your harfbuzz-ng from *before* the most  
> recent commit ("XX") to that branch, as it appeared to be in a  
> somewhat broken (or should I say partially-updated) state there.
>
> The zip file contains new stuff I've been writing, working towards a  
> HarfBuzz-based module we could use in Gecko, without relying on  
> anything else in Pango. There are also a few modifications to your  
> code in pango/opentype, attached as a separate diff file.
>
> What I've done here - some of which you may want to take into  
> HarfBuzz itself, unless you already have better solutions:
>
> * Alternate layout constructor taking pointers to the OpenType  
> tables; I'm using this on OS X at the moment as it's the most  
> convenient way to provide the font data. We won't always have an  
> actual file available for the mmap() approach, though of course  
> that's ideal when we can use it.
>
> * In hb-buffer, made hb_buffer_ensure() public as it could be useful  
> for client code to preallocate space, if it knows how much text is  
> coming; also gave hb_buffer_new() a size parameter so that the  
> caller can ask for an initial allocation size.
>
> * More importantly, I think hb_buffer_ensure() had a bug in the case  
> where out_string == in_string; it was realloc'ing in_string before  
> checking whether the pointers were the same, which means the  
> in_string pointer is likely to have been changed and the wrong  
> branch will be chosen. I think this is fixed correctly in the  
> attached patch.
>
> * Provided a small HB-friendly cmap-reader (currently handles  
> formats 4 and 12 only).
>
> * A script-run itemizer based on ICU's, but adapted to support text  
> in any of UTF-8, UTF-16, or UTF-32 (not actually tested with them
> all yet, though).
>
> * Code to look up the Unicode character properties we're likely to  
> need; currently script, bidi direction, and Arabic joining type.
> This can be retrieved from the ICU property APIs, if the client is  
> using ICU anyway, or there's a local implementation supporting just  
> the properties needed in the layout process. Actually, as we don't  
> do bidi within HarfBuzz, I'm not sure we need that property; on the  
> other hand, we may need character types (combining marks, etc) for  
> cluster handling - I haven't looked into that yet.
>
> * Proposed shaping-function API (see hb-shaper.h) and two shaper
> implementations (generic and Arabic/Syriac/N'Ko). These support
> user-specified features in addition to the defaults and
> script-specific shaping features. Oh, they also handle mirroring
> using the OMPL table, and apply ltra/rtla etc. according to direction.
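To make the hb_buffer_ensure() point above concrete, here is a minimal sketch of the aliasing problem and one way to avoid it. It is not the code from the attached patch: sketch_buffer_t and sketch_buffer_ensure() are hypothetical, simplified stand-ins (plain unsigned int items, doubling growth), and only the in_string/out_string names come from the message. The idea is simply to record whether the two pointers are the same before calling realloc(), since the comparison is meaningless afterwards.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the buffer; not the real hb_buffer_t. */
typedef struct {
  unsigned int *in_string;
  unsigned int *out_string;   /* may point at the same storage as in_string */
  size_t        allocated;    /* capacity in items */
} sketch_buffer_t;

/* Grow the buffer to hold at least `size` items.
 * Returns 0 on success, -1 if allocation fails. */
static int
sketch_buffer_ensure (sketch_buffer_t *b, size_t size)
{
  assert (b != NULL);
  if (size <= b->allocated)
    return 0;

  size_t new_allocated = b->allocated ? b->allocated : 16;
  while (new_allocated < size)
    new_allocated *= 2;

  /* Decide this BEFORE realloc: once in_string has been reallocated it may
   * have moved, so comparing it with the stale out_string afterwards would
   * usually report "different" and take the wrong branch. */
  int shared = (b->out_string == b->in_string);

  unsigned int *new_in = realloc (b->in_string, new_allocated * sizeof *new_in);
  if (new_in == NULL)
    return -1;
  b->in_string = new_in;

  if (shared) {
    /* One allocation serves both roles; just follow it to its new address. */
    b->out_string = new_in;
  } else {
    unsigned int *new_out = realloc (b->out_string, new_allocated * sizeof *new_out);
    if (new_out == NULL)
      return -1;
    b->out_string = new_out;
  }

  b->allocated = new_allocated;
  return 0;
}
```

In this sketch a freshly zeroed buffer (both pointers NULL) counts as shared, which is an assumption made here for simplicity.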
>
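For readers unfamiliar with the cmap formats mentioned in the list above, the following is a rough sketch of what a format 12 lookup involves. It is not the reader from the zip file; sketch_cmap12_lookup() and the helper names are hypothetical. It assumes the caller has already located the format 12 subtable and knows its length, and it relies only on the layout given in the OpenType specification: a 16-byte header followed by sorted groups of (startCharCode, endCharCode, startGlyphID), all big-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* OpenType data is big-endian; read it byte by byte to stay portable. */
static uint16_t
be16 (const uint8_t *p)
{
  return (uint16_t) ((p[0] << 8) | p[1]);
}

static uint32_t
be32 (const uint8_t *p)
{
  return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
         ((uint32_t) p[2] << 8)  |  (uint32_t) p[3];
}

/* Map a Unicode code point to a glyph ID using a cmap format 12 subtable.
 * Returns 0 (.notdef) if the character is unmapped or the data looks
 * malformed. The function name is hypothetical. */
static uint32_t
sketch_cmap12_lookup (const uint8_t *subtable, size_t length, uint32_t ch)
{
  assert (subtable != NULL);

  /* Header: format(2) reserved(2) length(4) language(4) numGroups(4). */
  if (length < 16 || be16 (subtable) != 12)
    return 0;
  uint32_t num_groups = be32 (subtable + 12);
  if (num_groups > (length - 16) / 12)
    return 0;   /* groups would run past the end of the data */

  /* Groups are sorted by startCharCode, so binary search applies. */
  uint32_t lo = 0, hi = num_groups;
  while (lo < hi) {
    uint32_t mid = lo + (hi - lo) / 2;
    const uint8_t *g = subtable + 16 + (size_t) mid * 12;
    uint32_t start = be32 (g);
    uint32_t end   = be32 (g + 4);
    if (ch < start)
      hi = mid;
    else if (ch > end)
      lo = mid + 1;
    else
      return be32 (g + 8) + (ch - start);   /* startGlyphID + offset in group */
  }
  return 0;
}
```

Format 4 needs more work (segment arrays, idDelta and idRangeOffset handling) and is left out of this sketch.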
> In the shaper API that I'm using right now, the approach is to  
> initially fill the buffer with *character* codes, and the shaper  
> function takes a pointer to a cmap table in addition to the layout  
> record. I did this because shaping needs access to the Unicode  
> values, not just the glyphs. I suppose we could specify that the  
> cmap table can be NULL, in which case the buffer is assumed to  
> contain glyph IDs already, but this will make most complex-script  
> shaping impossible. (Actually, it's a problem even for the generic  
> shaper, as it needs the Unicode character codes for mirroring.)
>
> Assuming we use this model of making the shaper be responsible for  
> mapping Unicode to glyphs, should the cmap table be incorporated  
> into the layout record just like GDEF/GSUB/GPOS? I did it separately  
> for now just to minimize disruption to your opentype files, but  
> there's not much reason to keep it separate IMO.
>
> One outstanding issue is passing parameters to features like  
> 'aalt' (alternate substitution lookups). I see you have a  
> "placeholder" for a callback function in  
> AlternateSubstFormat1::apply, but this doesn't look quite sufficient  
> AFAICT. In order to return the proper index, the function would need  
> to know which feature is currently being processed, which is  
> information that is not available at this level of applying the  
> lookup. (Note that it would be possible for a run of text to have  
> several Alternate features applied, with different indexes used for  
> each of them.)
>
> I'm wondering whether it would be feasible to use the "mask"  
> parameter to hb_ot_layout_{substitute,position}_lookup to help here.  
> This is used to selectively switch lookups off for certain glyphs in  
> the buffer, in order to implement things like Arabic shaping, but if  
> we could assume that the shapers should never need more than 24 bits  
> for this purpose (will a shaper ever need individual control of 24  
> distinct features or sets of features?), then we could also use the  
> low byte of the mask to pass a "feature argument" through to the  
> lookups. Currently, the mask is not passed all the way down to the  
> individual subtable apply() functions, so this would need to be  
> done, but I don't think that would be hard, and it would allow a  
> specific alternate index associated with a feature to be passed on  
> to that feature's lookup(s) and used to choose the right alternate.  
> What do you think - should I give this a try and see how it works in  
> practice?
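The mask idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not existing HarfBuzz code: all sketch_* names are hypothetical, the mask is assumed to be 32 bits wide, and the convention that the argument is a 1-based alternate index (with 0 meaning "leave the glyph alone") is invented here. The upper 24 bits keep their current role of switching lookups on or off per glyph, and the low byte carries the feature argument in the mask passed with each lookup call.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ARG_BITS 8u
#define SKETCH_ARG_MASK 0xFFu

/* Combine up to 24 feature-selection bits with an 8-bit feature argument. */
static uint32_t
sketch_mask_pack (uint32_t feature_bits, uint8_t arg)
{
  assert ((feature_bits >> 24) == 0);   /* only 24 bits are available */
  return (feature_bits << SKETCH_ARG_BITS) | arg;
}

static uint32_t
sketch_mask_feature_bits (uint32_t mask)
{
  return mask >> SKETCH_ARG_BITS;
}

static uint8_t
sketch_mask_arg (uint32_t mask)
{
  return (uint8_t) (mask & SKETCH_ARG_MASK);
}

/* A lookup applies to a glyph if they share a feature bit; the argument
 * byte must not take part in that test. */
static int
sketch_lookup_applies (uint32_t glyph_mask, uint32_t lookup_mask)
{
  return ((glyph_mask & lookup_mask) & ~SKETCH_ARG_MASK) != 0;
}

/* What an alternate-substitution subtable might do once the mask reaches
 * it. Assumption: the argument is a 1-based index into the alternates;
 * 0 or an out-of-range value leaves the original glyph untouched. */
static uint16_t
sketch_choose_alternate (const uint16_t *alternates, unsigned int count,
                         uint32_t lookup_mask, uint16_t original)
{
  unsigned int arg = sketch_mask_arg (lookup_mask);
  if (arg == 0 || arg > count)
    return original;
  return alternates[arg - 1];
}
```

Because the argument travels with the mask supplied for each lookup call rather than with the glyph, two Alternate features applied to the same run could each pass a different index. The obvious limit is that an argument cannot exceed 255.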
>
> Regards,
>
> Jonathan
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: harfbuzz-changes.diff
Type: application/octet-stream
Size: 6171 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/harfbuzz/attachments/20090715/59732370/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hb_test.zip
Type: application/zip
Size: 59688 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/harfbuzz/attachments/20090715/59732370/attachment.zip>
-------------- next part --------------


