[HarfBuzz] Why harfbuzz isn't/couldn't/shouldn't provide separate [optional] API for glyph/positioning?

Behdad Esfahbod behdad at behdad.org
Wed Mar 7 08:21:37 UTC 2018


On Sun, Feb 25, 2018 at 10:46 PM, Nikolay Sivov <bunglehead at gmail.com>
wrote:

> On 2/26/2018 5:28 AM, Behdad Esfahbod wrote:
> >
> > Two things stand out:
> >
> >   - There's a lot of duplicate info going into both calls,
> >
> >   - There's also a lot of data coming out of the first call just to go
> > directly into the second; namely pCharProps and pGlyphProps.
> >
> > Those two very strongly suggest that the two calls are part of the same
> > larger operation and rather forcefully separated.
>
> One example of such a larger operation is ScriptStringAnalyse(), except
> that it's pre-*OpenType() and thus does not have feature range support.
>
> Not to justify this separation but to understand it better: would it
> make sense if the idea was to allow changing the font size? Or toggling
> GPOS features without reprocessing the whole input text buffer, because
> the resulting glyph array won't change at this point anyway?
>

Changing font size initially sounds compelling. I have had that in mind for
HarfBuzz too. But in reality, no system is going to use that. It's hard
enough to keep track of input and shaped glyphstrings already. Many systems
throw that away and reshape as needed.  It's just not worth it.


> The DirectWrite call is cleaner in that sense, because of the separate
> size argument GetGlyphPlacements() takes, as opposed to just the current
> font in the HDC (or cache).
>
> ...
>
> >
> > Separating the calls also means that some things, like which OpenType
> > feature applies to what range, need to be recalculated. Guess that's
> > not a huge deal. The biggest problem with separating the calls in a way
> > that is useful for Wine implementing the Uniscribe API on top is that we
> > have to expose the buffer-internal bit allocations. And we don't want to
> > do that, because that is an implementation detail and changes over time.
>
> Actually I looked again last year at using hb_buffer for DirectWrite in
> Wine, and after I didn't find any way to fill the buffer with resulting
> glyphs as opposed to text, I realized that it won't be easy, if possible
> at all.
>

It definitely *is* possible to split the hb_shape() call into two. There
are some minor complexities, but those can be resolved. Channeling the
entirety of hb_glyph_info_t through the Uniscribe / DirectWrite GlyphProps
API might be harder, though.  I haven't fully checked the DirectWrite API.
If I split hb_shape() and write ScriptShapeOpenType / ScriptPlaceOpenType
around the two halves, would that be enough to get you going? It might be
harder with ScriptShape / ScriptPlace, which have fewer slots to carry
info, but then again they don't have OpenType features, so less data needs
to be channeled through as well.  It might be doable after all.


> P.S. Behdad, how do you test things? Do you have a large set of texts +
> fonts you run against, more than what's in /test of the hb tree I mean?
> Since hb-shape can also use Uniscribe or DirectWrite, it would be
> helpful to have such data to test Wine on.
>

Check out my writeup and talk:
https://goo.gl/9eWCLy
https://www.youtube.com/watch?v=sMkO4gF4-3U

The input data is at:
https://github.com/harfbuzz/harfbuzz-testing-wikipedia

I have a few local scripts that run this and diff against pre-recorded
output of Uniscribe, for a set of fonts. Mine is just the default MS font
for each Indic script. That's what the numbers we put in the commits are
about:

    BENGALI: 353725 out of 354188 tests passed. 463 failed (0.130722%)
    DEVANAGARI: 707307 out of 707394 tests passed. 87 failed (0.0122987%)
    GUJARATI: 366355 out of 366457 tests passed. 102 failed (0.0278341%)
    GURMUKHI: 60729 out of 60747 tests passed. 18 failed (0.0296311%)
    KANNADA: 951300 out of 951913 tests passed. 613 failed (0.0643966%)
    KHMER: 299071 out of 299124 tests passed. 53 failed (0.0177184%)
    MALAYALAM: 1048136 out of 1048334 tests passed. 198 failed (0.0188871%)
    ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
    SINHALA: 271662 out of 271847 tests passed. 185 failed (0.068053%)
    TAMIL: 1091754 out of 1091754 tests passed. 0 failed (0%)
    TELUGU: 970555 out of 970573 tests passed. 18 failed (0.00185457%)

I should make it possible for others to reproduce these.
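
For anyone reading these summaries: the percentage at the end of each line
is simply failed / total. Below is a minimal Python sketch that parses one
summary line and recomputes it. It is not one of the local scripts
mentioned above; the regular expression is an assumption based only on the
line format shown, and the function name parse_summary is hypothetical.

```python
import re

# Assumed format, taken from the summary lines quoted above, e.g.
#   BENGALI: 353725 out of 354188 tests passed. 463 failed (0.130722%)
SUMMARY_RE = re.compile(
    r"^\s*(?P<script>[A-Z]+): (?P<passed>\d+) out of (?P<total>\d+) "
    r"tests passed\. (?P<failed>\d+) failed \((?P<pct>[\d.]+)%\)\s*$"
)


def parse_summary(line):
    """Parse one summary line and recompute the failure percentage."""
    m = SUMMARY_RE.match(line)
    if m is None:
        raise ValueError(f"not a summary line: {line!r}")
    passed = int(m.group("passed"))
    total = int(m.group("total"))
    failed = int(m.group("failed"))
    # Sanity check: every test either passed or failed.
    if passed + failed != total:
        raise ValueError(f"counts do not add up: {line!r}")
    failure_pct = 100.0 * failed / total if total else 0.0
    return {
        "script": m.group("script"),
        "passed": passed,
        "total": total,
        "failed": failed,
        "failure_pct": failure_pct,
    }
```

Feeding it the BENGALI line above gives failed == 463 and a recomputed
failure_pct that agrees with the printed 0.130722% to the printed precision.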

Jonathan Kew had also built a portal running on Amazon AWS, comparing
Uniscribe and HarfBuzz outputs on the fly and generating a browsable
dashboard of the diffs. It wasn't fully productionized. It's worth picking
up again.

The main problem is that the output generated from these test suites is
massive. Just storing it takes a lot of resources. So it's most feasible
to run the two backends side-by-side and only print out the diffs.
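
The side-by-side idea can be sketched roughly as follows. This is only an
illustration in Python, not the actual scripts or Jonathan's portal:
compare_backends, shape_a and shape_b are hypothetical names, and the two
shapers are assumed to be callables that take an input string and return
some comparable output (for example a list of glyph IDs). Nothing is
stored; each input is shaped by both backends, and only disagreements are
reported.

```python
def compare_backends(inputs, shape_a, shape_b, report=print):
    """Shape each input with both backends and report only the diffs.

    shape_a and shape_b are hypothetical callables standing in for two
    shaping backends. Outputs are compared and then dropped, so memory
    use does not grow with the size of the test corpus.
    Returns (total, failed).
    """
    total = 0
    failed = 0
    for text in inputs:
        total += 1
        out_a = shape_a(text)
        out_b = shape_b(text)
        if out_a != out_b:
            failed += 1
            report(f"{text!r}: A={out_a!r} B={out_b!r}")
    return total, failed
```

Because inputs can be any iterable, the test words can be streamed from a
file line by line, and the returned counts are enough to print a summary
line in the same format as the ones above.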

-- 
behdad
http://behdad.org/