[HarfBuzz] how to detect missing glyphs e.g. for font substitution
Sebastien Metrot
meeloo at meeloo.net
Mon May 11 02:52:36 PDT 2015
Hi Louis,
In my font engine I start by doing font selection, based on the presence of glyphs and on encoding, before calling HarfBuzz to shape the string. The process is tedious but simple: break the text into runs by finding the points where the properties of the text stream change:
1. Split text into paragraphs (LayoutText)
2. Split text into fonts
3. Split paragraphs into ranges (LayoutParagraph)
4. Split ranges into possible fonts (I try to keep the number of fonts to a minimum)
5. Split ranges into lines / words if needed
Then I shape each run with HarfBuzz. Each run can have a different font.
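To make the font-splitting step (step 4) concrete, here is a minimal Python sketch. It is not my engine's actual code, and everything in it is an assumption made for illustration: the `Font` and `Run` classes, `pick_font`, `split_into_font_runs`, and the idea of modelling a font as a plain set of covered code points. A real implementation would query the font's 'cmap' (for example through HarfBuzz's font functions) and would keep combining marks and variation selectors in the same run as their base character, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Font:
    """Hypothetical font model: a name plus the set of code points it covers."""
    name: str
    coverage: frozenset

    def has_glyph(self, ch: str) -> bool:
        return ord(ch) in self.coverage


@dataclass
class Run:
    start: int             # index of the first character in the run
    end: int               # index one past the last character
    font: Optional[Font]   # None means no candidate font covers this text


def pick_font(ch: str, fonts: Sequence[Font], current: Optional[Font]) -> Optional[Font]:
    # Stay on the current font when it covers the character, to keep the
    # number of font switches (and therefore runs) to a minimum.
    if current is not None and current.has_glyph(ch):
        return current
    # Otherwise take the first font in priority order that covers it.
    for font in fonts:
        if font.has_glyph(ch):
            return font
    return None


def split_into_font_runs(text: str, fonts: Sequence[Font]) -> List[Run]:
    runs: List[Run] = []
    current: Optional[Font] = None
    for i, ch in enumerate(text):
        font = pick_font(ch, fonts, current)
        if runs and runs[-1].font is font:
            runs[-1].end = i + 1              # extend the open run
        else:
            runs.append(Run(i, i + 1, font))  # font changed: start a new run
        current = font
    return runs
```

Each resulting run would then be shaped separately with its own font. A run whose `font` is `None` marks text that none of the candidate fonts can display, and it is known before shaping starts.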
I’m not saying it’s the perfect solution to the problem, but it has worked fine for me, and so far I don’t think I have encountered a case where HarfBuzz ended up missing a glyph. I think that a "missing glyph" callback would not work for me: by the time it fired it would already be too late, and I would have to restart the text layout and font selection from the beginning.
S.
> On 11 May 2015, at 09:56, Louis Semprini <lsemprini at hotmail.com> wrote:
>
> What is the most reliable and non-font-dependent way to detect whether a string being shaped by hb_shape() has led to any missing glyphs, and to identify where those glyphs occur?
>
> When I use the word "missing glyph" here I mean a glyph that is not what the user intended for that code point in that context, whether that be a little tofu box, a magical hex box, a space glyph (with or without zero advance), a diamond, or anything else that has substituted for the glyph that the user really wanted.
>
> In particular, after calling hb_shape(),
>
> - can we be guaranteed that a hb_glyph_info_t.codepoint (which is actually a glyph index despite the name) of 0 always means "missing glyph" ?
>
> - can we be guaranteed that hb_glyph_info_t.codepoint==0 is the only possible value that means "missing glyph" and that no glyph index values OTHER THAN 0 also mean "missing glyph"?
>
> If not, is there a better way to detect missing glyphs using the output of hb_shape(), or some other Harfbuzz call?
>
> If the answer is "yes, except the following cases used with the following shapers," that might still be useful, so please elaborate.
>
> Or, must Harfbuzz callers first do a complete, separate pass where they run all code points of the input through some kind of mapping routine that uses the fonts' 'cmap' and other tables? The latter would be a shame because it would require the Harfbuzz caller to duplicate a vast amount of the complexity that is nicely hidden in Harfbuzz in their own code. It's also a shame because in most cases, no font substitution would be needed and so it would be inefficient in the average case.
>
> As to the question of what a Harfbuzz caller would/could do after knowing that a missing glyph existed, in order to fix the problem, that totally depends on the particular application for which Harfbuzz is being used. If the set of possible input code points and the set of possible fonts used to render them were totally unconstrained, of course that requires a full general-purpose font substitution scheme like that built into major OSes and is a massive project that may well depend on a deep knowledge of OpenType tables such as the unicode range flags in the 'OS/2' table and others. But there are plenty of other useful cases where a Harfbuzz caller could make use of the 'missing glyph' information to institute a quick and effective solution. For example, any app which displays data whose total set of code points is known (either static content or dynamic content where the set of code points that need to be supported is limited by the market where the app is sold) can reliably choose (at code authoring time) a particular fallback font to use if the user's choice of font leads to missing glyphs.
>
> For such situations it would be nice to hang that font substitution decision off of a "there were missing glyphs" result from hb_shape() since it would be by far the rare case, and the common case of OK glyphs would therefore be faster. So that's why I am asking if there is any such way.
>
> Thanks again all for your helpful answers in this forum.
>
> _______________________________________________
> HarfBuzz mailing list
> HarfBuzz at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/harfbuzz