[HarfBuzz] how to detect missing glyphs e.g. for font substitution

Mon May 11 00:56:19 PDT 2015

What is the most reliable and non-font-dependent way to detect whether a string being shaped by hb_shape() has led to any missing glyphs, and to identify where those glyphs occur?

By "missing glyph" here I mean a glyph that is not what the user intended for that code point in that context, whether that be a little tofu box, a magical hex box, a space glyph (with or without zero advance), a diamond, or anything else that has been substituted for the glyph that the user really wanted.

In particular, after calling hb_shape(),

- can we be guaranteed that a hb_glyph_info_t.codepoint (which is actually a glyph index despite the name) of 0 always means "missing glyph" ?

- can we be guaranteed that hb_glyph_info_t.codepoint==0 is the only possible value that means "missing glyph" and that no glyph index values OTHER THAN 0 also mean "missing glyph"?

If not, is there a better way to detect missing glyphs using the output of hb_shape(), or some other HarfBuzz call?

If the answer is "yes, except the following cases used with the following shapers," that might still be useful, so please elaborate.
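
To make the question concrete, the check I have in mind looks roughly like the sketch below. It assumes that glyph index 0 marks a missing glyph, which is exactly the assumption I am asking about. The struct glyph_info here is a hypothetical stand-in that mirrors only the codepoint and cluster fields of hb_glyph_info_t, so that the sketch compiles without HarfBuzz; real code would walk the array returned by hb_buffer_get_glyph_infos() instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two hb_glyph_info_t fields used here. */
struct glyph_info {
    uint32_t codepoint; /* after shaping: a glyph index, not a Unicode value */
    uint32_t cluster;   /* maps the glyph back to a position in the input text */
};

/* ASSUMPTION (the one under question): glyph index 0 means "missing". */
#define ASSUMED_MISSING_GLYPH 0u

/*
 * Scan shaped output for glyphs equal to ASSUMED_MISSING_GLYPH.
 * Stores the cluster value of each hit in clusters_out (at most max_out
 * entries) and returns the total number of hits found.
 */
size_t find_missing_glyphs(const struct glyph_info *infos, size_t count,
                           uint32_t *clusters_out, size_t max_out)
{
    size_t found = 0;
    for (size_t i = 0; i < count; i++) {
        if (infos[i].codepoint == ASSUMED_MISSING_GLYPH) {
            if (found < max_out)
                clusters_out[found] = infos[i].cluster;
            found++;
        }
    }
    return found;
}
```

A return value of 0 would be the fast path; anything else would tell the caller which parts of the input need attention.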

Or must HarfBuzz callers first do a complete, separate pass in which they run all code points of the input through some kind of mapping routine that uses the fonts' 'cmap' and other tables?  That would be a shame, because it would require HarfBuzz callers to duplicate in their own code a vast amount of the complexity that is nicely hidden in HarfBuzz.  It would also be inefficient in the average case, because in most cases no font substitution would be needed.
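
For comparison, the separate pre-pass I would like to avoid might look roughly like the following sketch. The cmap_lookup_fn callback and the toy_font "font" are hypothetical stand-ins so that the sketch compiles on its own; in real code the callback would presumably wrap a per-code-point query such as hb_font_get_glyph(). Note that it looks at one code point at a time and knows nothing about what the shaper would do with the text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: returns true and sets *glyph if the font maps
 * the given code point to some glyph. */
typedef bool (*cmap_lookup_fn)(uint32_t unicode, uint32_t *glyph,
                               void *font_data);

/* Return the index of the first code point the font does not map,
 * or count if every code point is mapped. */
size_t first_unmapped(const uint32_t *text, size_t count,
                      cmap_lookup_fn lookup, void *font_data)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t glyph = 0;
        if (!lookup(text[i], &glyph, font_data))
            return i;
    }
    return count;
}

/* Toy "font" for illustration only: covers one contiguous code point range. */
struct toy_font {
    uint32_t first;
    uint32_t last;
};

bool toy_lookup(uint32_t unicode, uint32_t *glyph, void *font_data)
{
    const struct toy_font *f = font_data;
    if (unicode < f->first || unicode > f->last)
        return false;
    *glyph = unicode - f->first + 1; /* keep glyph index 0 unused */
    return true;
}
```

This pass has to run over every string before shaping, even though in the common case it finds nothing.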

As to what a HarfBuzz caller could do to fix the problem once it knows that a glyph is missing, that depends entirely on the particular application in which HarfBuzz is being used.  If the set of possible input code points and the set of possible fonts used to render them were totally unconstrained, that would of course require a full general-purpose font substitution scheme like those built into major OSes, which is a massive project that may well depend on deep knowledge of OpenType tables, such as the Unicode range flags in the 'OS/2' table.  But there are plenty of other useful cases where a HarfBuzz caller could use the 'missing glyph' information to institute a quick and effective solution.  For example, any app that displays data whose total set of code points is known (either static content, or dynamic content where the set of code points that need to be supported is limited by the market where the app is sold) can reliably choose, at code authoring time, a particular fallback font to use if the user's choice of font leads to missing glyphs.

For such situations it would be nice to hang the font substitution decision off a "there were missing glyphs" result from hb_shape(): missing glyphs would be by far the rare case, so the common case of OK glyphs would be faster.  That is why I am asking whether there is any such way.
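
The control flow I am hoping for can be sketched as follows. Everything here is hypothetical: count_missing_fn stands for "shape the text with this font and report how many output glyphs look missing" (by whatever test turns out to be reliable), and toy_count_missing is a toy stand-in included only so that the sketch compiles without HarfBuzz.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: shapes text with font and returns the number of
 * output glyphs that are considered missing. */
typedef size_t (*count_missing_fn)(const void *font, const uint32_t *text,
                                   size_t count);

/* Use the user's font unless shaping with it reports missing glyphs;
 * only then fall back to the font chosen at code authoring time. */
const void *choose_font(const void *user_font, const void *fallback_font,
                        const uint32_t *text, size_t count,
                        count_missing_fn count_missing)
{
    if (count_missing(user_font, text, count) == 0)
        return user_font;     /* common case: a single shaping pass */
    return fallback_font;     /* rare case: substitute the fallback font */
}

/* Toy stand-in for illustration: the "font" is a pointer to the highest
 * code point it covers; anything above that counts as missing. */
size_t toy_count_missing(const void *font, const uint32_t *text, size_t count)
{
    uint32_t max_covered = *(const uint32_t *)font;
    size_t missing = 0;
    for (size_t i = 0; i < count; i++) {
        if (text[i] > max_covered)
            missing++;
    }
    return missing;
}
```

The point of this arrangement is that text the user's font fully covers costs exactly one shaping call, with no extra work per code point.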

Thanks again all for your helpful answers in this forum.
