[HarfBuzz] MS/Symbol cmap subtables

Eric Muller emuller at amazon.com
Mon Jan 15 02:25:15 UTC 2018


It seems that with a font that has only a 3,0 cmap subtable (and maybe 
some Macintosh subtables), HB will automatically do the shift by 
0xF000 (in the function get_glyph_from_symbol) for code points below 
U+00FF that are not mapped by the subtable.
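
To make the behavior concrete, here is a rough Python sketch of the 
lookup as I understand it. It is a simplified model, not HarfBuzz 
source: the dict-based cmap, the function name, and the glyph ids are 
made up for illustration, and I am assuming that U+00FF itself is 
included in the shifted range.

```python
# Simplified model of the fallback described above; not HarfBuzz source.
# A cmap subtable is modeled as a dict from code point to glyph id.

def get_glyph_from_symbol_model(cmap, codepoint):
    """Return a glyph id, or None if the code point is unmapped."""
    glyph = cmap.get(codepoint)
    if glyph is not None:
        return glyph
    # Fallback: retry low code points in the U+F000..U+F0FF range.
    # Assumption: the upper bound U+00FF is inclusive.
    if codepoint <= 0x00FF:
        return cmap.get(0xF000 + codepoint)
    return None


# Hypothetical symbol font that maps only U+F041 (glyph id is made up).
symbol_cmap = {0xF041: 7}
```

With this model, looking up U+0041 and U+F041 in symbol_cmap yields the 
same glyph, which is the aliasing this message is about.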

It is clear that when U+0041 A is set with a symbol font, that 
U+0041 actually has the semantics of a PUA code point, and certainly 
should not be treated as an "A". That's the whole point of a 3,0 cmap 
subtable.

Consider an HTML page. The font-family is only a request and there is no 
guarantee that the actual font will or will not be a symbol font. Thus 
the semantics of the HTML page can change depending on the browser 
environment. Outside a browser, it seems that the safe treatment is 
therefore to consider all code points below U+00FF as PUA, which is 
clearly not tenable. So in that environment, I think that the shift 
should not be done. Of course, U+F041 should work.

Note that the behavior of Word 2016 on Windows is actually more elaborate: 
enter U+0041, and set it with a non-symbol font; copy/paste or save to a 
text file, and the result is U+0041; but set this A in a symbol font, 
and copy/paste or save to a text file, and the result is U+F041.
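
A possible way to model what Word appears to do on export is sketched 
below in Python. This is only my reading of the observed behavior, not 
anything known about Word's implementation: the function name is 
hypothetical, and I have only checked U+0041, so applying the rule to 
the whole range up to U+00FF is an assumption.

```python
# Model of the observed copy/paste behavior; not Word's actual logic.

def exported_code_point(codepoint, font_is_symbol):
    """Code point written to plain text for a character in a given font."""
    # Assumption: every code point up to U+00FF is remapped into the PUA
    # when set in a symbol font. Only U+0041 was actually observed.
    if font_is_symbol and codepoint <= 0x00FF:
        return 0xF000 + codepoint
    return codepoint
```

The point is that the stored text keeps U+0041 as entered, and the PUA 
interpretation is applied only once the font is known to be a symbol 
font.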

I think that the shift should be controllable by the client, rather than 
systematically applied. I don't have a strong opinion about the default 
behavior (i.e. when HB's client does not specify whether the shift 
should be done or not).
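
As a sketch of what client control could look like, here is a small 
Python model. The symbol_shift parameter and the lookup function are 
hypothetical names of mine, not an existing or proposed HarfBuzz API, 
and the default of True is chosen arbitrarily here to match the current 
behavior.

```python
# Hypothetical client-controlled lookup; not a HarfBuzz API.
# A cmap subtable is modeled as a dict from code point to glyph id.

def lookup(cmap, codepoint, symbol_shift=True):
    """Return a glyph id or None; shift by 0xF000 only if the client allows."""
    glyph = cmap.get(codepoint)
    if glyph is None and symbol_shift and codepoint <= 0x00FF:
        glyph = cmap.get(0xF000 + codepoint)
    return glyph


# Hypothetical symbol font that maps only U+F041 (glyph id is made up).
cmap = {0xF041: 7}
```

With symbol_shift=False, U+0041 stays unmapped in this font, so the 
client can fall back to another font, while U+F041 still resolves 
directly.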

Thoughts?

Thanks,
Eric.


