[HarfBuzz] MS/Symbol cmap subtables

Eric Muller emuller at amazon.com
Fri Jan 19 07:27:34 UTC 2018

I want to build a rendering system where U+0041 renders as an "A", 
regardless of the selected font.


On 1/17/18 3:48 PM, Behdad Esfahbod wrote:
> What's the actual problem you are facing?
> On Mon, Jan 15, 2018 at 9:58 AM, Eric Muller <emuller at amazon.com> wrote:
>>     It's clear that if the symbol font is asked by name, we should do
>>     the shift.
>     I think I disagree, in the sense that HB should not impose that
>     behavior on its clients. HB is clearly the right place to
>     implement the behavior, but the choice of having that behavior or
>     not should be with the client.
>     For any document format, rendering the moral equivalent of <p
>     font-family='symbol'>&#x0041;</p> with something other than an "A"
>     implies that all ASCII is PUA. That's a choice Word, InDesign,
>     Notepad may make if they want, but it should not be imposed on all
>     users of HB.
>     Personally, I think it is a very bad choice for HTML, and Firefox
>     seems to agree. It seems nice and user friendly at first, but this
>     makes the document ambiguous. What about <p font-family='minion,
>     symbol'>&#x0041;</p>? It's an A or not an A depending on the
>     presence of "minion" in the client. What does the document mean?
>     Of course, <p font-family='symbol'>&#xF041;</p> should render with
>     the glyph symbol.cmap(F041). So even if the shift is never done,
>     the glyph is usable. It's just that you don't have the convenience
>     of an IME-like mechanism provided by the shaping engine, but you
>     gain a reliable semantic for the text.
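To make the ambiguity concrete, here is a minimal sketch of font-family fallback with and without the shift. Everything in it is hypothetical: the `render` helper, the dict-based font records, and the glyph ids are illustrative stand-ins, not HarfBuzz or browser APIs.

```python
def render(codepoint, font_family, installed_fonts, shift=True):
    """Return (font name, glyph id) for the first font that maps codepoint.

    installed_fonts maps a font name to {"symbol": bool, "cmap": dict}.
    Both the layout and the names are assumptions made for this sketch.
    """
    for name in font_family:
        font = installed_fonts.get(name)
        if font is None:
            continue  # requested font is not present on this client
        cmap = font["cmap"]
        if codepoint in cmap:
            return (name, cmap[codepoint])
        # Optional shift: retry low code points at U+F000 + cp in symbol fonts.
        if shift and font["symbol"] and codepoint <= 0xFF:
            shifted = 0xF000 + codepoint
            if shifted in cmap:
                return (name, cmap[shifted])
    return None


minion = {"symbol": False, "cmap": {0x41: 34}}    # hypothetical glyph ids
symbol = {"symbol": True, "cmap": {0xF041: 36}}

with_minion = {"minion": minion, "symbol": symbol}
without_minion = {"symbol": symbol}
```

With the shift enabled, `render(0x41, ["minion", "symbol"], ...)` picks an "A" from minion on one client and a symbol glyph on another. With `shift=False`, U+0041 is never mapped by the symbol font, while U+F041 still is.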
>>     That's good behavior [in Word], but beyond what HarfBuzz can do.
>     Yes, which is why the shift may be acceptable or even desirable
>     for some clients, and so hopefully the client could choose.
>>     What would clients do with that control then? How would they set it?
>     If I build an app that is meant to work like other GDI apps, I
>     allow the shift (and maybe add mitigating measures like Word). If
>     I build an app such as Firefox, I don't allow it. The choice is
>     entirely driven by the type of application I want to build, and how I
>     want to define my document format.
>     If you were to implement this choice, I can see it either in the
>     construction of the HB unicode functions, or in the hb_buffer
>     (either globally, or on a character-by-character basis). I have a
>     preference for the latter: this choice could be passed down to the
>     cmap lookup functions, HB or not; it could also be different on
>     different parts of a document, perhaps reacting to markup.
>     Eric.
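A rough sketch of what a buffer-level switch with per-character overrides might look like. `TextBuffer`, `allow_symbol_shift`, `per_char_shift` and `map_glyphs` are all invented for illustration; none of them exist in the hb_buffer API, and the cmap is a plain dict.

```python
from dataclasses import dataclass, field


@dataclass
class TextBuffer:
    """Hypothetical stand-in for a shaping buffer; not the hb_buffer API."""
    codepoints: list
    allow_symbol_shift: bool = False                     # buffer-wide default
    per_char_shift: dict = field(default_factory=dict)   # index -> bool override

    def shift_allowed(self, index):
        # A per-character setting wins over the buffer-wide default.
        return self.per_char_shift.get(index, self.allow_symbol_shift)


def map_glyphs(buffer, cmap):
    """Map each code point to a glyph id, passing the shift choice down."""
    glyphs = []
    for i, cp in enumerate(buffer.codepoints):
        glyph = cmap.get(cp)
        if glyph is None and cp <= 0xFF and buffer.shift_allowed(i):
            glyph = cmap.get(0xF000 + cp)
        glyphs.append(glyph if glyph is not None else 0)  # 0 stands for .notdef
    return glyphs


symbol_cmap = {0xF041: 36}  # hypothetical glyph id
mixed = TextBuffer([0x41, 0x41], per_char_shift={1: True})
all_shifted = TextBuffer([0x41, 0x41], allow_symbol_shift=True)
```

In `mixed`, only the second U+0041 is allowed to shift, which is the kind of markup-driven difference between parts of a document mentioned above.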
>     On 1/15/18 6:46 AM, Behdad Esfahbod wrote:
>>     Hi Eric,
>>     On Mon, Jan 15, 2018 at 2:25 AM, Eric Muller <emuller at amazon.com> wrote:
>>         It seems that with a font that has only a 3, 0 cmap subtable
>>         (and maybe some Macintosh subtables), then HB will
>>         automatically do the shift by F000 (in the function
>>         get_glyph_from_symbol) for code points below U+00FF that are
>>         not mapped by the subtable.
>>     Right. Only in hb-ot-func though. Client font funcs can do otherwise.
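For readers who have not looked at the code, the shift discussed here can be sketched roughly as follows. This is a simplified model, not the actual `get_glyph_from_symbol` source: the cmap is a plain dict, the `allow_shift` parameter is hypothetical, and treating U+00FF itself as included in the range is an assumption.

```python
SYMBOL_SHIFT = 0xF000  # offset into the Private Use Area used by 3,0 subtables


def lookup_symbol_glyph(cmap, codepoint, allow_shift=True):
    """Return the glyph id for codepoint, or None if it is unmapped.

    cmap is a dict standing in for a 3,0 cmap subtable (an assumption
    of this sketch, not a HarfBuzz data structure).
    """
    glyph = cmap.get(codepoint)
    if glyph is not None:
        return glyph
    # Fallback: low code points the subtable does not map are retried
    # at U+F000 + codepoint.
    if allow_shift and codepoint <= 0x00FF:
        return cmap.get(SYMBOL_SHIFT + codepoint)
    return None


symbol_cmap = {0xF041: 36}  # hypothetical glyph id
```

Here U+0041 resolves to the glyph mapped at U+F041 only when the shift is allowed, while a direct lookup of U+F041 succeeds either way.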
>>         It is clear that when U+0041 A is set with a symbol font,
>>         then that U+0041 has actually the semantics of a PUA code
>>         point, and certainly should not be treated as an "A". That's
>>         the whole point of a 3,0 cmap subtable.
>>     Correct.
>>         Consider an HTML page. The font-family is only a request and
>>         there is no guarantee that the actual font will or will not
>>         be a symbol font. Thus the semantic of the HTML page can
>>         change depending on the browser environment. Outside a
>>         browser, it seems that the safe treatment is therefore to
>>         consider all code points below U+00FF as PUA, which is
>>         clearly not tenable. So in that environment, I think that the
>>         shift should not be done. Of course, U+F041 should work.
>>     My take on this is that it's a bug of the font fallback logic if
>>     it falls back to a symbol font.  I changed fontconfig to never do
>>     that.
>>         Note that the behavior of Word 2016 on Windows is actually more
>>         elaborate: enter U+0041, and set it with a non-symbol font;
>>         copy/paste or save to a text file, and the result is U+0041;
>>         but set this A in a symbol font, and copy/paste or save to a
>>         text file, and the result is U+F041.
>>     That's good behavior, but beyond what HarfBuzz can do.
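The Word behavior described above happens at the application level, on text export. A minimal sketch of it, assuming the document is a list of (text, is_symbol_font) runs; the `export_text` helper and the run layout are hypothetical and only model the observed result, not how Word implements it.

```python
def export_text(runs):
    """Flatten styled runs to plain text, moving symbol-font text into the PUA.

    runs is a list of (text, is_symbol_font) pairs, a layout assumed
    for this sketch.
    """
    out = []
    for text, is_symbol_font in runs:
        if is_symbol_font:
            # Characters up to U+00FF set in a symbol font are exported
            # at U+F000 + code point, so the plain text stays unambiguous.
            text = "".join(
                chr(0xF000 + ord(c)) if ord(c) <= 0xFF else c for c in text
            )
        out.append(text)
    return "".join(out)


document = [("A", False), ("A", True)]
```

The first "A" survives as U+0041 and the second one comes out as U+F041, matching the copy/paste result reported for Word 2016.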
>>         I think that the shift should be controllable by the client,
>>         rather than systematically applied. I don't have a strong
>>         opinion about the default behavior (i.e. when HB's client
>>         does not specify whether the shift should be done or not).
>>     What would clients do with that control then? How would they set it?
>>     I implemented this shift in fontconfig and then harfbuzz because
>>     in LibreOffice and other software, there were existing documents
>>     that referred to Wingdings or other symbol fonts and encoded
>>     characters in the ASCII range. It's clear that if the symbol font
>>     is asked by name, we should do the shift. If it's NOT, then it
>>     should not be chosen to render text to begin with, which means
>>     the shift can be applied unconditionally.
>>     How does that sound?
>>     behdad
>>         Thoughts?
>>         Thanks,
>>         Eric.
>>     -- 
>>     behdad
>>     http://behdad.org/
> -- 
> behdad
> http://behdad.org/
