[HarfBuzz] MS/Symbol cmap subtables

Eric Muller emuller at amazon.com
Sun Jan 21 02:22:23 UTC 2018


> The easiest would be to add a new API analogous to 
> hb_ot_font_set_funcs(), that does NOT have the symbol shift in it
That works.

Thanks,
Eric.


On 1/19/18 4:43 PM, Behdad Esfahbod wrote:
> Ok, let's see how we can address this...
>
> I don't like a setting on the buffer as currently the get_glyph() 
> callback has no way of accessing that information.  The easiest would 
> be to add a new API analogous to hb_ot_font_set_funcs(), that does NOT 
> have the symbol shift in it.  It's not the most elegant solution, but 
> it is the easiest.  Would that work for you?
>
> That said, this issue is also related, as it pertains to another 
> non-Unicode encoding, though in the font, not the buffer:
>
> https://github.com/harfbuzz/harfbuzz/issues/681
>
> On Thu, Jan 18, 2018 at 11:27 PM, Eric Muller <emuller at amazon.com> wrote:
>
>     I want to build a rendering system where U+0041 renders as an "A",
>     regardless of the selected font.
>
>     Eric.
>
>
>
>     On 1/17/18 3:48 PM, Behdad Esfahbod wrote:
>>     What's the actual problem you are facing?
>>
>>     On Mon, Jan 15, 2018 at 9:58 AM, Eric Muller <emuller at amazon.com> wrote:
>>
>>
>>>         It's clear that if the symbol font is asked by name, we
>>>         should do the shift.
>>         I think I disagree, in the sense that HB should not impose
>>         that behavior on its clients. HB is clearly the right place
>>         to implement the behavior, but the choice of having that
>>         behavior or not should be with the client.
>>
>>         For any document format, rendering the moral equivalent of <p
>>         font-family='symbol'>&#x0041;</p> with something other than an
>>         "A" implies that all ASCII is PUA. That's a choice Word,
>>         InDesign, Notepad may make if they want, but it should not be
>>         imposed on all users of HB.
>>
>>         Personally, I think it is a very bad choice for HTML, and
>>         Firefox seems to agree. It seems nice and user friendly at
>>         first, but this makes the document ambiguous. What about <p
>>         font-family='minion, symbol'>&#x0041;</p>? It's an A or not
>>         an A depending on the presence of "minion" in the client.
>>         What does the document mean?
>>
>>         Of course, <p font-family='symbol'>&#xF041;</p> should render
>>         with the glyph symbol.cmap(F041). So even if the shift is
>>         never done, the glyph is usable. It's just that you don't
>>         have the convenience of an IME-like mechanism provided by the
>>         shaping engine, but you gain a reliable semantic for the text.
>>
>>>         That's good behavior [in Word], but beyond what HarfBuzz can do.
>>         Yes, which is why the shift may be acceptable or even
>>         desirable for some clients, and so hopefully the client could
>>         choose.
>>
>>>         What would clients do with that control then? How would they
>>>         set it?
>>         If I build an app that is meant to work like other GDI apps,
>>         I allow the shift (and maybe add mitigating measures like
>>         Word). If I build an app such as Firefox, I don't allow it.
>>         The choice is entirely driven by the type of application I
>>         want to build, and how I want to define my document format.
>>
>>
>>         If you were to implement this choice, I can see it either in
>>         the construction of the HB unicode functions, or in the
>>         hb_buffer (either globally, or on a character-by-character
>>         basis). I have a preference for the latter: this choice could
>>         be passed down to the cmap lookup functions, HB or not; it
>>         could also be different on different parts of a document,
>>         maybe reacting to markup.
>>
>>         Eric.
>>
>>
>>
>>         On 1/15/18 6:46 AM, Behdad Esfahbod wrote:
>>>         Hi Eric,
>>>
>>>         On Mon, Jan 15, 2018 at 2:25 AM, Eric Muller
>>>         <emuller at amazon.com> wrote:
>>>
>>>             It seems that with a font that has only a 3,0 cmap
>>>             subtable (and maybe some Macintosh subtables), HB
>>>             will automatically do the shift by F000 (in the function
>>>             get_glyph_from_symbol) for code points below U+00FF that
>>>             are not mapped by the subtable.
>>>
>>>
>>>         Right. Only in hb-ot-func though. Client font funcs can do
>>>         otherwise.
>>>
>>>             It is clear that when U+0041 A is set with a symbol
>>>             font, then that U+0041 has actually the semantics of a
>>>             PUA code point, and certainly should not be treated as
>>>             an "A". That's the whole point of a 3,0 cmap subtable.
>>>
>>>
>>>         Correct.
>>>
>>>             Consider an HTML page. The font-family is only a request
>>>             and there is no guarantee that the actual font will or
>>>             will not be a symbol font. Thus the semantic of the HTML
>>>             page can change depending on the browser environment.
>>>             Outside a browser, it seems that the safe treatment is
>>>             therefore to consider all code points below U+00FF as
>>>             PUA, which is clearly not tenable. So in that
>>>             environment, I think that the shift should not be done.
>>>             Of course, U+F041 should work.
>>>
>>>
>>>         My take on this is that it's a bug of the font fallback
>>>         logic if it falls back to a symbol font.  I changed
>>>         fontconfig to never do that.
>>>
>>>             Note that behavior of Word 2016 on Windows is actually
>>>             more elaborate: enter U+0041, and set it with a
>>>             non-symbol font; copy/paste or save to a text file, and
>>>             the result is U+0041; but set this A in a symbol font,
>>>             and copy/paste or save to a text file, and the result is
>>>             U+F041.
>>>
>>>
>>>         That's good behavior, but beyond what HarfBuzz can do.
>>>
>>>             I think that the shift should be controllable by the
>>>             client, rather than systematically applied. I don't have
>>>             a strong opinion about the default behavior (i.e. when
>>>             HB's client does not specify whether the shift should be
>>>             done or not).
>>>
>>>
>>>         What would clients do with that control then? How would they
>>>         set it?
>>>
>>>         I implemented this shift in fontconfig and then harfbuzz
>>>         because in LibreOffice and other software, there were
>>>         existing documents that referred to Wingdings or other symbol
>>>         fonts and encoded characters in the ASCII range. It's clear
>>>         that if the symbol font is asked by name, we should do the
>>>         shift. If it's NOT, then it should not be chosen to render
>>>         text to begin with, which means the shift can be applied
>>>         unconditionally.
>>>
>>>         How does that sound?
>>>         behdad
>>>
>>>             Thoughts?
>>>
>>>             Thanks,
>>>             Eric.
>>>
>>>         -- 
>>>         behdad
>>>         http://behdad.org/
>>
>>
>>
>>
>>     -- 
>>     behdad
>>     http://behdad.org/
>
>
>
>
> -- 
> behdad
> http://behdad.org/
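
For readers following the thread, the behavior under discussion can be modelled in a few lines. This is only a sketch with a toy cmap held in a dict; it is not HarfBuzz code. The names lookup_glyph, allow_symbol_shift and toy_symbol_cmap are hypothetical, and treating U+00FF itself as inside the shifted range is an assumption. The flag stands in for the client-side choice debated above (for example, choosing between font funcs with and without the shift).

```python
# Offset into the Private Use Area used by MS/Symbol (3,0) cmap subtables.
SYMBOL_SHIFT = 0xF000


def lookup_glyph(cmap, codepoint, allow_symbol_shift=True):
    """Return the glyph id for codepoint, or None if it is unmapped.

    cmap is a plain dict {codepoint: glyph_id} standing in for a
    3,0 cmap subtable. allow_symbol_shift is a hypothetical switch
    modelling the client's choice; it is not a HarfBuzz parameter.
    """
    glyph = cmap.get(codepoint)
    if glyph is not None:
        # A direct mapping always wins; the shift is only a fallback.
        return glyph
    if allow_symbol_shift and codepoint <= 0x00FF:
        # Retry in the PUA range, e.g. U+0041 -> U+F041.
        return cmap.get(codepoint + SYMBOL_SHIFT)
    return None


# Toy symbol font: only U+F041 is mapped (glyph id 36 is arbitrary).
toy_symbol_cmap = {0xF041: 36}

with_shift = lookup_glyph(toy_symbol_cmap, 0x0041)
without_shift = lookup_glyph(toy_symbol_cmap, 0x0041, allow_symbol_shift=False)
explicit_pua = lookup_glyph(toy_symbol_cmap, 0xF041, allow_symbol_shift=False)
```

With the shift enabled, U+0041 resolves to the symbol glyph. With it disabled, U+0041 is unmapped in this font (so a fallback chain could move on to another font), while U+F041 still reaches the glyph directly, which is the property Eric relies on.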
