[HarfBuzz] MS/Symbol cmap subtables

Behdad Esfahbod behdad at behdad.org
Sat Jan 20 00:43:40 UTC 2018


Ok, let's see how we can address this...

I don't like a setting on the buffer, as currently the get_glyph() callback
has no way of accessing that information.  The easiest would be to add a
new API, analogous to hb_ot_font_set_funcs(), that does NOT have the symbol
shift in it.  It's not the most elegant solution, but it is the easiest.
Would that work for you?
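
To make the two behaviors concrete, here is a minimal, self-contained C
sketch. It is not HarfBuzz code: symbol_cmap_lookup() is a made-up stand-in
for a font that only has a 3,0 cmap subtable, the glyph ids are invented,
and the two get_glyph_* functions are hypothetical names for "lookup with
the shift" and "lookup without it".

```c
#include <stdint.h>

/* Hypothetical stand-in for a symbol font: the only cmap subtable is 3,0
 * and it maps U+F020..U+F0FF; nothing is mapped in the ASCII range.
 * Glyph ids are invented for illustration (0 would be .notdef). */
static int symbol_cmap_lookup(uint32_t cp, uint32_t *glyph)
{
    if (cp >= 0xF020u && cp <= 0xF0FFu) {
        *glyph = cp - 0xF020u + 1u;
        return 1;
    }
    return 0; /* not mapped; *glyph is left untouched */
}

/* Lookup WITH the symbol shift: if the code point is not mapped directly
 * and is at most U+00FF, retry at U+F000 + cp. */
static int get_glyph_with_shift(uint32_t cp, uint32_t *glyph)
{
    if (symbol_cmap_lookup(cp, glyph))
        return 1;
    if (cp <= 0x00FFu)
        return symbol_cmap_lookup(0xF000u + cp, glyph);
    return 0;
}

/* Lookup WITHOUT the shift: only what the cmap maps directly is found,
 * so U+0041 is unmapped in this font while U+F041 still works. */
static int get_glyph_no_shift(uint32_t cp, uint32_t *glyph)
{
    return symbol_cmap_lookup(cp, glyph);
}
```

With this stand-in font, get_glyph_with_shift(0x0041, &g) finds the glyph
mapped at U+F041, while get_glyph_no_shift(0x0041, &g) reports U+0041 as
unmapped; both variants resolve U+F041 directly.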

That said, this issue is also related, as it pertains to another non-Unicode
encoding, though in the font rather than the buffer:

  https://github.com/harfbuzz/harfbuzz/issues/681

On Thu, Jan 18, 2018 at 11:27 PM, Eric Muller <emuller at amazon.com> wrote:

> I want to build a rendering system where U+0041 renders as an "A",
> regardless of the selected font.
>
> Eric.
>
>
>
> On 1/17/18 3:48 PM, Behdad Esfahbod wrote:
>
> What's the actual problem you are facing?
>
> On Mon, Jan 15, 2018 at 9:58 AM, Eric Muller <emuller at amazon.com> wrote:
>
>>
>> It's clear that if the symbol font is asked by name, we should do the
>> shift.
>>
>> I think I disagree, in the sense that HB should not impose that behavior
>> on its clients. HB is clearly the right place to implement the behavior,
>> but the choice of having that behavior or not should be with the client.
>>
>> For any document format, rendering the moral equivalent of <p
>> font-family='symbol'>&#x0041;</p> with something other than an "A"
>> implies that all ASCII is PUA. That's a choice Word, InDesign, Notepad may
>> make if they want, but it should not be imposed on all users of HB.
>>
>> Personally, I think it is a very bad choice for HTML, and Firefox seems
>> to agree. It seems nice and user friendly at first, but this makes the
>> document ambiguous. What about <p font-family='minion,
>> symbol'>&#x0041;</p>? It's an A or not an A depending on the presence of
>> "minion" in the client. What does the document mean?
>>
>> Of course, <p font-family='symbol'>&#xF041;</p> should render with the
>> glyph symbol.cmap(F041). So even if the shift is never done, the glyph is
>> usable. It's just that you don't have the convenience of an IME-like
>> mechanism provided by the shaping engine, but you gain a reliable semantic
>> for the text.
>>
>> That's good behavior [in Word], but beyond what HarfBuzz can do.
>>
>> Yes, which is why the shift may be acceptable or even desirable for some
>> clients, and so hopefully the client could choose.
>>
>> What would clients do with that control then? How would they set it?
>>
>> If I build an app that is meant to work like other GDI apps, I allow the
>> shift (and maybe add mitigating measures like Word). If I build an app
>> such as Firefox, I don't allow it. The choice is entirely driven by the
>> type of application I want to build, and how I want to define my document
>> format.
>>
>>
>> If you were to implement this choice, I can see it either in the
>> construction of the HB unicode functions, or in the hb_buffer (either
>> globally, or on a character-by-character basis). I have a preference for
>> the latter: this choice could be passed down to the cmap lookup functions,
>> HB or not; it could also be different on different parts of a document,
>> maybe reacting to markup.
>>
>> Eric.
>>
>>
>>
>> On 1/15/18 6:46 AM, Behdad Esfahbod wrote:
>>
>> Hi Eric,
>>
>> On Mon, Jan 15, 2018 at 2:25 AM, Eric Muller <emuller at amazon.com> wrote:
>>
>>> It seems that with a font that has only a 3,0 cmap subtable (and maybe
>>> some Macintosh subtables), then HB will automatically do the shift by F000
>>> (in the function get_glyph_from_symbol) for code points below U+00FF that
>>> are not mapped by the subtable.
>>>
>>
>> Right. Only in hb-ot-font though. Client font funcs can do otherwise.
>>
>>
>>
>>> It is clear that when U+0041 A is set with a symbol font, then that
>>> U+0041 has actually the semantics of a PUA code point, and certainly should
>>> not be treated as an "A". That's the whole point of a 3,0 cmap subtable.
>>>
>>
>> Correct.
>>
>>
>>> Consider an HTML page. The font-family is only a request and there is no
>>> guarantee that the actual font will or will not be a symbol font. Thus the
>>> semantic of the HTML page can change depending on the browser environment.
>>> Outside a browser, it seems that the safe treatment is therefore to
>>> consider all code points below U+00FF as PUA, which is clearly not tenable.
>>> So in that environment, I think that the shift should not be done. Of
>>> course, U+F041 should work.
>>>
>>
>> My take on this is that it's a bug of the font fallback logic if it falls
>> back to a symbol font.  I changed fontconfig to never do that.
>>
>>
>>> Note that behavior of Word 2016 on Windows is actually more elaborate:
>>> enter U+0041, and set it with a non-symbol font; copy/paste or save to a
>>> text file, and the result is U+0041; but set this A in a symbol font, and
>>> copy/paste or save to a text file, and the result is U+F041.
>>>
>>
>> That's good behavior, but beyond what HarfBuzz can do.
>>
>>
>>> I think that the shift should be controllable by the client, rather than
>>> systematically applied. I don't have a strong opinion about the default
>>> behavior (i.e. when HB's client does not specify whether the shift should
>>> be done or not).
>>>
>>
>> What would clients do with that control then? How would they set it?
>>
>> I implemented this shift in fontconfig and then harfbuzz because in
>> LibreOffice and other software, there were existing documents that referred
>> to Wingdings or other symbol fonts and encoded characters in the ASCII
>> range. It's clear that if the symbol font is asked by name, we should do
>> the shift. If it's NOT, then it should not be chosen to render text to
>> begin with, which means the shift can be applied unconditionally.
>>
>> How does that sound?
>> behdad
>>
>>
>>> Thoughts?
>>>
>>> Thanks,
>>> Eric.
>>>
>>
>> --
>> behdad
>> http://behdad.org/
>>
>>
>>
>
>
> --
> behdad
> http://behdad.org/
>
>
>


-- 
behdad
http://behdad.org/