[HarfBuzz] MS/Symbol cmap subtables

Behdad Esfahbod behdad at behdad.org
Sat Apr 28 01:38:54 UTC 2018


Sorry, no progress so far. But for tracking purposes:
https://github.com/harfbuzz/harfbuzz/issues/1011
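
For readers following the tracking issue, here is a minimal sketch of the
symbol shift discussed in the quoted thread below, with the client-controlled
switch Eric asked for. It is an illustration only, not HarfBuzz's code or
API: cmap_lookup_fn, lookup_with_optional_symbol_shift, toy_symbol_lookup
and the allow_shift flag are hypothetical names, and the inclusive U+00FF
bound is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cmap lookup callback: returns true and stores the glyph id
 * in *glyph when the subtable maps the code point. */
typedef bool (*cmap_lookup_fn)(const void *subtable, uint32_t cp,
                               uint32_t *glyph);

/* Try a direct lookup first. For a symbol (3,0) subtable, and only if the
 * client allows it, retry unmapped code points up to U+00FF at F000 + cp. */
static bool
lookup_with_optional_symbol_shift(cmap_lookup_fn lookup, const void *subtable,
                                  bool is_symbol, bool allow_shift,
                                  uint32_t cp, uint32_t *glyph)
{
  if (lookup(subtable, cp, glyph))
    return true;

  /* Assumption: the shift covers U+0000..U+00FF inclusive. */
  if (is_symbol && allow_shift && cp <= 0x00FFu)
    return lookup(subtable, 0xF000u + cp, glyph);

  return false;
}

/* Toy stand-in for a symbol font: maps only U+F041, to glyph 36. */
static bool
toy_symbol_lookup(const void *subtable, uint32_t cp, uint32_t *glyph)
{
  (void) subtable;
  if (cp == 0xF041u) {
    *glyph = 36;
    return true;
  }
  return false;
}
```

With allow_shift set, U+0041 resolves to the toy font's glyph for U+F041;
with it cleared, only U+F041 itself resolves, which is the Firefox-like
behavior described in the thread.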

On Sat, Jan 20, 2018 at 6:22 PM, Eric Muller <emuller at amazon.com> wrote:

> The easiest would be to add a new API analogous to hb_ot_font_set_funcs(),
> that does NOT have the symbol shift in it
>
> That works.
>
> Thanks,
> Eric.
>
>
>
> On 1/19/18 4:43 PM, Behdad Esfahbod wrote:
>
> Ok, let's see how we can address this...
>
> I don't like a setting on the buffer as currently the get_glyph() callback
> has no way of accessing that information.  The easiest would be to add a
> new API analogous to hb_ot_font_set_funcs(), that does NOT have the symbol
> shift in it.  It's not the most elegant solution but easiest.  Would that
> work for you?
>
> That said, this issue is also related, as it pertains to another non-Unicode
> encoding, though in the font, not the buffer:
>
>   https://github.com/harfbuzz/harfbuzz/issues/681
>
> On Thu, Jan 18, 2018 at 11:27 PM, Eric Muller <emuller at amazon.com> wrote:
>
>> I want to build a rendering system where U+0041 renders as an "A",
>> regardless of the selected font.
>>
>> Eric.
>>
>>
>>
>> On 1/17/18 3:48 PM, Behdad Esfahbod wrote:
>>
>> What's the actual problem you are facing?
>>
>> On Mon, Jan 15, 2018 at 9:58 AM, Eric Muller <emuller at amazon.com> wrote:
>>
>>>
>>> It's clear that if the symbol font is asked by name, we should do the
>>> shift.
>>>
>>> I think I disagree, in the sense that HB should not impose that behavior
>>> on its clients. HB is clearly the right place to implement the behavior,
>>> but the choice of having that behavior or not should be with the client.
>>>
>>> For any document format, rendering the moral equivalent of <p
>>> font-family='symbol'>&#x0041;</p> with something other than an "A"
>>> implies that all ASCII is PUA. That's a choice Word, InDesign, Notepad may
>>> make if they want, but it should not be imposed on all users of HB.
>>>
>>> Personally, I think it is a very bad choice for HTML, and Firefox seems
>>> to agree. It seems nice and user friendly at first, but this makes the
>>> document ambiguous. What about <p font-family='minion,
>>> symbol'>&#x0041;</p>? It's an A or not an A depending on the presence of
>>> "minion" in the client. What does the document mean?
>>>
>>> Of course, <p font-family='symbol'>&#xF041;</p> should render with the
>>> glyph symbol.cmap(F041). So even if the shift is never done, the glyph is
>>> usable. It's just that you don't have the convenience of an IME-like
>>> mechanism provided by the shaping engine, but you gain reliable semantics
>>> for the text.
>>>
>>> That's good behavior [in Word], but beyond what HarfBuzz can do.
>>>
>>> Yes, which is why the shift may be acceptable or even desirable for some
>>> clients, and so hopefully the client could choose.
>>>
>>> What would clients do with that control then? How would they set it?
>>>
>>> If I build an app that is meant to work like other GDI apps, I allow the
>>> shift (and maybe add mitigating measures like Word). If I build an app
>>> such as Firefox, I don't allow it. The choice is entirely driven by the
>>> type of application I want to build, and how I want to define my document
>>> format.
>>>
>>>
>>> If you were to implement this choice, I can see it either in the
>>> construction of the HB unicode functions, or in the hb_buffer (either
>>> globally, or on a character-by-character basis). I have a preference for
>>> the latter: this choice could be passed down to the cmap lookup functions,
>>> HB or not; it could also be different on different parts of a document,
>>> maybe reacting to markup.
>>>
>>> Eric.
>>>
>>>
>>>
>>> On 1/15/18 6:46 AM, Behdad Esfahbod wrote:
>>>
>>> Hi Eric,
>>>
>>> On Mon, Jan 15, 2018 at 2:25 AM, Eric Muller <emuller at amazon.com> wrote:
>>>
>>>> It seems that with a font that has only a 3,0 cmap subtable (and maybe
>>>> some Macintosh subtables), HB will automatically do the shift by
>>>> F000 (in the function get_glyph_from_symbol) for code points below U+00FF
>>>> that are not mapped by the subtable.
>>>>
>>>
>>> Right. Only in hb-ot-func though. Client font funcs can do otherwise.
>>>
>>>
>>>
>>>> It is clear that when U+0041 A is set with a symbol font, then that
>>>> U+0041 has actually the semantics of a PUA code point, and certainly should
>>>> not be treated as an "A". That's the whole point of a 3,0 cmap subtable.
>>>>
>>>
>>> Correct.
>>>
>>>
>>>> Consider an HTML page. The font-family is only a request and there is
>>>> no guarantee that the actual font will or will not be a symbol font. Thus
>>>> the semantics of the HTML page can change depending on the browser
>>>> environment. Outside a browser, it seems that the safe treatment is
>>>> therefore to consider all code points below U+00FF as PUA, which is clearly
>>>> not tenable. So in that environment, I think that the shift should not be
>>>> done. Of course, U+F041 should work.
>>>>
>>>
>>> My take on this is that it's a bug of the font fallback logic if it
>>> falls back to a symbol font.  I changed fontconfig to never do that.
>>>
>>>
>>>> Note that the behavior of Word 2016 on Windows is actually more elaborate:
>>>> enter U+0041, and set it with a non-symbol font; copy/paste or save to a
>>>> text file, and the result is U+0041; but set this A in a symbol font, and
>>>> copy/paste or save to a text file, and the result is U+F041.
>>>>
>>>
>>> That's good behavior, but beyond what HarfBuzz can do.
>>>
>>>
>>>> I think that the shift should be controllable by the client, rather
>>>> than systematically applied. I don't have a strong opinion about the
>>>> default behavior (i.e. when HB's client does not specify whether the shift
>>>> should be done or not).
>>>>
>>>
>>> What would clients do with that control then? How would they set it?
>>>
>>> I implemented this shift in fontconfig and then harfbuzz because in
>>> LibreOffice and other software, there were existing documents that referred
>>> to Wingdings or other symbol fonts and encoded characters in the ASCII
>>> range. It's clear that if the symbol font is asked by name, we should do
>>> the shift. If it's NOT, then it should not be chosen to render text to
>>> begin with, which means the shift can be applied unconditionally.
>>>
>>> How does that sound?
>>> behdad
>>>
>>>
>>>> Thoughts?
>>>>
>>>> Thanks,
>>>> Eric.
>>>>
>>>
>>> --
>>> behdad
>>> http://behdad.org/
>>>
>>>
>>>
>>
>>
>> --
>> behdad
>> http://behdad.org/
>>
>>
>>
>
>
> --
> behdad
> http://behdad.org/
>
>
>


-- 
behdad
http://behdad.org/