<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">I want to build a rendering system
where U+0041 renders as an "A", regardless of the selected font.<br>
<br>
Eric.<br>
<br>
<br>
On 1/17/18 3:48 PM, Behdad Esfahbod wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAF63+7WsDT-8rRCq4JA=9knGWhZpLhK1i0V1jiNTsBOd0jEPpA@mail.gmail.com">
<div dir="ltr">What's the actual problem you are facing?<br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Mon, Jan 15, 2018 at 9:58 AM, Eric
Muller <span dir="ltr">&lt;<a
href="mailto:emuller@amazon.com" target="_blank"
moz-do-not-send="true">emuller@amazon.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div text="#000000" bgcolor="#FFFFFF">
<div class="m_-4940559382455268948moz-cite-prefix"><span
class=""><br>
<blockquote type="cite">It's clear that if the symbol
font is asked by name, we should do the shift.</blockquote>
</span> I think I disagree, in the sense that HB should
not impose that behavior on its clients. HB is clearly
the right place to implement the behavior, but the
choice of having that behavior or not should be with the
client.<br>
<br>
For any document format, rendering the moral equivalent
of &lt;p font-family='symbol'&gt;A&lt;/p&gt;
with something other than an "A" implies that all ASCII
is PUA. That's a choice Word, InDesign, Notepad may make
if they want, but it should not be imposed on all users
of HB. <br>
<br>
Personally, I think it is a very bad choice for HTML,
and Firefox seems to agree. It seems nice and user
friendly at first, but this makes the document
ambiguous. What about &lt;p font-family='minion,
symbol'&gt;A&lt;/p&gt;? It's an A or not an A
depending on the presence of "minion" in the client.
What does the document mean?<br>
<br>
Of course, &lt;p
font-family='symbol'&gt;&amp;#xF041;&lt;/p&gt;
should render with the glyph symbol.cmap(F041). So even
if the shift is never done, the glyph is usable. It's
just that you don't have the convenience of an IME-like
mechanism provided by the shaping engine, but you gain a
reliable semantic for the text.<br>
<br>
<blockquote type="cite">That's good behavior [in Word],
but beyond what HarfBuzz can do.</blockquote>
Yes, which is why the shift may be acceptable or even
desirable for some clients, and so hopefully the client
could choose.<span class=""><br>
<br>
<blockquote type="cite">What would clients do with
that control then? How would they set it?</blockquote>
</span> If I build an app that is meant to work like
other GDI apps, I allow the shift (and maybe add
mitigating measures like Word). If I build an app such
as Firefox, I don't allow it. The choice is entirely
driven by the type of application I want to build, and how
I want to define my document format.<br>
<br>
<br>
If you were to implement this choice, I can see it
either in the construction of the HB unicode functions,
or in the hb_buffer (either globally, or on a
character-by-character basis). I have a preference for the latter:
this choice could be passed down to the cmap lookup
functions, HB or not; it could also differ across
parts of a document, perhaps in reaction to
markup.<span class="HOEnZb"><font color="#888888"><br>
<br>
Eric.</font></span>
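<div>One way to picture the per-character variant suggested above is the small Python sketch below. It is only an illustration under stated assumptions: <code>BufferItem</code>, <code>allow_symbol_shift</code>, <code>nominal_glyph</code> and <code>map_buffer</code> are hypothetical names, the dictionary stands in for a 3,0 cmap subtable, and nothing here is an existing HarfBuzz API.</div>
<pre>
```python
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class BufferItem:
    """One character plus a per-character policy flag (hypothetical, not hb_buffer_t)."""
    codepoint: int
    allow_symbol_shift: bool


def nominal_glyph(cmap: Mapping[int, int], item: BufferItem) -> Optional[int]:
    """cmap lookup that honors the flag carried by the buffer item."""
    glyph = cmap.get(item.codepoint)
    # Retry in the PUA only if the client allowed it for this character.
    if glyph is None and item.allow_symbol_shift and item.codepoint in range(0x100):
        glyph = cmap.get(item.codepoint + 0xF000)
    return glyph


def map_buffer(cmap: Mapping[int, int], items: List[BufferItem]) -> List[Optional[int]]:
    """Map every item to a glyph id, or None when the font has no glyph for it."""
    return [nominal_glyph(cmap, item) for item in items]


symbol_cmap = {0xF041: 7}  # toy symbol font: only U+F041 has a glyph
buffer = [
    BufferItem(0x41, allow_symbol_shift=True),     # GDI-like part of a document
    BufferItem(0x41, allow_symbol_shift=False),    # browser-like part: no shift
    BufferItem(0xF041, allow_symbol_shift=False),  # explicit PUA code point
]
glyphs = map_buffer(symbol_cmap, buffer)
```
</pre>
<div>Because the flag travels with each character, the same lookup can serve both kinds of client, and U+F041 stays usable whether or not the shift is allowed.</div>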
<div>
<div class="h5"><br>
<br>
<br>
On 1/15/18 6:46 AM, Behdad Esfahbod wrote:<br>
</div>
</div>
</div>
<div>
<div class="h5">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">Hi Eric,<br>
<br>
</div>
<div class="gmail_extra">
<div class="gmail_quote">On Mon, Jan 15, 2018 at
2:25 AM, Eric Muller <span dir="ltr">&lt;<a
href="mailto:emuller@amazon.com"
target="_blank" moz-do-not-send="true">emuller@amazon.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">It seems that
when a font has only a 3,0 cmap
subtable (and maybe some Macintosh
subtables), HB automatically does
the shift by F000 (in the function
get_glyph_from_symbol) for code points below
U+00FF that are not mapped by the subtable.<br>
</blockquote>
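<div>A minimal sketch of the fallback described above, written in Python rather than HarfBuzz's own C++. The dictionary standing in for the 3,0 subtable, the name <code>lookup_symbol_glyph</code>, and the inclusive U+00FF bound are assumptions made for illustration; this is not the code of get_glyph_from_symbol.</div>
<pre>
```python
from typing import Mapping, Optional

SYMBOL_SHIFT = 0xF000  # offset into the Private Use Area used by symbol fonts


def lookup_symbol_glyph(cmap: Mapping[int, int], codepoint: int) -> Optional[int]:
    """Return the glyph id for codepoint, or None if the font has no glyph.

    cmap is a stand-in for a 3,0 subtable: a dict from code point to glyph id.
    """
    glyph = cmap.get(codepoint)
    if glyph is not None:
        return glyph  # a direct mapping always wins
    # Fallback: retry U+0000..U+00FF at U+F000..U+F0FF (bound assumed inclusive).
    if codepoint in range(0x100):
        return cmap.get(codepoint + SYMBOL_SHIFT)
    return None


toy_symbol_cmap = {0xF041: 7}  # toy font that maps only U+F041
shifted = lookup_symbol_glyph(toy_symbol_cmap, 0x41)     # found through the shift
direct = lookup_symbol_glyph(toy_symbol_cmap, 0xF041)    # found directly
outside = lookup_symbol_glyph(toy_symbol_cmap, 0x141)    # outside the shifted range
```
</pre>
<div>With the toy font, U+0041 and U+F041 both reach glyph 7, while U+0141 stays unmapped because it lies outside the shifted range.</div>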
<div><br>
</div>
<div>Right. Only in hb-ot-font though. Client
font funcs can do otherwise.<br>
<br>
</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> It is clear
that when U+0041 A is set in a symbol
font, that U+0041 actually has the
semantics of a PUA code point, and certainly
should not be treated as an "A". That's the
whole point of a 3,0 cmap subtable.<br>
</blockquote>
<div><br>
</div>
<div>Correct.<br>
<br>
</div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> Consider an
HTML page. The font-family is only a request
and there is no guarantee that the actual
font will or will not be a symbol font. Thus
the semantics of the HTML page can change
depending on the browser environment.
Outside a browser, it seems that the safe
treatment is therefore to consider all code
points below U+00FF as PUA, which is clearly
not tenable. So in that environment, I think
that the shift should not be done. Of
course, U+F041 should work.<br>
</blockquote>
<div><br>
</div>
<div>My take on this is that it's a bug of the
font fallback logic if it falls back to a
symbol font. I changed fontconfig to never
do that.<br>
</div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> Note that
the behavior of Word 2016 on Windows is actually
more elaborate: enter U+0041, and set it
with a non-symbol font; copy/paste or save
to a text file, and the result is U+0041;
but set this A in a symbol font, and
copy/paste or save to a text file, and the
result is U+F041.<br>
</blockquote>
<div><br>
</div>
<div>That's good behavior, but beyond what
HarfBuzz can do.<br>
<br>
</div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> I think that
the shift should be controllable by the
client, rather than systematically applied.
I don't have a strong opinion about the
default behavior (i.e. when HB's client does
not specify whether the shift should be done
or not).<br>
</blockquote>
<div><br>
</div>
<div>What would clients do with that control
then? How would they set it?<br>
<br>
</div>
<div>I implemented this shift in fontconfig
and then harfbuzz because in LibreOffice and
other software, there were existing
documents that referred to Wingdings or other
symbol fonts and encoded characters in the
ASCII range. It's clear that if the symbol
font is asked by name, we should do the
shift. If it's NOT, then it should not be
chosen to render text to begin with, which
means the shift can be applied
unconditionally.<br>
<br>
</div>
<div>How does that sound?<br>
</div>
<div>behdad<br>
</div>
<div> </div>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> Thoughts?<br>
<br>
Thanks,<br>
Eric.<br>
</blockquote>
<div> </div>
</div>
-- <br>
<div
class="m_-4940559382455268948gmail_signature"
data-smartmail="gmail_signature">behdad<br>
<a href="http://behdad.org/" target="_blank"
moz-do-not-send="true">http://behdad.org/</a></div>
</div>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
<br clear="all">
<br>
-- <br>
<div class="gmail_signature" data-smartmail="gmail_signature">behdad<br>
<a href="http://behdad.org/" target="_blank"
moz-do-not-send="true">http://behdad.org/</a></div>
</div>
</blockquote>
<br>
</body>
</html>