<div dir="ltr"><div><div>Ok, let's see how we can address this...<br><br></div>I don't like a setting on the buffer, because currently the get_glyph() callback has no way of accessing that information.  The easiest fix would be to add a new API, analogous to hb_ot_font_set_funcs(), that does NOT have the symbol shift in it.  It's not the most elegant solution, but it is the easiest.  Would that work for you?<br><br></div>That said, this issue is also related, as it pertains to another non-Unicode encoding, though in the font rather than the buffer:<br><br>  <a href="https://github.com/harfbuzz/harfbuzz/issues/681">https://github.com/harfbuzz/harfbuzz/issues/681</a><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jan 18, 2018 at 11:27 PM, Eric Muller <span dir="ltr">&lt;<a href="mailto:emuller@amazon.com" target="_blank">emuller@amazon.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div text="#000000" bgcolor="#FFFFFF">
    <div class="m_4762254172372698478moz-cite-prefix">I want to build a rendering system
      where U+0041 renders as an "A", regardless of the selected font.<span class="HOEnZb"><font color="#888888"><br>
      <br>
      Eric.</font></span><div><div class="h5"><br>
      <br>
      <br>
      On 1/17/18 3:48 PM, Behdad Esfahbod wrote:<br>
    </div></div></div><div><div class="h5">
    <blockquote type="cite">
      
      <div dir="ltr">What's the actual problem you are facing?<br>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Mon, Jan 15, 2018 at 9:58 AM, Eric
          Muller <span dir="ltr">&lt;<a href="mailto:emuller@amazon.com" target="_blank">emuller@amazon.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div text="#000000" bgcolor="#FFFFFF">
              <div class="m_4762254172372698478m_-4940559382455268948moz-cite-prefix"><span><br>
                  <blockquote type="cite">It's clear that if the symbol
                    font is asked by name, we should do the shift.</blockquote>
                </span> I think I disagree, in the sense that HB should
                not impose that behavior on its clients. HB is clearly
                the right place to implement the behavior, but the
                choice of having that behavior or not should be with the
                client.<br>
                <br>
                For any document format, rendering the moral equivalent
                of &lt;p font-family='symbol'&gt;&amp;#x0041;&lt;/p&gt;
                with something other than an "A" implies that all ASCII
                is PUA. That's a choice Word, InDesign, Notepad may make
                if they want, but it should not be imposed on all users
                of HB. <br>
                <br>
                Personally, I think it is a very bad choice for HTML,
                and Firefox seems to agree. It seems nice and user
                friendly at first, but this makes the document
                ambiguous. What about &lt;p font-family='minion,
                symbol'&gt;&amp;#x0041;&lt;/p&gt;? It's an A or not an A
                depending on the presence of "minion" in the client.
                What does the document mean?<br>
                <br>
                Of course, &lt;p
                font-family='symbol'&gt;&amp;#xF041;&lt;/p&gt;
                should render with the glyph symbol.cmap(F041). So even
                if the shift is never done, the glyph is usable. It's
                just that you don't have the convenience of an IME-like
                mechanism provided by the shaping engine, but you gain
                reliable semantics for the text.<br>
                <br>
                <blockquote type="cite">That's good behavior [in Word],
                  but beyond what HarfBuzz can do.</blockquote>
                Yes, which is why the shift may be acceptable or even
                desirable for some clients, and so hopefully the client
                could choose.<span><br>
                  <br>
                  <blockquote type="cite">What would clients do with
                    that control then? How would they set it?</blockquote>
                </span> If I build an app that is meant to work like
                other GDI apps, I allow the shift (and maybe add
                mitigating measures, as Word does). If I build an app such
                as Firefox, I don't allow it. The choice is entirely
                driven by the type of application I want to build, and how
                I want to define my document format.<br>
                <br>
                <br>
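                <div>The client-side choice described above (shift for a GDI-like app, no shift for a browser-like app) could be sketched as two interchangeable lookup callbacks that the client picks between. This is only an illustrative sketch in plain C: toy_font, toy_get_glyph_func, toy_client_kind and the toy_* functions are hypothetical names, and the callback signature is an assumption, not the HarfBuzz font-funcs signature.</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy font: its "cmap" maps exactly one code point. */
typedef struct {
    uint32_t mapped_codepoint;
    uint32_t glyph;
} toy_font;

/* Assumed callback shape for a nominal-glyph lookup (not a HarfBuzz type). */
typedef bool (*toy_get_glyph_func)(const toy_font *font, uint32_t cp,
                                   uint32_t *glyph);

/* Strict lookup: only what the font maps directly, no symbol shift. */
bool toy_get_glyph_strict(const toy_font *font, uint32_t cp, uint32_t *glyph)
{
    assert(font != NULL && glyph != NULL);
    if (cp != font->mapped_codepoint)
        return false;
    *glyph = font->glyph;
    return true;
}

/* Shifting lookup: on failure, retry code points U+0000..U+00FF at
 * U+F000 + cp, as in the shift discussed in this thread. */
bool toy_get_glyph_shifting(const toy_font *font, uint32_t cp, uint32_t *glyph)
{
    if (toy_get_glyph_strict(font, cp, glyph))
        return true;
    return cp <= 0x00FFu && toy_get_glyph_strict(font, 0xF000u + cp, glyph);
}

/* The kind of application decides which callback gets installed. */
typedef enum {
    TOY_CLIENT_GDI_LIKE,     /* wants the shift */
    TOY_CLIENT_BROWSER_LIKE  /* wants U+0041 to mean "A" only */
} toy_client_kind;

toy_get_glyph_func toy_select_get_glyph(toy_client_kind kind)
{
    return kind == TOY_CLIENT_GDI_LIKE ? toy_get_glyph_shifting
                                       : toy_get_glyph_strict;
}
```

                <div>With a toy font that maps only U+F041, the GDI-like callback resolves U+0041 to that glyph, while the browser-like callback reports U+0041 as unmapped and still resolves U+F041 directly.</div>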
                If you were to implement this choice, I can see it
                either in the construction of the HB unicode functions,
                or in the hb_buffer (either globally, or on a
                character-by-character basis). I have a preference for the latter:
                this choice could be passed down to the cmap lookup
                functions, HB or not; it could also be different in
                different parts of a document, maybe reacting to
                markup.<span class="m_4762254172372698478HOEnZb"><font color="#888888"><br>
                    <br>
                    Eric.</font></span>
                <div>
                  <div class="m_4762254172372698478h5"><br>
                    <br>
                    <br>
                    On 1/15/18 6:46 AM, Behdad Esfahbod wrote:<br>
                  </div>
                </div>
              </div>
              <div>
                <div class="m_4762254172372698478h5">
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_extra">Hi Eric,<br>
                        <br>
                      </div>
                      <div class="gmail_extra">
                        <div class="gmail_quote">On Mon, Jan 15, 2018 at
                          2:25 AM, Eric Muller <span dir="ltr">&lt;<a href="mailto:emuller@amazon.com" target="_blank">emuller@amazon.com</a>&gt;</span>
                          wrote:<br>
                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">It seems that
                            with a font that has only a 3,0 cmap
                            subtable (and maybe some Macintosh
                            subtables), HB will automatically do
                            the shift by F000 (in the function
                            get_glyph_from_symbol) for code points below
                            U+00FF that are not mapped by the subtable.<br>
                          </blockquote>
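                          <div>To make the mechanism described above concrete, here is a minimal, self-contained C sketch of a lookup that retries at U+F000 + code point when the direct lookup fails. The names toy_cmap, toy_cmap_entry, toy_cmap_get_glyph and toy_cmap_get_glyph_symbol are hypothetical stand-ins for illustration, not HarfBuzz API, and the flat table is an assumed simplification of a 3,0 cmap subtable. Retrying for code points up to and including U+00FF is also an assumption of this sketch.</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one cmap subtable: a flat list of
 * (code point, glyph id) pairs. Not a HarfBuzz type. */
typedef struct {
    uint32_t codepoint;
    uint32_t glyph;
} toy_cmap_entry;

typedef struct {
    const toy_cmap_entry *entries;
    size_t count;
} toy_cmap;

/* Plain lookup: succeeds only if the subtable maps the code point. */
bool toy_cmap_get_glyph(const toy_cmap *cmap, uint32_t cp, uint32_t *glyph)
{
    assert(cmap != NULL && glyph != NULL);
    for (size_t i = 0; i < cmap->count; i++) {
        if (cmap->entries[i].codepoint == cp) {
            *glyph = cmap->entries[i].glyph;
            return true;
        }
    }
    return false;
}

/* Lookup with the symbol shift: if the direct lookup fails and the code
 * point is in U+0000..U+00FF, retry at U+F000 + cp. */
bool toy_cmap_get_glyph_symbol(const toy_cmap *cmap, uint32_t cp,
                               uint32_t *glyph)
{
    if (toy_cmap_get_glyph(cmap, cp, glyph))
        return true;
    if (cp <= 0x00FFu)
        return toy_cmap_get_glyph(cmap, 0xF000u + cp, glyph);
    return false;
}
```

                          <div>With a table that maps only U+F041, the plain lookup fails for U+0041, while the shifting lookup returns the glyph mapped at U+F041. Code points above U+00FF are never shifted.</div>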
                          <div><br>
                          </div>
                          <div>Right, but only in hb-ot-font. Client
                            font funcs can do otherwise.<br>
                            <br>
                          </div>
                          <div> </div>
                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> It is clear
                            that when U+0041 A is set with a symbol
                            font, then that U+0041 actually has the
                            semantics of a PUA code point, and certainly
                            should not be treated as an "A". That's the
                            whole point of a 3,0 cmap subtable.<br>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>Correct.<br>
                             <br>
                          </div>
                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Consider an
                            HTML page. The font-family is only a request
                            and there is no guarantee that the actual
                            font will or will not be a symbol font. Thus
                            the semantics of the HTML page can change
                            depending on the browser environment.
                            Outside a browser, it seems that the safe
                            treatment is therefore to consider all code
                            points below U+00FF as PUA, which is clearly
                            not tenable. So in that environment, I think
                            that the shift should not be done. Of
                            course, U+F041 should work.<br>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>My take on this is that it's a bug of the
                            font fallback logic if it falls back to a
                            symbol font.  I changed fontconfig to never
                            do that.<br>
                             </div>
                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Note that
                            the behavior of Word 2016 on Windows is actually
                            more elaborate: enter U+0041, and set it
                            with a non-symbol font; copy/paste or save
                            to a text file, and the result is U+0041;
                            but set this A in a symbol font, and
                            copy/paste or save to a text file, and the
                            result is U+F041.<br>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>That's good behavior, but beyond what
                            HarfBuzz can do.<br>
                             <br>
                          </div>
                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> I think that
                            the shift should be controllable by the
                            client, rather than systematically applied.
                            I don't have a strong opinion about the
                            default behavior (i.e. when HB's client does
                            not specify whether the shift should be done
                            or not).<br>
                          </blockquote>
                          <div><br>
                          </div>
                          <div>What would clients do with that control
                            then? How would they set it?<br>
                            <br>
                          </div>
                          <div>I implemented this shift in fontconfig
                            and then harfbuzz because in LibreOffice and
                            other software, there were existing
                            documents that referred to Wingdings or other
                            symbol fonts and encoded characters in the
                            ASCII range. It's clear that if the symbol
                            font is asked by name, we should do the
                            shift. If it's NOT, then it should not be
                            chosen to render text to begin with, which
                            means the shift can be applied
                            unconditionally.<br>
                            <br>
                          </div>
                          <div>How does that sound?<br>
                          </div>
                          <div>behdad<br>
                          </div>
                          <div> </div>
                          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Thoughts?<br>
                            <br>
                            Thanks,<br>
                            Eric.<br>
                          </blockquote>
                          <div> </div>
                        </div>
                        -- <br>
                        <div class="m_4762254172372698478m_-4940559382455268948gmail_signature" data-smartmail="gmail_signature">behdad<br>
                          <a href="http://behdad.org/" target="_blank">http://behdad.org/</a></div>
                      </div>
                    </div>
                  </blockquote>
                  <br>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
        <br clear="all">
        <br>
        -- <br>
        <div class="m_4762254172372698478gmail_signature" data-smartmail="gmail_signature">behdad<br>
          <a href="http://behdad.org/" target="_blank">http://behdad.org/</a></div>
      </div>
    </blockquote>
    <br>
  </div></div></div>

</blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature" data-smartmail="gmail_signature">behdad<br><a href="http://behdad.org/" target="_blank">http://behdad.org/</a></div>
</div>