<div dir="ltr">Sorry, no progress so far. But for tracking purposes:<br><a href="https://github.com/harfbuzz/harfbuzz/issues/1011">https://github.com/harfbuzz/harfbuzz/issues/1011</a><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Jan 20, 2018 at 6:22 PM, Eric Muller <span dir="ltr"><<a href="mailto:emuller@amazon.com" target="_blank">emuller@amazon.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div text="#000000" bgcolor="#FFFFFF">
    <div class="m_-3592440659313515916moz-cite-prefix"><span class="">
      <blockquote type="cite">The easiest would be to add a new API
        analogous to hb_ot_font_set_funcs(), that does NOT have the
        symbol shift in it</blockquote></span>
      That works.<br>
      <br>
      Thanks,<br>
      Eric.<div><div class="h5"><br>
      <br>
      <br>
      On 1/19/18 4:43 PM, Behdad Esfahbod wrote:<br>
    </div></div></div><div><div class="h5">
    <blockquote type="cite">
      
      <div dir="ltr">
        <div>
          <div>Ok, let's see how we can address this...<br>
            <br>
          </div>
          I don't like a setting on the buffer, since the get_glyph()
          callback currently has no way of accessing that
          information.  The easiest would be to add a new API analogous
          to hb_ot_font_set_funcs(), that does NOT have the symbol shift
          in it.  It's not the most elegant solution, but it is the
          easiest.  Would that work for you?<br>
          <br>
        </div>
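        <div>To make the proposal concrete, here is a minimal C sketch of the idea. The client picks the cmap behavior once, when it installs the font callbacks, by calling one of two setup functions. Everything here (toy_font_t, toy_font_set_funcs(), toy_font_set_funcs_no_symbol_shift(), the toy cmap) is hypothetical and is not HarfBuzz's API; it only mirrors the shape of hb_ot_font_set_funcs() and the proposed shift-free variant.</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t cp; uint32_t gid; } cmap_entry_t;

/* Hypothetical font object: a toy cmap plus a pluggable glyph callback. */
typedef struct toy_font toy_font_t;
typedef bool (*get_glyph_func_t)(const toy_font_t *font, uint32_t cp, uint32_t *gid);

struct toy_font {
  const cmap_entry_t *cmap;
  size_t cmap_len;
  get_glyph_func_t get_glyph;
};

/* Callback without the shift: only exact cmap matches succeed. */
static bool get_glyph_exact(const toy_font_t *font, uint32_t cp, uint32_t *gid)
{
  assert(font != NULL && gid != NULL);
  for (size_t i = 0; i < font->cmap_len; i++)
    if (font->cmap[i].cp == cp) {
      *gid = font->cmap[i].gid;
      return true;
    }
  return false;
}

/* Callback with the shift: an unmapped U+0000..U+00FF is retried at U+F000 + cp. */
static bool get_glyph_symbol_shift(const toy_font_t *font, uint32_t cp, uint32_t *gid)
{
  if (get_glyph_exact(font, cp, gid))
    return true;
  return cp <= 0x00FFu && get_glyph_exact(font, 0xF000u + cp, gid);
}

/* Two hypothetical setup functions: the behavior is chosen once, per font. */
static void toy_font_set_funcs(toy_font_t *font)
{
  font->get_glyph = get_glyph_symbol_shift;
}

static void toy_font_set_funcs_no_symbol_shift(toy_font_t *font)
{
  font->get_glyph = get_glyph_exact;
}
```

        <div>Because the choice is baked into the callback rather than stored on the buffer, the callback never needs buffer state, which sidesteps the get_glyph() limitation mentioned above.</div>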
        That said, this issue is also related, as it pertains to another
        non-Unicode encoding, though in the font rather than the buffer:<br>
        <br>
          <a href="https://github.com/harfbuzz/harfbuzz/issues/681" target="_blank">https://github.com/harfbuzz/<wbr>harfbuzz/issues/681</a><br>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Thu, Jan 18, 2018 at 11:27 PM, Eric
          Muller <span dir="ltr"><<a href="mailto:emuller@amazon.com" target="_blank">emuller@amazon.com</a>></span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div text="#000000" bgcolor="#FFFFFF">
              <div class="m_-3592440659313515916m_4762254172372698478moz-cite-prefix">I want
                to build a rendering system where U+0041 renders as an
                "A", regardless of the selected font.<span class="m_-3592440659313515916HOEnZb"><font color="#888888"><br>
                    <br>
                    Eric.</font></span>
                <div>
                  <div class="m_-3592440659313515916h5"><br>
                    <br>
                    <br>
                    On 1/17/18 3:48 PM, Behdad Esfahbod wrote:<br>
                  </div>
                </div>
              </div>
              <div>
                <div class="m_-3592440659313515916h5">
                  <blockquote type="cite">
                    <div dir="ltr">What's the actual problem you are
                      facing?<br>
                    </div>
                    <div class="gmail_extra"><br>
                      <div class="gmail_quote">On Mon, Jan 15, 2018 at
                        9:58 AM, Eric Muller <span dir="ltr"><<a href="mailto:emuller@amazon.com" target="_blank">emuller@amazon.com</a>></span>
                        wrote:<br>
                        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                          <div text="#000000" bgcolor="#FFFFFF">
                            <div class="m_-3592440659313515916m_4762254172372698478m_-4940559382455268948moz-cite-prefix"><span><br>
                                <blockquote type="cite">It's clear that
                                  if the symbol font is asked by name,
                                  we should do the shift.</blockquote>
                              </span> I think I disagree, in the sense
                              that HB should not impose that behavior on
                              its clients. HB is clearly the right
                              place to implement the behavior, but the
                              choice of having that behavior or not
                              should be with the client.<br>
                              <br>
                              For any document format, rendering the
                              moral equivalent of &lt;p
                              font-family='symbol'&gt;&amp;#x0041;&lt;/p&gt;
                              with something other than an "A" implies
                              that all ASCII is PUA. That's a choice
                              Word, InDesign, or Notepad may make if they
                              want, but it should not be imposed on all
                              users of HB. <br>
                              <br>
                              Personally, I think it is a very bad
                              choice for HTML, and Firefox seems to
                              agree. It seems nice and user-friendly at
                              first, but it makes the document
                              ambiguous. What about &lt;p
                              font-family='minion,
                              symbol'&gt;&amp;#x0041;&lt;/p&gt;? It's an
                              A or not an A depending on whether
                              "minion" is installed on the client. What does the
                              document mean?<br>
                              <br>
                              Of course, &lt;p
                              font-family='symbol'&gt;&amp;#xF041;&lt;/p&gt;
                              should render with the glyph
                              symbol.cmap(F041). So even if the shift is
                              never done, the glyph is usable. It's just
                              that you don't have the convenience of an
                              IME-like mechanism provided by the shaping
                              engine, but you gain a reliable semantic
                              for the text.<br>
                              <br>
                              <blockquote type="cite">That's good
                                behavior [in Word], but beyond what
                                HarfBuzz can do.</blockquote>
                              Yes, which is why the shift may be
                              acceptable or even desirable for some
                              clients, and so hopefully the client could
                              choose.<span><br>
                                <br>
                                <blockquote type="cite">What would
                                  clients do with that control then? How
                                  would they set it?</blockquote>
                              </span> If I build an app that is meant to
                              work like other GDI apps, I allow the
                              shift (and maybe add mitigating measures
                              like Word). If I build an app such as
                              Firefox, I don't allow it. The choice is
                              entirely driven by the type of application I
                              want to build, and how I want to define my
                              document format.<br>
                              <br>
                              <br>
                              If you were to implement this choice, I
                              can see it either in the construction of
                              the HB unicode functions, or in the
                              hb_buffer (either globally, or on a
                              character-by-character basis). I have a
                              preference for the latter: this choice
                              could be passed down to the cmap lookup
                              functions, HB or not; it could also be
                              different on different parts of a
                              document, perhaps in reaction to markup.<span class="m_-3592440659313515916m_4762254172372698478HOEnZb"><font color="#888888"><br>
                                  <br>
                                  Eric.</font></span>
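                              <div>A minimal C sketch of the per-character alternative, assuming a hypothetical buffer item that carries an allow_symbol_shift flag set by the client (for instance from markup). None of these names exist in HarfBuzz, and glyph id 0 stands in for .notdef.</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t cp; uint32_t gid; } cmap_entry_t;

/* Hypothetical buffer item: a code point plus a client-set flag. */
typedef struct { uint32_t cp; bool allow_symbol_shift; } buffer_item_t;

/* Exact cmap lookup; writes *gid only on success. */
static bool cmap_lookup(const cmap_entry_t *cmap, size_t n, uint32_t cp, uint32_t *gid)
{
  assert(cmap != NULL && gid != NULL);
  for (size_t i = 0; i < n; i++)
    if (cmap[i].cp == cp) {
      *gid = cmap[i].gid;
      return true;
    }
  return false;
}

/* Maps each item to a glyph id (0 = .notdef). The U+F000 retry happens
 * only for items whose flag is set, so one run can allow the shift
 * while a differently marked-up run does not. */
static void map_buffer(const cmap_entry_t *cmap, size_t n,
                       const buffer_item_t *items, size_t count, uint32_t *out)
{
  for (size_t i = 0; i < count; i++) {
    uint32_t cp = items[i].cp, gid = 0;
    if (!cmap_lookup(cmap, n, cp, &gid) && items[i].allow_symbol_shift && cp <= 0x00FFu)
      (void) cmap_lookup(cmap, n, 0xF000u + cp, &gid); /* gid stays 0 on failure */
    out[i] = gid;
  }
}
```

                              <div>The same U+0041 yields the symbol glyph or .notdef depending only on the flag, while U+F041 resolves either way.</div>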
                              <div>
                                <div class="m_-3592440659313515916m_4762254172372698478h5"><br>
                                  <br>
                                  <br>
                                  On 1/15/18 6:46 AM, Behdad Esfahbod
                                  wrote:<br>
                                </div>
                              </div>
                            </div>
                            <div>
                              <div class="m_-3592440659313515916m_4762254172372698478h5">
                                <blockquote type="cite">
                                  <div dir="ltr">
                                    <div class="gmail_extra">Hi Eric,<br>
                                      <br>
                                    </div>
                                    <div class="gmail_extra">
                                      <div class="gmail_quote">On Mon,
                                        Jan 15, 2018 at 2:25 AM, Eric
                                        Muller <span dir="ltr"><<a href="mailto:emuller@amazon.com" target="_blank">emuller@amazon.com</a>></span>
                                        wrote:<br>
                                        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">It
                                          seems that with a font that
                                          has only a (3,0) cmap subtable
                                          (and maybe some Macintosh
                                          subtables), HB will
                                          automatically do the shift by
                                          0xF000 (in the function
                                          get_glyph_from_symbol) for
                                          code points up to U+00FF that
                                          are not mapped by the
                                          subtable.<br>
                                        </blockquote>
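                                        <div>For illustration, a minimal C sketch of the fallback described here, assuming a toy cmap given as a flat list of (code point, glyph) pairs in place of a real (3,0) subtable. The function names are hypothetical; this mirrors the described behavior, not HarfBuzz's actual source.</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy cmap standing in for a (3,0) symbol subtable:
 * symbol fonts map their glyphs in the PUA range U+F000..U+F0FF. */
typedef struct { uint32_t cp; uint32_t gid; } cmap_entry_t;

static const cmap_entry_t symbol_cmap[] = {
  { 0xF020u, 3 },   /* space */
  { 0xF041u, 36 },  /* the glyph a user reaches by typing "A" */
};

/* Exact lookup: succeeds only if cp itself is in the cmap. */
static bool cmap_lookup(uint32_t cp, uint32_t *gid)
{
  assert(gid != NULL);
  for (size_t i = 0; i < sizeof symbol_cmap / sizeof symbol_cmap[0]; i++)
    if (symbol_cmap[i].cp == cp) {
      *gid = symbol_cmap[i].gid;
      return true;
    }
  return false;
}

/* Lookup with the symbol shift: an unmapped code point in
 * U+0000..U+00FF is retried at U+F000 + cp. */
static bool cmap_lookup_with_shift(uint32_t cp, uint32_t *gid)
{
  if (cmap_lookup(cp, gid))
    return true;
  if (cp <= 0x00FFu)
    return cmap_lookup(0xF000u + cp, gid);
  return false;
}
```

                                        <div>U+F041 resolves whether or not the shift is applied; only the lookup of U+0041 depends on it.</div>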
                                        <div><br>
                                        </div>
                                        <div>Right. Only in hb-ot-font,
                                          though. Client font funcs can
                                          do otherwise.<br>
                                          <br>
                                        </div>
                                        <div> </div>
                                        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> It is
                                          clear that when U+0041 A is
                                          set with a symbol font, then
                                          that U+0041 actually has the
                                          semantics of a PUA code point,
                                          and certainly should not be
                                          treated as an "A". That's the
                                          whole point of a (3,0) cmap
                                          subtable.<br>
                                        </blockquote>
                                        <div><br>
                                        </div>
                                        <div>Correct.<br>
                                           <br>
                                        </div>
                                        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                          Consider an HTML page. The
                                          font-family is only a request,
                                          and there is no guarantee that
                                          the actual font will or will
                                          not be a symbol font. Thus the
                                          semantics of the HTML page can
                                          change depending on the
                                          browser environment. Outside a
                                          browser, it seems that the
                                          safe treatment would therefore
                                          be to consider all code points
                                          up to U+00FF as PUA, which is
                                          clearly not tenable. So in
                                          that environment, I think that
                                          the shift should not be done.
                                          Of course, U+F041 should work.<br>
                                        </blockquote>
                                        <div><br>
                                        </div>
                                        <div>My take on this is that
                                          it's a bug in the font
                                          fallback logic if it falls
                                          back to a symbol font.  I
                                          changed fontconfig to never do
                                          that.<br>
                                           </div>
                                        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Note
                                          that behavior of Word 2016 on
                                          Windows is actually more
                                          elaborate: enter U+0041, and
                                          set it with a non-symbol font;
                                          copy/paste or save to a text
                                          file, and the result is
                                          U+0041; but set this A in a
                                          symbol font, and copy/paste or
                                          save to a text file, and the
                                          result is U+F041.<br>
                                        </blockquote>
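                                        <div>The Word behavior can be sketched as an export-time mapping. This is a hypothetical C illustration of the observed result, not Word's implementation: the function name and the font_is_symbol flag are assumptions, and details such as how control characters are treated are ignored.</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical export step: converts a code point from a run whose
 * resolved font is (or is not) symbol-encoded into the code point
 * written to the clipboard or to a plain-text file. */
static uint32_t export_code_point(uint32_t cp, bool font_is_symbol)
{
  assert(cp <= 0x10FFFFu); /* must lie within the Unicode code space */
  /* In a symbol font, U+0000..U+00FF really stand for U+F000..U+F0FF. */
  if (font_is_symbol && cp <= 0x00FFu)
    return 0xF000u + cp;
  return cp;
}
```

                                        <div>The text stays U+0041 while it is displayed, and the symbol font's PUA meaning is only made explicit when the text leaves the application.</div>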
                                        <div><br>
                                        </div>
                                        <div>That's good behavior, but
                                          beyond what HarfBuzz can do.<br>
                                           <br>
                                        </div>
                                        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> I
                                          think that the shift should be
                                          controllable by the client,
                                          rather than systematically
                                          applied. I don't have a strong
                                          opinion about the default
                                          behavior (i.e. when HB's
                                          client does not specify
                                          whether the shift should be
                                          done or not).<br>
                                        </blockquote>
                                        <div><br>
                                        </div>
                                        <div>What would clients do with
                                          that control then? How would
                                          they set it?<br>
                                          <br>
                                        </div>
                                        <div>I implemented this shift in
                                          fontconfig and then harfbuzz
                                          because in LibreOffice and
                                          other software, there were
                                          existing documents that
                                          referred to Wingdings or other
                                          symbol fonts and encoded
                                          characters in the ASCII range.
                                          It's clear that if the symbol
                                          font is asked by name, we
                                          should do the shift. If it's
                                          NOT, then it should not be
                                          chosen to render text to begin
                                          with, which means the shift
                                          can be applied
                                          unconditionally.<br>
                                          <br>
                                        </div>
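                                        <div>A minimal C sketch of that policy, under assumed names: font_info_t, choose_font() and the two-font list are hypothetical, and real fallback (fontconfig or otherwise) also weighs coverage, case-insensitive family matching and many other properties. A font requested by name is used even if it is a symbol font; fallback never picks one, so a symbol font is only ever reached by name and the shift can stay on.</div>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical installed-font record. */
typedef struct {
  const char *family;
  bool is_symbol; /* e.g. only a (3,0) cmap subtable */
} font_info_t;

/* Returns the font used for a run, or NULL if nothing is available. */
static const font_info_t *choose_font(const char *requested,
                                      const font_info_t *fonts, size_t n)
{
  assert(requested != NULL);
  /* A font asked for by name (toy exact match) is used as-is, even a symbol font. */
  for (size_t i = 0; i < n; i++)
    if (strcmp(fonts[i].family, requested) == 0)
      return &fonts[i];
  /* Fallback never lands on a symbol font. */
  for (size_t i = 0; i < n; i++)
    if (!fonts[i].is_symbol)
      return &fonts[i];
  return NULL;
}
```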
                                        <div>How does that sound?<br>
                                        </div>
                                        <div>behdad<br>
                                        </div>
                                        <div> </div>
                                        <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
                                          Thoughts?<br>
                                          <br>
                                          Thanks,<br>
                                          Eric.<br>
                                        </blockquote>
                                        <div> </div>
                                      </div>
                                      -- <br>
                                      <div class="m_-3592440659313515916m_4762254172372698478m_-4940559382455268948gmail_signature" data-smartmail="gmail_signature">behdad<br>
                                        <a href="http://behdad.org/" target="_blank">http://behdad.org/</a></div>
                                    </div>
                                  </div>
                                </blockquote>
                                <br>
                              </div>
                            </div>
                          </div>
                        </blockquote>
                      </div>
                      <br>
                    </div>
                  </blockquote>
                  <br>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </div></div></div>

</blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature" data-smartmail="gmail_signature">behdad<br><a href="http://behdad.org/" target="_blank">http://behdad.org/</a></div>
</div>