<html>
    <head>
      <base href="https://bugs.documentfoundation.org/">
    </head>
    <body>
      <p>
        <div>
            <b><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - No linebreak between Latin text and Ideographic punctuation"
   href="https://bugs.documentfoundation.org/show_bug.cgi?id=71329#c11">Comment # 11</a>
              on <a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - No linebreak between Latin text and Ideographic punctuation"
   href="https://bugs.documentfoundation.org/show_bug.cgi?id=71329">bug 71329</a>
              from <span class="vcard"><a class="email" href="mailto:marklh9@gmail.com" title="Mark Hung <marklh9@gmail.com>"> <span class="fn">Mark Hung</span></a>
</span></b>
        <pre>(In reply to Mark Hung from <a href="show_bug.cgi?id=71329#c10">comment #10</a>)
<span class="quote">> This is still an issue in 5.3 - Phrases separated by full-width comma or
> full-width dot are treated as one single word, hence it is put to the next
> line.</span >

Correction:

In [1] we detect the language of the last portion to determine the locale for
the break iterator. In the document under test that portion is "en_US", and the
Unicode break iterator finds an incorrect word boundary.

There are a few issues:
1. The heuristic rule is wrong in this case.
2. The Unicode break iterator doesn't break before ideographic punctuation.
3. The word breaking algorithm in UAX29 [2] should work for us. Why do we need
break iterators for three scripts?
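
To illustrate the boundary expected in item 2, here is a minimal Python sketch.
It is neither the code in [1] nor the UAX29 algorithm; the function names are
hypothetical. It assumes that "ideographic punctuation" can be approximated as
any punctuation character whose East Asian Width is Wide or Fullwidth, and it
only shows where a Latin phrase should be separable from such a mark.

```python
import unicodedata


def is_wide_punctuation(ch):
    """Return True for punctuation with East Asian Width W (Wide) or F (Fullwidth).

    Assumption: this approximates "ideographic punctuation" such as
    U+FF0C FULLWIDTH COMMA and U+3002 IDEOGRAPHIC FULL STOP.
    """
    return (unicodedata.category(ch).startswith("P")
            and unicodedata.east_asian_width(ch) in ("W", "F"))


def split_at_wide_punctuation(text):
    """Split text into runs, with each wide punctuation mark as its own run.

    Hypothetical helper: the run boundaries are the positions where a
    break iterator is expected to report a boundary.
    """
    runs = []
    current = ""
    for ch in text:
        if is_wide_punctuation(ch):
            if current:
                runs.append(current)
            runs.append(ch)  # the punctuation mark stands alone
            current = ""
        else:
            current += ch
    if current:
        runs.append(current)
    return runs


# Latin text separated by a full-width comma and an ideographic full stop.
print(split_at_wide_punctuation("Hello\uFF0Cworld\u3002"))
```

With a boundary at each of these positions, "Hello" and "world" are no longer
treated as one single word. An ASCII comma is not affected by this check.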

[1]
<a href="https://cgit.freedesktop.org/libreoffice/core/tree/sw/source/core/text/guess.cxx#n355">https://cgit.freedesktop.org/libreoffice/core/tree/sw/source/core/text/guess.cxx#n355</a>
[2] <a href="http://unicode.org/reports/tr29/#WB5">http://unicode.org/reports/tr29/#WB5</a></pre>
        </div>
      </p>


    </body>
</html>