<html> <head> <base href="https://bugs.documentfoundation.org/"> </head> <body> <div> <a class="bz_bug_link bz_status_NEW " title="NEW - No linebreak between Latin text and Ideographic punctuation" href="https://bugs.documentfoundation.org/show_bug.cgi?id=71329#c11">Comment # 11</a> on <a class="bz_bug_link bz_status_NEW " title="NEW - No linebreak between Latin text and Ideographic punctuation" href="https://bugs.documentfoundation.org/show_bug.cgi?id=71329">bug 71329</a> from <a class="email" href="mailto:marklh9@gmail.com" title="Mark Hung <marklh9@gmail.com>"> Mark Hung</a> <pre>(In reply to Mark Hung from <a href="show_bug.cgi?id=71329#c10">comment #10</a>)
> This is still an issue in 5.3 - Phrases separated by full-width comma or
> full-width dot are treated as one single word, hence it is put to the next
> line.

Correction:

In [1] we detect the language of the last portion to determine the locale for
the break iterator. The document under test has "en_US" there, and the Unicode
break iterator found an incorrect word boundary.

There are a few issues:
1. The heuristic rule is wrong in this case.
2. The Unicode break iterator didn't break before ideographic punctuation.
3. The word breaking algorithm in UAX29[2] should work for us. Why do we need
   break iterators for three scripts?

[1] <a href="https://cgit.freedesktop.org/libreoffice/core/tree/sw/source/core/text/guess.cxx#n355">https://cgit.freedesktop.org/libreoffice/core/tree/sw/source/core/text/guess.cxx#n355</a>
[2] <a href="http://unicode.org/reports/tr29/#WB5">http://unicode.org/reports/tr29/#WB5</a></pre> </div> <hr> You are receiving this mail because: <ul> <li>You are the assignee for the bug.</li> </ul> </body> </html>