<html>
<head>
<base href="https://bugs.documentfoundation.org/">
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - No linebreak between Latin text and Ideographic punctuation"
href="https://bugs.documentfoundation.org/show_bug.cgi?id=71329#c11">Comment # 11</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - No linebreak between Latin text and Ideographic punctuation"
href="https://bugs.documentfoundation.org/show_bug.cgi?id=71329">bug 71329</a>
from <span class="vcard"><a class="email" href="mailto:marklh9@gmail.com" title="Mark Hung <marklh9@gmail.com>"> <span class="fn">Mark Hung</span></a>
</span></b>
<pre>(In reply to Mark Hung from <a href="show_bug.cgi?id=71329#c10">comment #10</a>)
<span class="quote">> This is still an issue in 5.3 - Phrases separated by full-width comma or
> full-width dot are treated as one single word, hence it is put to the next
> line.</span>
Correction:
In [1] we use the language of the last portion to choose the locale for the
break iterator. In the document under test that language is "en_US", and the
Unicode break iterator returned an incorrect word boundary.
There are a few issues:
1. The heuristic in [1] picks the wrong locale in this case.
2. The Unicode break iterator doesn't break before ideographic punctuation.
3. The word breaking algorithm in UAX29[2] should work for us. Why do we need
break iterators for three scripts?
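
To make the expectation in issue 2 concrete, here is a minimal Python sketch
of where boundaries should fall when Latin phrases are separated by a
full-width comma (U+FF0C) or an ideographic full stop (U+3002). It is only an
illustration: split_runs is a hypothetical helper, it is neither the ICU break
iterator nor a full UAX29 implementation, and it assumes that any
non-alphanumeric character forms a segment of its own.

```python
def split_runs(text):
    """Split text into alphanumeric runs; every other character stands alone.

    Simplifying assumption (not UAX29): a boundary is placed before and
    after each non-alphanumeric character, including ideographic punctuation.
    """
    segments = []
    current = ""
    for ch in text:
        if ch.isalnum():
            # Keep extending the current run of letters and digits.
            current += ch
        else:
            # Close the pending run, then emit the punctuation by itself.
            if current:
                segments.append(current)
                current = ""
            segments.append(ch)
    if current:
        segments.append(current)
    return segments


# Latin phrases separated by U+FF0C and U+3002, as in the document under test.
sample = "LibreOffice\uff0cWriter\u3002Calc"
print(split_runs(sample))
```

With boundaries like these, each Latin phrase is a separate word and a line
break could be placed at the punctuation, instead of the whole sequence being
moved to the next line as one word.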
[1]
<a href="https://cgit.freedesktop.org/libreoffice/core/tree/sw/source/core/text/guess.cxx#n355">https://cgit.freedesktop.org/libreoffice/core/tree/sw/source/core/text/guess.cxx#n355</a>
[2] <a href="http://unicode.org/reports/tr29/#WB5">http://unicode.org/reports/tr29/#WB5</a></pre>
</div>
</p>
</body>
</html>