[Libreoffice-bugs] [Bug 71329] No linebreak between Latin text and Ideographic punctuation

bugzilla-daemon at bugs.documentfoundation.org
Sat Dec 9 06:09:07 UTC 2017


https://bugs.documentfoundation.org/show_bug.cgi?id=71329

--- Comment #11 from Mark Hung <marklh9 at gmail.com> ---
(In reply to Mark Hung from comment #10)
> This is still an issue in 5.3 - Phrases separated by full-width comma or
> full-width dot are treated as one single word, hence it is put to the next
> line.

Correction:

In [1] we detect the language of the last portion to determine the locale for
the break iterator. The document under test has "en_US" there, and the Unicode
break iterator found an incorrect word boundary.

There are a few issues:
1. The heuristic rule is wrong in this case.
2. The Unicode break iterator didn't break before ideographic punctuation.
3. The word breaking algorithm in UAX29[2] should work for us. Why do we need
break iterators for three scripts?
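
To make the expected behaviour concrete, here is a minimal sketch in Python. It
is not LibreOffice code, not ICU, and not an implementation of UAX29; the
function name split_words and the small IDEOGRAPHIC_PUNCT set are hypothetical
and chosen only for illustration. It shows the outcome the bug asks for: a
full-width comma or full stop ends the preceding Latin phrase, so the phrases
on either side can be placed on different lines.

```python
# Hypothetical, deliberately small set of punctuation marks for illustration:
# ideographic comma, ideographic full stop, full-width comma, full-width full stop.
IDEOGRAPHIC_PUNCT = {"\u3001", "\u3002", "\uFF0C", "\uFF0E"}


def split_words(text):
    """Split text into tokens, treating each ideographic punctuation mark
    as its own token so that it always terminates the preceding word."""
    tokens = []
    current = ""
    for ch in text:
        if ch in IDEOGRAPHIC_PUNCT:
            # Close the running word, then emit the punctuation separately.
            if current:
                tokens.append(current)
                current = ""
            tokens.append(ch)
        elif ch.isspace():
            # Whitespace ends the running word and is dropped.
            if current:
                tokens.append(current)
                current = ""
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens


if __name__ == "__main__":
    # Latin phrases separated by a full-width comma and a full-width full stop.
    print(split_words("Hello\uFF0CWorld\uFF0EAgain"))
```

With the behaviour reported in this bug, the whole string would instead be
treated as one word and moved to the next line as a unit.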

[1]
https://cgit.freedesktop.org/libreoffice/core/tree/sw/source/core/text/guess.cxx#n355
[2] http://unicode.org/reports/tr29/#WB5
