[PUSHED] fdo#53399 Word count is inconsistent and wrong with non-brea...
gerrit at gerrit.libreoffice.org
Wed Aug 22 08:23:08 PDT 2012
From Andras Timar <atimar at suse.com>:
Andras Timar has submitted this change and it was merged.
Change subject: fdo#53399 Word count is inconsistent and wrong with non-breaking space
This change replaces lcl_IsSkippableWhitespace with a call to ICU's u_isspace, which covers all Unicode separators. It also updates and fixes one of the SwScanner unit tests.
SwScanner::NextWord skips whitespace before calling into ICU's BreakIterator. The function used to identify whitespace (lcl_IsSkippableWhitespace) doesn't cover the full Unicode space separator category [Zs], which contains 18 characters (see http://www.fileformat.info/info/unicode/category/Zs/index.htm).
Since 0xA0 (no-break space) is not identified as whitespace, it is not skipped, and we end up calling ICU at the position of the 0xA0 character, asking it for the boundary of the next word forward. ICU sees that it was called at the end of a word, reverses the query direction to backward, and returns the preceding word. This makes NextWord think it has hit the end of the string and stop, so the remaining words on the line are not counted.
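To illustrate why the choice of whitespace predicate matters, here is a minimal standalone sketch. It does not use ICU and does not reproduce the BreakIterator direction reversal described above. It only shows how a predicate that misses U+00A0 changes what a simple scanner treats as a word boundary. All names in it (isNarrowWhitespace, isSpaceSeparator, countWords) are hypothetical. In particular, isNarrowWhitespace is an assumed stand-in and not the actual body of lcl_IsSkippableWhitespace, and isSpaceSeparator covers only [Zs], whereas u_isspace also accepts other separators and whitespace controls.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical narrow check: an assumed stand-in, NOT the real
// lcl_IsSkippableWhitespace, whose contents are not shown in this message.
bool isNarrowWhitespace(char32_t c)
{
    return c == U' ' || c == U'\t' || c == U'\n' || c == U'\r';
}

// The 18 code points that made up the [Zs] category referenced above.
// (U+180E was reclassified out of Zs in later Unicode versions.)
bool isSpaceSeparator(char32_t c)
{
    return c == 0x0020 || c == 0x00A0 || c == 0x1680 || c == 0x180E
        || (c >= 0x2000 && c <= 0x200A)
        || c == 0x202F || c == 0x205F || c == 0x3000;
}

using SpacePredicate = bool (*)(char32_t);

// Simplified scanner: skip whitespace, then consume one word, and repeat.
std::size_t countWords(const std::u32string& text, SpacePredicate isSpace)
{
    std::size_t words = 0;
    std::size_t pos = 0;
    while (pos < text.size())
    {
        // Skip leading whitespace according to the chosen predicate.
        while (pos < text.size() && isSpace(text[pos]))
            ++pos;
        if (pos == text.size())
            break;
        ++words;
        // Consume the word up to the next character the predicate accepts.
        while (pos < text.size() && !isSpace(text[pos]))
            ++pos;
    }
    return words;
}
```

With the input U"one\u00A0two three", countWords returns 2 when given isNarrowWhitespace, because the no-break space is treated as part of a word, and 3 when given isSpaceSeparator.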
2 files changed, 11 insertions(+), 12 deletions(-)
Andras Timar: Verified; Looks good to me, approved
To view, visit https://gerrit.libreoffice.org/453
Gerrit-Owner: Muhammad Haggag <mhaggag at gmail.com>
Gerrit-Reviewer: Andras Timar <atimar at suse.com>