[Poppler-bugs] [Bug 97399] No word splitting for pdfs produced by Chrome

bugzilla-daemon at freedesktop.org
Thu Aug 18 19:33:35 UTC 2016


https://bugs.freedesktop.org/show_bug.cgi?id=97399

--- Comment #1 from Jason Crain <jason at aquaticape.us> ---
I was mistaken on IRC when I called this a linefeed character: I confused 0xA0
and 0x0A. For some reason, Chrome sometimes uses 0xA0 (no-break space) between
words. Poppler only breaks words on the regular 0x20 space, so these stay
grouped together in the same word. To work around this, we could implement
something like ICU's u_isUWhiteSpace to decide which characters to split on.
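A minimal sketch of that idea, assuming we hard-code the Unicode White_Space set (the property ICU's u_isUWhiteSpace tests) rather than link against ICU. The names isUnicodeWhiteSpace and splitWords are hypothetical, and this shows only the character test, not poppler's actual word-building code in the text output device:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: true if the code point has the Unicode White_Space
// property, i.e. the set ICU's u_isUWhiteSpace() checks (per PropList.txt).
constexpr bool isUnicodeWhiteSpace(char32_t c) {
    return (c >= 0x0009 && c <= 0x000D) || c == 0x0020 || c == 0x0085 ||
           c == 0x00A0 ||  // NO-BREAK SPACE, which Chrome emits between words
           c == 0x1680 || (c >= 0x2000 && c <= 0x200A) ||
           c == 0x2028 || c == 0x2029 || c == 0x202F ||
           c == 0x205F || c == 0x3000;
}

// Hypothetical splitter: breaks on any White_Space code point instead of
// only U+0020, and skips empty words produced by runs of separators.
std::vector<std::u32string> splitWords(const std::u32string &text) {
    std::vector<std::u32string> words;
    std::u32string current;
    for (char32_t c : text) {
        if (isUnicodeWhiteSpace(c)) {
            if (!current.empty()) {
                words.push_back(current);
                current.clear();
            }
        } else {
            current.push_back(c);
        }
    }
    if (!current.empty()) {
        words.push_back(current);
    }
    return words;
}
```

With this, U"foo\u00A0bar baz" splits into three words instead of two. Hard-coding the list avoids a new ICU dependency, at the cost of having to track future Unicode changes by hand.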
