[Poppler-bugs] [Bug 97399] No word splitting for pdfs produced by Chrome

bugzilla-daemon at freedesktop.org
Wed Sep 28 16:05:34 UTC 2016


https://bugs.freedesktop.org/show_bug.cgi?id=97399

--- Comment #2 from Jason Crain <jason at aquaticape.us> ---
Created attachment 126831
  --> https://bugs.freedesktop.org/attachment.cgi?id=126831&action=edit
[patch] Break words on all whitespace characters

Some PDF creators, such as Chrome, put no-break spaces or other whitespace
characters between words, so pdftotext -bbox does not break the text into
words as expected.  Fix this by breaking words on any character with the
Unicode whitespace property.
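
To illustrate the idea (this is not the attached patch, only a minimal
sketch in Python), the snippet below compares breaking words on the ASCII
space alone with breaking on every code point that carries the Unicode
White_Space property.  The names split_words, WHITE_SPACE and
ASCII_SPACE_ONLY are hypothetical and chosen for this example; the code
point list is assumed to match the White_Space entries in Unicode's
PropList.txt.

```python
# Code points assumed to carry the Unicode White_Space property
# (taken from PropList.txt; hypothetical table for this sketch).
WHITE_SPACE = frozenset(
    [0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020, 0x0085, 0x00A0, 0x1680]
    + list(range(0x2000, 0x200B))  # U+2000 through U+200A
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)

# The narrower rule: only the plain ASCII space ends a word.
ASCII_SPACE_ONLY = frozenset([0x0020])


def split_words(text, breakers):
    """Split text into words, ending a word at any code point in breakers."""
    words, current = [], []
    for ch in text:
        if ord(ch) in breakers:
            # A breaking character closes the current word, if there is one.
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


# U+00A0 is NO-BREAK SPACE, U+2003 is EM SPACE.
sample = "No\u00a0word\u2003splitting here"
print(split_words(sample, ASCII_SPACE_ONLY))
print(split_words(sample, WHITE_SPACE))
```

With the ASCII-only rule the first three words stay glued together, which
mirrors the reported symptom; with the wider set each word is separated.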
