[Poppler-bugs] [Bug 103309] pdftotext: UTF-16 text without BOM not properly extracted
bugzilla-daemon at freedesktop.org
Tue Oct 17 10:38:01 UTC 2017
https://bugs.freedesktop.org/show_bug.cgi?id=103309
--- Comment #1 from ralf.stubner at r-institute.com ---
Additional note:
$ java -jar pdfbox-app-2.0.7.jar ExtractText 2004.pdf
This extracts the text but issues some warnings (German locale; "WARNUNG" means "WARNING"):
Okt 17, 2017 12:34:44 PM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNUNG: Invalid ToUnicode CMap in font JRLFSC+Segoe UI,Bold-Identity-H
Okt 17, 2017 12:34:44 PM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNUNG: Invalid ToUnicode CMap in font EUPBOV+Arial Unicode MS-Identity-H
Okt 17, 2017 12:34:44 PM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNUNG: Invalid ToUnicode CMap in font VRSAOT+Arial Unicode MS,Bold-Identity-H
Okt 17, 2017 12:34:44 PM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNUNG: Invalid ToUnicode CMap in font FAMOVB+Segoe UI-Identity-H