[Poppler-bugs] [Bug 103309] pdftotext: UTF-16 text without BOM not properly extracted

bugzilla-daemon at freedesktop.org
Tue Oct 17 10:38:01 UTC 2017


https://bugs.freedesktop.org/show_bug.cgi?id=103309

--- Comment #1 from ralf.stubner at r-institute.com ---
Additional note:

$ java -jar pdfbox-app-2.0.7.jar ExtractText 2004.pdf

PDFBox 2.0.7 extracts the text but issues some warnings:

Okt 17, 2017 12:34:44 PM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNUNG: Invalid ToUnicode CMap in font JRLFSC+Segoe UI,Bold-Identity-H
Okt 17, 2017 12:34:44 PM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNUNG: Invalid ToUnicode CMap in font EUPBOV+Arial Unicode MS-Identity-H
Okt 17, 2017 12:34:44 PM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNUNG: Invalid ToUnicode CMap in font VRSAOT+Arial Unicode MS,Bold-Identity-H
Okt 17, 2017 12:34:44 PM org.apache.pdfbox.pdmodel.font.PDFont <init>
WARNUNG: Invalid ToUnicode CMap in font FAMOVB+Segoe UI-Identity-H
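
For context on the bug title, here is a minimal sketch of decoding UTF-16 data that may or may not carry a byte order mark. It is not taken from poppler or PDFBox, and the helper name decode_utf16_guess is hypothetical. It assumes big-endian when no BOM is present, which is the byte order the PDF specification prescribes for ToUnicode CMap values; whether that is what goes wrong with 2004.pdf is not established here.

```python
import codecs


def decode_utf16_guess(raw: bytes) -> str:
    """Decode UTF-16 bytes, honouring a BOM if present.

    Hypothetical helper for illustration only. Without a BOM it
    assumes big-endian (UTF-16BE), which is an assumption about the
    input, not something detected from the data.
    """
    if raw.startswith(codecs.BOM_UTF16_BE):
        # BOM FE FF: big-endian, strip the two BOM bytes first.
        return raw[2:].decode("utf-16-be")
    if raw.startswith(codecs.BOM_UTF16_LE):
        # BOM FF FE: little-endian, strip the two BOM bytes first.
        return raw[2:].decode("utf-16-le")
    # No BOM: fall back to big-endian.
    return raw.decode("utf-16-be")


if __name__ == "__main__":
    # Hex string as it might appear in a ToUnicode mapping, no BOM.
    print(decode_utf16_guess(bytes.fromhex("00480069")))
```

Python's generic "utf-16" codec is deliberately avoided for the no-BOM case, because without a BOM it falls back to the platform's native byte order instead of big-endian.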
