[Poppler-bugs] [Bug 91644] New: poppler-cpp text extraction drops characters

bugzilla-daemon at freedesktop.org
Sat Aug 15 08:04:25 PDT 2015


https://bugs.freedesktop.org/show_bug.cgi?id=91644

            Bug ID: 91644
           Summary: poppler-cpp text extraction drops characters
           Product: poppler
           Version: unspecified
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: cpp frontend
          Assignee: poppler-bugs at lists.freedesktop.org
          Reporter: hpdeifel at gmx.de

Created attachment 117703
  --> https://bugs.freedesktop.org/attachment.cgi?id=117703&action=edit
PDF containing only the string "foobar"

The text extraction in poppler-cpp silently drops characters: in the case
below, the last character of the PDF's text is missing from the output.

To reproduce, compile and link the attached testcase and run it on the attached
PDF:

  ./txtextr foobar.pdf

The expected output is

  foobar

since that is the only text in the PDF, but the actual output is

  fooba

with the trailing "r" missing.
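
The expected/actual comparison above can be turned into a small check that
reports where the extracted text starts to diverge. The following is only a
sketch: it does not call poppler at all, it takes the already-extracted string
as input, and the helper names first_mismatch and describe_mismatch are
hypothetical (they are not part of poppler or of the attached testcase).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of poppler): returns the index of the first
// byte at which `actual` differs from `expected`, or std::string::npos when
// the two strings are identical.
std::size_t first_mismatch(const std::string& expected,
                           const std::string& actual) {
    const std::size_t common = std::min(expected.size(), actual.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (expected[i] != actual[i]) {
            return i;
        }
    }
    // One string is a prefix of the other: they differ where the shorter ends.
    return expected.size() == actual.size() ? std::string::npos : common;
}

// Hypothetical helper: one-line description of how `actual` deviates from
// `expected`, singling out the case where `actual` is a truncated prefix.
std::string describe_mismatch(const std::string& expected,
                              const std::string& actual) {
    const std::size_t pos = first_mismatch(expected, actual);
    if (pos == std::string::npos) {
        return "match";
    }
    if (pos == actual.size()) {
        // `actual` ended early; everything from `pos` onward was dropped.
        return "output truncated at byte " + std::to_string(pos) +
               ", missing \"" + expected.substr(pos) + "\"";
    }
    return "output differs at byte " + std::to_string(pos);
}
```

For the strings in this report, describe_mismatch("foobar", "fooba") yields
`output truncated at byte 5, missing "r"`, which makes it clear that only the
final character was lost.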
