[Poppler-bugs] [Bug 96932] New: Improper text extraction from this pdf

bugzilla-daemon at freedesktop.org
Thu Jul 14 15:02:20 UTC 2016


https://bugs.freedesktop.org/show_bug.cgi?id=96932

            Bug ID: 96932
           Summary: Improper text extraction from this pdf
           Product: poppler
           Version: unspecified
          Hardware: All
                OS: All
            Status: NEW
          Severity: blocker
          Priority: medium
         Component: utils
          Assignee: poppler-bugs at lists.freedesktop.org
          Reporter: mingodad at gmail.com

Created attachment 125069
  --> https://bugs.freedesktop.org/attachment.cgi?id=125069&action=edit
A pdf with tables

Hello !
I'm testing pdftotext with PDFs from
http://www.docidadesp.imprensaoficial.com.br and there are several of them that
seem to have mixed encodings (I guess) and output garbage for some of their
content (PDFxStream does the same).
See the attached PDF for a test case.
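
A rough way to reproduce and quantify the problem might look like the sketch below. It assumes the pdftotext binary from poppler-utils is on PATH and Python 3.7 or later; the function names and the "suspicious character" heuristic are illustrative assumptions, not part of poppler. The heuristic only counts replacement, control, private-use and unassigned characters, so it would miss garbage that consists of valid but wrong letters.

```python
import subprocess
import unicodedata


def extract_text(pdf_path: str) -> str:
    """Run pdftotext on pdf_path and return its output as a string.

    Assumes pdftotext is installed; "-" sends the text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def suspicious_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that look like garbage.

    Hypothetical heuristic: U+FFFD, control characters (Cc),
    private-use characters (Co) and unassigned code points (Cn).
    """
    visible = [ch for ch in text if not ch.isspace()]
    if not visible:
        return 0.0

    def is_suspicious(ch: str) -> bool:
        return ch == "\ufffd" or unicodedata.category(ch) in {"Cc", "Co", "Cn"}

    return sum(is_suspicious(ch) for ch in visible) / len(visible)


# Example use (the file name is a placeholder for the attached PDF):
#   ratio = suspicious_ratio(extract_text("attachment.pdf"))
#   print(f"suspicious characters: {ratio:.1%}")
```

A high ratio on some pages and a near-zero ratio on others would be consistent with the mixed-encoding guess, but it would not prove it.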

I hope the attached example can help improve poppler.

Cheers !
