[Poppler-bugs] [Bug 89941] New: pdftotext: Add an option for more detailed bounding box information

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Tue Apr 7 10:58:15 PDT 2015


https://bugs.freedesktop.org/show_bug.cgi?id=89941

            Bug ID: 89941
           Summary: pdftotext: Add an option for more detailed bounding
                    box information
           Product: poppler
           Version: unspecified
          Hardware: All
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: medium
         Component: utils
          Assignee: poppler-bugs at lists.freedesktop.org
          Reporter: jechols at uoregon.edu

Created attachment 114932
  --> https://bugs.freedesktop.org/attachment.cgi?id=114932&action=edit
Adds -bbox-layout command to pdftotext

We want to generate ALTO-compatible XML
(http://en.wikipedia.org/wiki/ALTO_%28XML%29) from PDFs. The current -bbox
flag almost does what we need, but it omits some important data: blocks and
lines.

I have written a patch against 0.22.5 (to ensure compatibility with our
CentOS 7 system). It appears to apply cleanly to the current master and, as
far as I can tell, produces the same output there as my 0.22.5 hack. The patch
adds a new flag, -bbox-layout, whose output is still very generic but is
sufficient for us to then transform as needed.
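
As a rough illustration of the kind of downstream transformation meant here
(this is not part of the attached patch), the sketch below converts
block/line/word bounding boxes into an ALTO-style fragment. It rests on
several assumptions: that the output nests <flow>, <block>, <line> and <word>
elements carrying xMin/yMin/xMax/yMax attributes, that the embedded SAMPLE
string resembles that output, and that the helper names (local_name,
box_attrs, convert, page_to_alto) are hypothetical. The result is only a
Page/PrintSpace fragment, not a schema-valid ALTO document.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of block/line/word output. The element and attribute
# names are assumptions for illustration, not taken from the bug report.
SAMPLE = """\
<page width="612.0" height="792.0">
  <flow>
    <block xMin="72.0" yMin="70.0" xMax="200.0" yMax="100.0">
      <line xMin="72.0" yMin="70.0" xMax="200.0" yMax="84.0">
        <word xMin="72.0" yMin="70.0" xMax="110.0" yMax="84.0">Hello</word>
        <word xMin="115.0" yMin="70.0" xMax="200.0" yMax="84.0">world</word>
      </line>
    </block>
  </flow>
</page>
"""

# Assumed mapping from layout elements to ALTO element names.
ALTO_NAMES = {"block": "TextBlock", "line": "TextLine", "word": "String"}


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}word'."""
    return tag.rsplit("}", 1)[-1]


def box_attrs(elem):
    """Convert xMin/yMin/xMax/yMax into ALTO-style HPOS/VPOS/WIDTH/HEIGHT."""
    x0, y0 = float(elem.get("xMin")), float(elem.get("yMin"))
    x1, y1 = float(elem.get("xMax")), float(elem.get("yMax"))
    return {
        "HPOS": f"{x0:.2f}",
        "VPOS": f"{y0:.2f}",
        "WIDTH": f"{x1 - x0:.2f}",
        "HEIGHT": f"{y1 - y0:.2f}",
    }


def convert(src, dst_parent):
    """Recursively copy blocks, lines and words under dst_parent."""
    for child in src:
        name = local_name(child.tag)
        alto = ALTO_NAMES.get(name)
        if alto is None:
            # Elements without a mapping (e.g. flow) are descended through.
            convert(child, dst_parent)
            continue
        node = ET.SubElement(dst_parent, alto, box_attrs(child))
        if name == "word":
            # ALTO stores the word text in the CONTENT attribute.
            node.set("CONTENT", (child.text or "").strip())
        else:
            convert(child, node)


def page_to_alto(xml_text):
    """Build a Page/PrintSpace fragment from one <page> element."""
    page = ET.fromstring(xml_text)
    out = ET.Element(
        "Page",
        {"WIDTH": page.get("width", "0"), "HEIGHT": page.get("height", "0")},
    )
    space = ET.SubElement(out, "PrintSpace")
    convert(page, space)
    return out


if __name__ == "__main__":
    print(ET.tostring(page_to_alto(SAMPLE), encoding="unicode"))
```

The point of the sketch is that ALTO needs the block and line levels as
containers for the words, which is why word boxes alone are not enough to
build it.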
