[Poppler-bugs] [Bug 62266] New: [PATCH] try to detect line breaks in the PDF and insert them in raw mode for pdftotext

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Tue Mar 12 15:42:05 PDT 2013


https://bugs.freedesktop.org/show_bug.cgi?id=62266

          Priority: medium
            Bug ID: 62266
          Assignee: poppler-bugs at lists.freedesktop.org
           Summary: [PATCH] try to detect line breaks in the PDF and
                    insert them in raw mode for pdftotext
          Severity: enhancement
    Classification: Unclassified
                OS: All
          Reporter: jamslam at gmail.com
          Hardware: All
            Status: NEW
           Version: unspecified
         Component: utils
           Product: poppler

Created attachment 76449
  --> https://bugs.freedesktop.org/attachment.cgi?id=76449&action=edit
Adds parabrk option to pdftotext

Adds the parabrk option to `pdftotext`.

The parabrk option applies only to raw mode. It attempts to insert an
additional newline character wherever a line break can be detected in the PDF.
It is intended to separate paragraphs that are separated by vertical
whitespace in the PDF.

It isn't perfect; for instance, it doesn't handle page boundaries.
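
To illustrate the kind of heuristic described above, here is a minimal sketch
in Python. It is not the code from the attached patch: the `Line` record, the
`join_with_paragraph_breaks` function, and the `gap_factor` threshold are all
hypothetical names and values chosen for illustration. It assumes each text
line comes with a vertical position measured downward from the top of the page.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    y: float       # baseline position, measured downward from the top of the page
    height: float  # line height (e.g. font size), in the same units as y


def join_with_paragraph_breaks(lines, gap_factor=1.5):
    """Join lines in order, adding an empty line where the vertical gap is large.

    gap_factor is an assumed threshold, not a value taken from the patch: a
    baseline-to-baseline distance greater than gap_factor times the previous
    line's height is treated as a paragraph break.
    """
    out = []
    prev = None
    for line in lines:
        if prev is not None and (line.y - prev.y) > gap_factor * prev.height:
            out.append("")  # the extra newline that marks a paragraph boundary
        out.append(line.text)
        prev = line
    return ("\n".join(out) + "\n") if out else ""


# Example: two closely spaced lines, then a third after a larger gap.
sample = [
    Line("First paragraph,", 100.0, 12.0),
    Line("second line.", 114.0, 12.0),
    Line("Second paragraph.", 150.0, 12.0),
]
result = join_with_paragraph_breaks(sample)
```

With the sample input, `result` is
"First paragraph,\nsecond line.\n\nSecond paragraph.\n". If the coordinates
restart at the top of each page, this sketch would also miss a paragraph break
that falls on a page boundary, since the gap between the last line of one page
and the first line of the next is not a meaningful distance.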
