[Poppler-bugs] [Bug 62266] [PATCH] try to detect line breaks in the PDF and insert them in raw mode for pdftotext

bugzilla-daemon at freedesktop.org bugzilla-daemon at freedesktop.org
Tue Mar 19 16:23:43 PDT 2013


https://bugs.freedesktop.org/show_bug.cgi?id=62266

--- Comment #2 from Andrew Gallant <jamslam at gmail.com> ---
Perhaps the option is ill-named. What it really does is insert a single
newline wherever one or more line breaks can be detected in the PDF (a break
being an amount of vertical white space greater than the line spacing). I
think this would fall under the category of "raw" mode.

I chose the name because the intended use case of identifying vertical white
space in the PDF is to translate that white space into the raw text generated.
Usually this results in a separation of paragraphs that are also separated by
vertical white space in the PDF.
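
To make the heuristic described above concrete, here is a minimal sketch in
Python. It is not the patch itself: the function name, the (baseline, text)
line representation, and the `tolerance` factor are all assumptions made for
illustration. It compares the baseline-to-baseline distance of consecutive
lines against a nominal line spacing and emits one empty line when the
distance is clearly larger.

```python
from typing import List, Tuple

# Hypothetical representation: (baseline y in points, text of the line),
# with y growing downward and lines already in reading order.
Line = Tuple[float, str]


def raw_text_with_breaks(lines: List[Line],
                         line_spacing: float,
                         tolerance: float = 1.5) -> str:
    """Join lines, adding one empty line wherever the vertical gap
    between consecutive baselines exceeds line_spacing * tolerance.

    `tolerance` is an assumed fudge factor, not something taken from
    the patch under discussion.
    """
    out: List[str] = []
    prev_y = None
    for y, text in lines:
        if prev_y is not None and (y - prev_y) > line_spacing * tolerance:
            # However large the gap is, only a single empty line is added.
            out.append("")
        out.append(text)
        prev_y = y
    return "\n".join(out)


sample = [(100.0, "First paragraph,"), (112.0, "second line."),
          (148.0, "Second paragraph.")]
print(raw_text_with_breaks(sample, line_spacing=12.0))
```

With these sample values, the 36pt jump between the second and third lines
exceeds the 18pt threshold, so the two paragraphs come out separated by one
empty line.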

The actual need is to output raw text that reflects the PDF as faithfully as
possible. It's quite nice to get raw text that has line breaks wherever they
were found in the PDF.
