[Poppler-bugs] [Bug 62266] [PATCH] try to detect line breaks in the PDF and insert them in raw mode for pdftotext
bugzilla-daemon at freedesktop.org
Thu Mar 21 15:55:08 PDT 2013
https://bugs.freedesktop.org/show_bug.cgi?id=62266
--- Comment #9 from Andrew Gallant <jamslam at gmail.com> ---
> it is just an assumption that if two characters are separated enough one from the other, there is a space in the middle
It is more than that. As I said:
> For example, the current code inserts a new line whenever the next word is detected to not be in the same line as the current word
Raw mode does not only add spaces. It also inserts a new line whenever the
vertical space between the current word and the next word exceeds the
`maxIntraLineDelta` constant.
My patch is a very small extension of this sort of logic: add an additional new
line when the vertical space between the current word and next word exceeds the
`maxLineSpacingDelta` constant.
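To make the comparison concrete, here is a minimal sketch of the two checks in Python. It is not Poppler's code: the `Word` type, the `raw_text` function, and both threshold values are hypothetical, and any scaling the real code may apply to the constants is left out.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; not Poppler's actual values.
MAX_INTRA_LINE_DELTA = 0.5    # largest baseline shift still treated as the same line
MAX_LINE_SPACING_DELTA = 1.5  # largest baseline shift still treated as the very next line


@dataclass
class Word:
    text: str
    base: float  # vertical position of the word's baseline


def raw_text(words):
    """Join words in content order, choosing the separator from the vertical gap."""
    words = list(words)
    out = []
    for cur, nxt in zip(words, words[1:] + [None]):
        out.append(cur.text)
        if nxt is None:
            out.append("\n")  # end of input: close the last line
            continue
        delta = abs(nxt.base - cur.base)
        if delta <= MAX_INTRA_LINE_DELTA:
            out.append(" ")   # existing behaviour: same line, so a space
        else:
            out.append("\n")  # existing behaviour: different line, so a new line
            if delta > MAX_LINE_SPACING_DELTA:
                out.append("\n")  # proposed extension: large gap, one more new line
    return "".join(out)


words = [Word("Hello", 0.0), Word("world", 0.1), Word("Next", 1.0), Word("Para", 4.0)]
print(repr(raw_text(words)))
```

The only new branch is the inner `if`. It reuses the same vertical distance that the existing new-line check already computes.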
I don't think my patch makes any assumptions beyond those the code already
makes.