<html>
<head>
<base href="https://bugs.freedesktop.org/" />
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW --- - [PATCH] try to detect line breaks in the PDF and insert them in raw mode for pdftotext"
href="https://bugs.freedesktop.org/show_bug.cgi?id=62266#c9">Comment # 9</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW --- - [PATCH] try to detect line breaks in the PDF and insert them in raw mode for pdftotext"
href="https://bugs.freedesktop.org/show_bug.cgi?id=62266">bug 62266</a>
from <span class="vcard"><a class="email" href="mailto:jamslam@gmail.com" title="Andrew Gallant <jamslam@gmail.com>"> <span class="fn">Andrew Gallant</span></a>
</span></b>
<pre><span class="quote">> it is just an assumption that if two characters are separated enough one from the other, there is a space in the middle</span >
It is more than that. As I said:
<span class="quote">> For example, the current code inserts a new line whenever the next word is detected to not be in the same line as the current word</span >
In raw mode the code doesn't just add spaces; it also adds a new line whenever
the vertical space between the current word and the next word exceeds the
`maxIntraLineDelta` constant.
My patch is a small extension of that same logic: it adds one more new line
when the vertical space between the current word and the next word also exceeds
the `maxLineSpacingDelta` constant.
I don't think my patch makes any additional assumptions beyond the assumptions
already made by the code.</pre>
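A minimal C++ sketch of the raw-mode separator logic described in this comment, written against a simplified word record. The `Word` struct, `dumpRaw`, `minWordSpacing`, and all three threshold values are hypothetical; only the names `maxIntraLineDelta` and `maxLineSpacingDelta` come from the discussion, and scaling each threshold by the current word's font size is an assumption. This illustrates the idea and is not poppler's actual code.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a word recovered from the PDF: its text,
// baseline y coordinate, horizontal extent, and font size (in points).
struct Word {
    std::string text;
    double base;
    double xMin;
    double xMax;
    double fontSize;
};

// Thresholds as fractions of the current word's font size.
// The names mirror the discussion; the values are assumptions.
constexpr double maxIntraLineDelta = 0.5;   // larger baseline shift => new line
constexpr double maxLineSpacingDelta = 1.5; // larger baseline gap => extra new line
constexpr double minWordSpacing = 0.15;     // hypothetical horizontal word gap

// Emit words in raw (content-stream) order, choosing a separator from
// the geometry between each word and the one that follows it.
std::string dumpRaw(const std::vector<Word>& words) {
    std::string out;
    for (std::size_t i = 0; i < words.size(); ++i) {
        const Word& w = words[i];
        out += w.text;
        if (i + 1 == words.size()) {
            break;  // no separator after the last word
        }
        const Word& next = words[i + 1];
        const double dy = std::fabs(next.base - w.base);
        if (dy > maxIntraLineDelta * w.fontSize) {
            // Existing behaviour: the next word is on a different line.
            out += '\n';
            // Proposed extension: a gap wider than normal line spacing
            // gets one more new line, i.e. a blank line in the output.
            if (dy > maxLineSpacingDelta * w.fontSize) {
                out += '\n';
            }
        } else if (next.xMin > w.xMax + minWordSpacing * w.fontSize) {
            // Same line, but far enough apart to be separate words.
            out += ' ';
        }
    }
    return out;
}
```

With a 10 pt font and these assumed values, a 12 pt baseline drop exceeds only `maxIntraLineDelta * fontSize` (5 pt) and yields a single new line, while a 28 pt drop also exceeds `maxLineSpacingDelta * fontSize` (15 pt) and yields a blank line.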
</div>
</p>
</body>
</html>