[poppler] pdftotext line/block information

obsidian . obsidian9993 at gmail.com
Thu May 3 10:09:28 UTC 2018


I'm using pdftotext (version 0.55 on Windows).
pdftotext -bbox works fine: it dumps all the words on every page of the
input PDF.

pdftotext -bbox-layout, however, dumps the words (plus blocks and lines)
for the first page only. The remaining pages come out empty.
Is this a bug?

Here's example output from pdftotext -bbox-layout file.pdf:

<doc>
  <page width="612.000000" height="792.000000">
    <flow>
      <block xMin="114.260000" yMin="553.390000" xMax="497.054000" yMax="589.390000">
        <line xMin="114.260000" yMin="553.390000" xMax="497.054000" yMax="589.390000">
          <word xMin="114.260000" yMin="553.390000" xMax="184.748000" yMax="589.390000">foo</word>
          <word xMin="192.884000" yMin="553.390000" xMax="201.956000" yMax="589.390000">some</word>
          <word xMin="210.200000" yMin="553.390000" xMax="256.496000" yMax="589.390000">words</word>
          <word xMin="264.770000" yMin="553.390000" xMax="275.786000" yMax="589.390000">here</word>
          <word xMin="283.970000" yMin="553.390000" xMax="317.702000" yMax="589.390000">and</word>
          <word xMin="325.838000" yMin="553.390000" xMax="497.054000" yMax="589.390000">there</word>
        </line>
      </block>
      <block xMin="268.370000" yMin="622.810000" xMax="364.384640" yMax="650.770000">
        <line xMin="268.370000" yMin="622.810000" xMax="364.384640" yMax="650.770000">
          <word xMin="268.370000" yMin="622.810000" xMax="301.362800" yMax="650.770000">one</word>
          <word xMin="307.681760" yMin="622.810000" xMax="364.384640" yMax="650.770000">two</word>
        </line>
      </block>
    </flow>
  </page>
  <page width="612.000000" height="792.000000"> </page>
  <page width="612.000000" height="792.000000"> </page>
  <page width="612.000000" height="792.000000"> </page>
  <page width="612.000000" height="792.000000"> </page>
  <page width="612.000000" height="792.000000"> </page>
  <page width="612.000000" height="792.000000"> </page>
  <page width="612.000000" height="792.000000"> </page>
  <page width="612.000000" height="792.000000"> </page>
  <page width="612.000000" height="792.000000"> </page>
  <page width="612.000000" height="792.000000"> </page>
  <page width="612.000000" height="792.000000"> </page>
</doc>
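
To confirm which pages come back empty without reading the dump by eye, a
small Python sketch along these lines could count the <word> elements on each
page. It assumes the -bbox-layout output is well-formed XML shaped like the
fragment above. The names words_per_page, local_name and SAMPLE are made up
for illustration and are not part of poppler, and SAMPLE is a shortened
stand-in for real output, not an actual dump.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree puts on a tag."""
    return tag.rsplit("}", 1)[-1]


def words_per_page(xml_text):
    """Return a list holding the number of <word> elements on each <page>."""
    root = ET.fromstring(xml_text)
    counts = []
    for elem in root.iter():
        if local_name(elem.tag) == "page":
            n_words = sum(
                1 for child in elem.iter() if local_name(child.tag) == "word"
            )
            counts.append(n_words)
    return counts


# Hypothetical, shortened sample: one page with a word, one empty page.
SAMPLE = """<doc>
  <page width="612.000000" height="792.000000">
    <flow>
      <block xMin="1" yMin="1" xMax="2" yMax="2">
        <line xMin="1" yMin="1" xMax="2" yMax="2">
          <word xMin="1" yMin="1" xMax="2" yMax="2">foo</word>
        </line>
      </block>
    </flow>
  </page>
  <page width="612.000000" height="792.000000"> </page>
</doc>"""

counts = words_per_page(SAMPLE)
# 1-based page numbers that contain no words at all
empty_pages = [i + 1 for i, n in enumerate(counts) if n == 0]
print("words per page:", counts)
print("empty pages:", empty_pages)
```

Running the same function on the -bbox output and on the -bbox-layout output
of one file would show whether the two modes disagree about which pages
contain words.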