[poppler] pdftotext line/block information

obsidian . obsidian9993 at gmail.com
Mon May 7 23:26:08 UTC 2018


This is already reported as a bug:

https://bugs.freedesktop.org/show_bug.cgi?id=93344

The patch in that thread does not work with the latest poppler version
(0.64.0); I just tried it.
Any ideas?

On Thu, May 3, 2018 at 1:09 PM, obsidian . <obsidian9993 at gmail.com> wrote:

> I'm using pdftotext (version 0.55 on Windows).
> pdftotext -bbox works fine and it gets me the dump of all words of all
> pages of the input pdf.
>
> pdftotext -bbox-layout just gives me the dump of all words (plus blocks
> and lines) of just the first page of the pdf. The rest of the pages are
> empty.
> Is this a bug?
>
> Here's an example output of pdftotext -bbox-layout file.pdf:
>
> <doc>
>   <page width="612.000000" height="792.000000">
>     <flow>
>       <block xMin="114.260000" yMin="553.390000" xMax="497.054000" yMax="589.390000">
>         <line xMin="114.260000" yMin="553.390000" xMax="497.054000" yMax="589.390000">
>           <word xMin="114.260000" yMin="553.390000" xMax="184.748000" yMax="589.390000">foo</word>
>           <word xMin="192.884000" yMin="553.390000" xMax="201.956000" yMax="589.390000">some</word>
>           <word xMin="210.200000" yMin="553.390000" xMax="256.496000" yMax="589.390000">words</word>
>           <word xMin="264.770000" yMin="553.390000" xMax="275.786000" yMax="589.390000">here</word>
>           <word xMin="283.970000" yMin="553.390000" xMax="317.702000" yMax="589.390000">and</word>
>           <word xMin="325.838000" yMin="553.390000" xMax="497.054000" yMax="589.390000">there</word>
>         </line>
>       </block>
>       <block xMin="268.370000" yMin="622.810000" xMax="364.384640" yMax="650.770000">
>         <line xMin="268.370000" yMin="622.810000" xMax="364.384640" yMax="650.770000">
>           <word xMin="268.370000" yMin="622.810000" xMax="301.362800" yMax="650.770000">one</word>
>           <word xMin="307.681760" yMin="622.810000" xMax="364.384640" yMax="650.770000">two</word>
>         </line>
>       </block>
>     </flow>
>   </page>
>   <page width="612.000000" height="792.000000"> </page>
>   <page width="612.000000" height="792.000000"> </page>
>   <page width="612.000000" height="792.000000"> </page>
>   <page width="612.000000" height="792.000000"> </page>
>   <page width="612.000000" height="792.000000"> </page>
>   <page width="612.000000" height="792.000000"> </page>
>   <page width="612.000000" height="792.000000"> </page>
>   <page width="612.000000" height="792.000000"> </page>
>   <page width="612.000000" height="792.000000"> </page>
>   <page width="612.000000" height="792.000000"> </page>
>   <page width="612.000000" height="792.000000"> </page>
> </doc>
>
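
For anyone who wants to check whether their own build shows the same
symptom, a small script along these lines might help. It is only a sketch:
the function name words_per_page is made up for illustration, and it
assumes the -bbox-layout output parses as XML and uses page and word
elements as in the sample output quoted above. It counts the words found
on each page, so a page that comes out empty shows up as 0.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree puts on a tag."""
    return tag.rsplit("}", 1)[-1]


def words_per_page(xml_text):
    """Return a list holding the number of <word> elements on each <page>.

    Hypothetical helper: assumes the element names seen in the quoted
    sample output (page, word); not part of poppler itself.
    """
    root = ET.fromstring(xml_text)
    counts = []
    for elem in root.iter():
        if local_name(elem.tag) == "page":
            # Count every <word> nested anywhere below this page.
            words = sum(
                1 for child in elem.iter() if local_name(child.tag) == "word"
            )
            counts.append(words)
    return counts


# Tiny stand-in for real output: one page with two words, one empty page.
sample = (
    "<doc>"
    "<page><flow><block><line>"
    "<word>foo</word><word>some</word>"
    "</line></block></flow></page>"
    "<page> </page>"
    "</doc>"
)
print(words_per_page(sample))  # prints [2, 0]
```

Run on the sample output quoted above, it should report 8 words for the
first page and 0 for each of the remaining eleven pages.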