[Poppler-bugs] [Bug 103798] libpoppler cannot recreate pdftotext output, because physical_layout is not handled correctly

bugzilla-daemon at freedesktop.org
Sun Nov 19 18:06:40 UTC 2017


https://bugs.freedesktop.org/show_bug.cgi?id=103798

--- Comment #8 from dummydummy at gmx.fr ---

When a TextOutputDev is created with NULL as its first argument (the output
filename), passing it to doc->displayPage() as the output device does not
generate any output. Instead, TextOutputDev->getText() attempts to assemble
the text fragments collected while doc->displayPage(...) parsed the top-level
PDF object, so that the result approximately matches the physical layout.

This is what happens in libpoppler, but the results differ from those produced
by pdftotext.

pdftotext creates a TextOutputDev with a non-NULL first argument for the
output filename. In this case, passing that TextOutputDev to
doc->displayPage() writes the output to the named file (possibly via
Gfx->display()?).

The poppler code thus appears to have two routines, not quite duplicates of
one another, that serve the same purpose: producing, in a string variable, the
text arranged according to the physical layout!?
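
To make the concern concrete, here is a toy C++ model of the pattern described
above. It is not poppler code: ToyTextDevice, Fragment, addFragment and both
layout rules are hypothetical, invented purely for illustration. It only
sketches how a device that lays text out in one routine when it was given an
output sink, and in a different routine when queried afterwards, can return
different results for the same fragments.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a positioned piece of text; not a poppler type.
struct Fragment {
    std::size_t column;  // column at which the fragment starts
    std::string text;
};

// Toy device with two separately written layout routines (an assumption
// made for illustration, not a description of poppler's implementation).
class ToyTextDevice {
public:
    // sink may be nullptr, mirroring a NULL output filename.
    explicit ToyTextDevice(std::ostream *sink) : sink_(sink) {}

    void addFragment(std::size_t column, std::string text) {
        fragments_.push_back({column, std::move(text)});
    }

    // Path A: runs at end of page, and only if a sink was supplied.
    // Pads with spaces so each fragment starts at its recorded column.
    void endPage() {
        if (sink_ == nullptr) {
            return;  // no sink: nothing is written
        }
        std::string line;
        for (const Fragment &f : fragments_) {
            if (line.size() < f.column) {
                line.append(f.column - line.size(), ' ');
            }
            line += f.text;
        }
        *sink_ << line << '\n';
    }

    // Path B: called on demand. Joins fragments with a single space and
    // ignores the recorded columns, so it can disagree with Path A.
    std::string getText() const {
        std::string line;
        for (const Fragment &f : fragments_) {
            if (!line.empty()) {
                line += ' ';
            }
            line += f.text;
        }
        return line;
    }

private:
    std::ostream *sink_;
    std::vector<Fragment> fragments_;
};
```

With the fragments "Name" at column 0 and "Value" at column 10, endPage()
writes the two words separated by six spaces, whereas getText() returns them
separated by a single space. Routing both paths through one layout routine
would rule out that kind of discrepancy.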

Such a (historical?) architecture is just a recipe for problems.

Is someone attempting to fix this? (This could be a major job.)
