[poppler] Regression in text extraction

Adrian Johnson ajohnson at redneon.com
Sun Jun 29 04:53:41 PDT 2008


The following commit introduced a regression in text extraction from PDF 
files that use ActualText:

    commit 2da15db4751d3cb93d40b48e348dbc51f6e7a29f
    Author: Carlos Garcia Campos <carlosgc at gnome.org>
    Date:   Fri Jun 20 11:39:08 2008 +0200

        Do not create an OCGs object if there isn't an OCProperties
        dictionary in the Catalog

The problem is the code added to Gfx::opBeginMarkedContent(): it exits 
the function before beginMarkedContent() in the TextOutputDev is called. 
Gfx::opEndMarkedContent() has the same problem.
