[poppler] Regression in text extraction
Albert Astals Cid
aacid at kde.org
Sun Jun 29 06:49:32 PDT 2008
On Sunday 29 June 2008, Adrian Johnson wrote:
> The following commit introduced a regression in text extraction from PDF
> files that use ActualText:
> commit 2da15db4751d3cb93d40b48e348dbc51f6e7a29f
> Author: Carlos Garcia Campos <carlosgc at gnome.org>
> Date: Fri Jun 20 11:39:08 2008 +0200
> Do not create an OCGs object if there isn't an OCProperties
> dictionary in the Catalog
> The problem is the code added to Gfx::opBeginMarkedContent() that exits
> the function before beginMarkedContent() in the TextOutputDev is called.
> Gfx::opEndMarkedContent() also has the same problem.
Right, the attached patch should fix the problem. Can you test it?
Also, can you please send a URL to a PDF where ActualText gives different
output than "classical" text extraction?
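
For readers without the attachment, here is a minimal sketch of the control-flow problem described in the quoted report. It is not the poppler source and not the attached patch: RecordingDev, MiniGfx, hasOCGs, replay() and the "Span" tag are hypothetical stand-ins, and the "fixed" variant only illustrates one plausible shape, assuming the early return was the sole cause.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an output device such as TextOutputDev.
// It only records which notifications it received.
struct RecordingDev {
    std::vector<std::string> events;
    void beginMarkedContent(const std::string &tag) { events.push_back("begin:" + tag); }
    void endMarkedContent() { events.push_back("end"); }
};

// Hypothetical stand-in for Gfx. hasOCGs models whether an OCGs object
// exists (i.e. the Catalog has an OCProperties dictionary).
struct MiniGfx {
    RecordingDev *out;
    bool hasOCGs;

    // Regressed shape: the early return skips the device notification.
    void beginBuggy(const std::string &tag) {
        if (!hasOCGs) return;
        // ... optional-content handling would go here ...
        out->beginMarkedContent(tag);
    }
    void endBuggy() {
        if (!hasOCGs) return;
        out->endMarkedContent();
    }

    // Assumed fixed shape: optional-content work is conditional,
    // the device notification is not.
    void beginFixed(const std::string &tag) {
        if (hasOCGs) {
            // ... optional-content handling would go here ...
        }
        out->beginMarkedContent(tag);
    }
    void endFixed() {
        out->endMarkedContent();
    }
};

// Replays one marked-content section and returns what the device saw.
std::vector<std::string> replay(bool hasOCGs, bool fixed) {
    RecordingDev dev;
    MiniGfx gfx{&dev, hasOCGs};
    if (fixed) {
        gfx.beginFixed("Span");
        gfx.endFixed();
    } else {
        gfx.beginBuggy("Span");
        gfx.endBuggy();
    }
    return dev.events;
}

// Small self-check: without an OCGs object, only the fixed shape
// lets the device see the marked-content section.
void checkDeviceIsNotified() {
    assert(replay(false, false).empty());
    assert(replay(false, true).size() == 2);
}
```

In this sketch, a file without OCProperties leaves the device with no begin/end notifications in the regressed variant, which is consistent with ActualText being lost during text extraction.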
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 1230 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/poppler/attachments/20080629/5a319b02/attachment.patch