[poppler] Tagged PDF (was Re: alternatives to pdftohtml to extract text with formatting)

Leonard Rosenthol lrosenth at adobe.com
Fri Apr 20 00:04:16 PDT 2012

On 4/20/12 1:26 AM, "Ihar `Philips` Filipau" <thephilips at gmail.com> wrote:
>What that means - "properly tagged"?

Meaning that the PDF has its content tagged or structured to provide
semantic richness, and not just a bunch of drawing instructions. See
section 14 (IIRC) of ISO 32000.
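
As a rough illustration of what "tagged" looks like inside the file: a
tagged PDF carries a /MarkInfo dictionary with /Marked true and a
/StructTreeRoot entry in its document catalog. The sketch below is only a
heuristic, not a real parser. It scans the raw bytes for those keys, so it
assumes the catalog is stored uncompressed and that /MarkInfo is an inline
dictionary; it will miss files that keep the catalog in an object stream.
The function name looks_tagged is made up for this example.

```python
import re

# Matches an inline /MarkInfo dictionary that contains "/Marked true".
# Assumption: the dictionary is written inline and has no nested dictionaries.
_MARKED_TRUE = re.compile(rb"/MarkInfo\s*<<[^>]*/Marked\s+true")


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Heuristic check for the markers of a tagged PDF in raw file bytes.

    Returns True only if both an inline /MarkInfo << ... /Marked true ... >>
    dictionary and a /StructTreeRoot key are found. False negatives are
    possible when these objects are stored in compressed object streams
    or when /MarkInfo is an indirect reference.
    """
    has_mark_info = _MARKED_TRUE.search(pdf_bytes) is not None
    has_struct_tree = b"/StructTreeRoot" in pdf_bytes
    return has_mark_info and has_struct_tree


# Usage sketch (the file name is a placeholder):
#     with open("document.pdf", "rb") as f:
#         print(looks_tagged(f.read()))
```

A True result only says the markers are present; it says nothing about
whether the tagging itself is any good.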

>Or probably the other way around: which producers create "properly tagged"

When you create PDF directly from Adobe applications (e.g. InDesign or
FrameMaker), use the PDFMakers provided with Acrobat inside of MS Office,
use the native PDF export features of Office 2007 (and later), or even use
applications such as OpenOffice or LibreOffice, and choose the appropriate
settings, you will get tagged PDF.

