[poppler] alternatives to pdftohtml to extract text with formatting

Martin Schröder martin at oneiros.de
Thu Apr 19 23:32:29 PDT 2012

2012/4/20 Ihar `Philips` Filipau <thephilips at gmail.com>:
> What that means - "properly tagged"?

Conforming to PDF/A-1a or PDF/UA.
See Section 14.8 ("Tagged PDF") of ISO 32000-1:2008.
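
As a rough illustration of the minimum that section asks for: a Tagged PDF has a /MarkInfo dictionary in the document catalog with /Marked set to true, plus a structure tree (/StructTreeRoot). The sketch below assumes the catalog has already been read into a plain Python dict by some PDF library (not shown); looks_tagged is a hypothetical helper name, not part of poppler. Passing this check is necessary but nowhere near sufficient for PDF/A-1a or PDF/UA conformance.

```python
def looks_tagged(catalog):
    """Return True if a document catalog carries the basic Tagged PDF markers.

    `catalog` is assumed to be a plain dict mirroring the PDF document
    catalog, with PDF names as string keys (hypothetical representation).
    """
    # /MarkInfo may be absent entirely; treat that as "not marked".
    mark_info = catalog.get("/MarkInfo") or {}
    marked = mark_info.get("/Marked") is True
    # The logical structure hangs off /StructTreeRoot.
    has_struct_tree = catalog.get("/StructTreeRoot") is not None
    return marked and has_struct_tree


# Illustrative catalogs only, not taken from real files.
tagged = {"/MarkInfo": {"/Marked": True},
          "/StructTreeRoot": {"/Type": "/StructTreeRoot"}}
untagged = {"/Pages": {}}

print(looks_tagged(tagged))
print(looks_tagged(untagged))
```

A producer can set these entries and still emit a poor structure tree, so a real conformance check needs a dedicated validator.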

> Or probably the other way around: which producers create "properly tagged" PDFs?

AFAIK LibreOffice, Word, and ConTeXt can do that.

