[poppler] alternatives to pdftohtml to extract text with formatting
Martin Schröder
martin at oneiros.de
Thu Apr 19 23:32:29 PDT 2012
2012/4/20 Ihar `Philips` Filipau <thephilips at gmail.com>:
> What does that mean - "properly tagged"?
Conforming to PDF/A-1a or PDF/UA.
See Section 14.8 ("Tagged PDF") of ISO 32000-1:2008.
https://en.wikipedia.org/wiki/PDF#Logical_structure_and_accessibility
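As a rough first check of whether a file is tagged at all, one can look for the two catalog entries that Tagged PDF relies on: /MarkInfo with /Marked true, and /StructTreeRoot. The sketch below is only a heuristic under stated assumptions: the function name looks_tagged is made up for illustration, it scans the raw bytes rather than parsing the file, and it will miss catalogs stored in compressed object streams (PDF 1.5 and later) or in encrypted files. A positive result says nothing about PDF/A-1a or PDF/UA conformance.

```python
import re

# Heuristic patterns, matched against the raw file bytes (no real PDF parsing).
# Assumption: the catalog is stored uncompressed and unencrypted.
_MARKED = re.compile(rb"/Marked\s+true\b")
_STRUCT_ROOT = re.compile(rb"/StructTreeRoot\b")


def looks_tagged(data: bytes) -> bool:
    """Return True if both tagged-PDF catalog entries appear in the bytes.

    looks_tagged is a hypothetical helper, not part of poppler or any library.
    """
    return bool(_MARKED.search(data)) and bool(_STRUCT_ROOT.search(data))


if __name__ == "__main__":
    import sys

    # Usage: python looks_tagged.py file.pdf
    with open(sys.argv[1], "rb") as fh:
        print("tagged (heuristic):", looks_tagged(fh.read()))
```

poppler's own pdfinfo also prints a "Tagged:" line, which is the more reliable check since it actually parses the document.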
> Or probably the other way around: which producers create "properly tagged" PDFs?
AFAIK LibreOffice, Word, and ConTeXt can do that.
Best
Martin