[poppler] alternatives to pdftohtml to extract text with formatting
martin at oneiros.de
Fri Apr 20 04:39:46 PDT 2012
2012/4/20 Ihar `Philips` Filipau <thephilips at gmail.com>:
> That stuff is too new to be broadly available. Anyway, I'm stuck with
> PDFs created in the late 90s and early 2000s.
Then you can only do some kind of OCR. :-)
> Just tested with LibreOffice 3.5.2 & Okular 0.13.3 on Linux - no
> effect: bold and italics are lost during copy-paste.
I didn't say that Okular can handle tagged PDF.
Anyway, styles like "bold" and "italic" are outside the scope of tagged PDF.