[poppler] PDFtoHTML and exported fonts preservation

Ihar `Philips` Filipau thephilips at gmail.com
Mon May 7 13:20:06 PDT 2012


On 5/7/12, Matthew Blancarte <matt at highlighter.com> wrote:
>
> Yep, great point. I don't strip fonts from PDFs, exactly for this reason.
>
> That said, it's really the only way to get consistent page-fidelity
> conversion… at least that I've come up with.
>

Fidelity is overrated. Accessibility is underrated. Case in point: for
the adherents of PDF fidelity, I should still have a PDF of Das
Nibelungenlied typeset in Fraktur lying around to enjoy. :)

A harder but more rewarding way is to detect the structure of the
text, images, etc. in the PDF and convert them into the corresponding
presentation in HTML or whatever format. For example, I spent 2+ months
writing heuristics to detect paragraph structure in my PDFs and couldn't
be happier with the results.

Loss of fidelity is only a problem when the result looks bad.

If the result looks good and is as readable as the original PDF, no soul
would ever complain. What's more, features such as changing the font,
adjusting the font size and reflowing the text accordingly, and scaling
graphics and images up or down (all enabled by the detected document
structure) would trump fidelity any day.

My 0,02€

