[poppler] PDFtoHTML and exported Fonts preservation

Matthew Blancarte matt at highlighter.com
Mon May 7 13:51:07 PDT 2012


In most cases, I definitely agree with you. However, my use case requires page-fidelity. :(

On May 7, 2012, at 1:20 PM, Ihar `Philips` Filipau wrote:

> On 5/7/12, Matthew Blancarte <matt at highlighter.com> wrote:
>> 
>> Yep, great point. I don't strip fonts from PDFs, exactly for this reason.
>> 
>> That said, it's really the only way to get consistent page-fidelity
>> conversion… at least that I've come up with.
>> 
> 
> Fidelity is overrated. Accessibility is underrated. Case in point: for
> adherents of PDF fidelity, I should still have a PDF of Das
> Nibelungenlied typeset in Fraktur lying around for them to enjoy. :)
> 
> The harder but more rewarding way is to detect the structure of the
> text, images, etc. in the PDF and convert them into the corresponding
> presentation in HTML or whatever format. E.g., I have spent 2+ months
> writing heuristics to detect paragraph structure in my PDFs and couldn't
> be happier with the results.
> 
> Loss of fidelity is only a problem when the result looks bad.
> 
> If the result looks good and is as readable as the original PDF, no soul
> would ever complain. What's more, features to change the font, adjust the
> font size and reflow text accordingly, and scale graphics and images up or
> down (all enabled by the detected document structure) would trump fidelity
> any day.
> 
> My 0,02€
