[poppler] PDFtoHTML and exported Fonts preservation
Matthew Blancarte
matt at highlighter.com
Mon May 7 13:51:07 PDT 2012
In most cases, I definitely agree with you. However, my use case requires page-fidelity. :(
On May 7, 2012, at 1:20 PM, Ihar `Philips` Filipau wrote:
> On 5/7/12, Matthew Blancarte <matt at highlighter.com> wrote:
>>
>> Yep, great point. I don't strip fonts from PDFs, exactly for this reason.
>>
>> That said, it's really the only way to get consistent page-fidelity
>> conversion… at least that I've come up with.
>>
>
> Fidelity is overrated. Accessibility - underrated. Case in point: for
> adherents of PDF fidelity, I should still have around the PDF of Das
> Nibelungenlied authored in Fraktur to enjoy. :)
>
> A harder but more rewarding way is to detect the structure of the
> text/images/etc. in the PDF and try to convert them into a corresponding
> presentation in HTML or whatever format. E.g. I have spent 2+ months
> writing heuristics to detect paragraph structure in my PDFs and couldn't
> be happier with the results.
>
> Loss of fidelity is only a problem when the result looks bad.
>
> If the result looks good and is as readable as the original PDF was, no soul
> would ever complain. What's more, features to change font, adjust font
> size and reflow text accordingly, scale up/down graphics and images -
> enabled by detected document structure - would trump fidelity any day.
>
> My 0,02€
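
For illustration only, here is a minimal sketch of the kind of paragraph-detection heuristic described in the quoted message. It is not the heuristics Ihar wrote and not a poppler API: the `Line` type, the `group_paragraphs` function and the `gap_factor` threshold are all hypothetical names, and the sketch assumes the text lines and their vertical positions have already been extracted from the PDF by some other tool. The idea is simply to start a new paragraph wherever the vertical step between two lines is clearly larger than the typical step on the page.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Line:
    """One extracted text line (hypothetical structure, not a poppler type)."""
    text: str
    top: float  # vertical position of the line, growing downward


def group_paragraphs(lines, gap_factor=1.5):
    """Join lines into paragraphs, splitting where the vertical step is unusually large.

    gap_factor is an assumed tuning knob: a step larger than
    gap_factor * (median step) is treated as a paragraph break.
    """
    if not lines:
        return []
    ordered = sorted(lines, key=lambda line: line.top)
    # Vertical step between each pair of consecutive lines.
    steps = [b.top - a.top for a, b in zip(ordered, ordered[1:])]
    typical = median(steps) if steps else 0.0

    paragraphs = [[ordered[0].text]]
    for step, line in zip(steps, ordered[1:]):
        if typical > 0 and step > gap_factor * typical:
            paragraphs.append([line.text])    # large gap: start a new paragraph
        else:
            paragraphs[-1].append(line.text)  # normal spacing: same paragraph
    return [" ".join(parts) for parts in paragraphs]
```

Because the output is plain paragraphs rather than positioned lines, it can be wrapped in HTML `<p>` elements and reflowed by the browser. A real converter would likely also need to look at indentation, font changes and column layout, which this sketch ignores.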