[poppler] pdftohtml: enhancing it to use embedded fonts

Josh Richardson jric at chegg.com
Sun Jun 5 15:41:53 PDT 2011

The current system appears to try to use system-available fonts as approximations for whatever font is embedded in the PDF. For pdftohtml, I am considering adding a preferred behavior:

1.  Extract the original font from the PDF
2.  Create a font file for that font
3.  Reference the font file, using "@font-face" in the generated HTML.
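Steps 1 and 2 depend on what kind of font program the PDF actually embeds. Per the PDF specification, the FontDescriptor stores it under FontFile (Type 1), FontFile2 (TrueType) or FontFile3 (a CFF or OpenType program, identified by the stream's /Subtype), while Type 3 fonts have no embedded font program at all. Below is a rough C++ sketch of that classification. It assumes the descriptor key and subtype have already been read out of the PDF, and the names are hypothetical, not poppler's API:

```cpp
#include <cassert>
#include <string>

// Kinds of font program a PDF can embed. Illustrative only; this is not
// poppler's own font-type enumeration.
enum class EmbeddedFontKind { Type1, TrueType, BareCFF, OpenType, None };

// Classify an embedded font program from the FontDescriptor entry that holds
// it ("FontFile", "FontFile2" or "FontFile3") and, for FontFile3, the
// stream's /Subtype. Type 3 fonts define glyphs as PDF content streams and
// have no such entry, so they (and anything unrecognized) map to None.
EmbeddedFontKind classifyFontStream(const std::string& descriptorKey,
                                    const std::string& subtype = "") {
    if (descriptorKey == "FontFile") return EmbeddedFontKind::Type1;
    if (descriptorKey == "FontFile2") return EmbeddedFontKind::TrueType;
    if (descriptorKey == "FontFile3") {
        if (subtype == "Type1C" || subtype == "CIDFontType0C")
            return EmbeddedFontKind::BareCFF;
        if (subtype == "OpenType") return EmbeddedFontKind::OpenType;
    }
    return EmbeddedFontKind::None;
}

// True if the extracted bytes are already an sfnt container (TrueType or
// OpenType). Type 1 and bare CFF programs must be wrapped or converted
// before they can be written out as an OpenType file. Subsetted sfnt fonts
// may still lack tables (e.g. cmap, name, OS/2) that browsers expect.
bool isSfntContainer(EmbeddedFontKind kind) {
    return kind == EmbeddedFontKind::TrueType ||
           kind == EmbeddedFontKind::OpenType;
}
```

Under this split, only the None case (Type 3) has no font file to start from, which is why that case is the hard one.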

This should give us an exact representation of the original font in the PDF, though it will only work in modern browsers, since older browsers don't support "@font-face". For IE, I'll have to convert the font to EOT, and for the others I'll probably use regular OpenType (not TrueType) format.
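Step 3 with both formats could look roughly like the C++ sketch below, which emits one "@font-face" rule per extracted font. The helper name, family name and file URLs are hypothetical. It uses the widely used "bulletproof" pattern: a plain EOT `src` line for old IE, then a second `src` listing EOT (with a `?#iefix` suffix, a common workaround for old IE's parsing of the format list) followed by OpenType for other browsers. It also assumes the family name contains no double quotes.

```cpp
#include <cassert>
#include <string>

// Build an @font-face rule for one extracted font (hypothetical helper).
// 'family' is the name the generated HTML will use in font-family; the URLs
// point at the EOT and OpenType files written out for that font.
std::string fontFaceRule(const std::string& family,
                         const std::string& eotUrl,
                         const std::string& otfUrl) {
    std::string css;
    css += "@font-face {\n";
    css += "  font-family: \"" + family + "\";\n";
    // Read by old IE, which only understands a single EOT url.
    css += "  src: url(\"" + eotUrl + "\");\n";
    // Overrides the line above in browsers that parse format() hints.
    css += "  src: url(\"" + eotUrl + "?#iefix\") format(\"embedded-opentype\"),\n";
    css += "       url(\"" + otfUrl + "\") format(\"opentype\");\n";
    css += "}\n";
    return css;
}
```

Since embedded fonts are often subsets, giving each extracted font a family name unique to the document (rather than the font's real name) would keep a subset from being confused with a full installed font of the same name.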

If I only use the extracted font to display the original document in its original form, and not to draw additional glyphs in any document, I believe I'll be in compliance with "fair use" and digital copyright rules for the font.

Does anyone see an issue with the approach, or have any advice? For instance, I'm not sure how much luck I'll have converting Type 3 fonts in particular to OpenType/EOT.

Thanks, --josh
