[poppler] PDFtoHTML and exported Fonts preservation

James Ford jford at psyked.co.uk
Wed May 2 02:27:12 PDT 2012


Hi guys,

I've recently found out about Poppler and am looking to use it to convert
PDF files to HTML. The output so far looks great (positioning, sizing,
colours, images, etc.), but the fonts are letting me down. All of the
custom fonts, such as "Impact" and "Myriad Pro", come out as "Times" in
the output HTML.

I've seen that Scribd seems to use a version of Poppler for its PDF
conversion, and it looks like they're base64-encoding the font data in
their output. I'd love to be able to do the same, but this doesn't appear
to be part of the master version of Poppler. I've seen some previous posts
and bug reports that talk about font embedding
[https://bugs.freedesktop.org/show_bug.cgi?id=39385], but I've tried and
failed to apply the patches and get such a feature working on my own build.
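
To make the base64 idea concrete, here is a minimal Python sketch of what
such output could look like: the font bytes are inlined into a CSS
@font-face rule as a data URI. This is not Poppler code and not necessarily
what Scribd does; the function name font_face_rule and its parameters are
hypothetical. It also assumes the font data is already in a format browsers
accept (for example TrueType), which fonts extracted from a PDF often are
not without conversion.

```python
import base64


def font_face_rule(family: str, font_bytes: bytes,
                   mime: str = "font/ttf", fmt: str = "truetype") -> str:
    """Return a CSS @font-face rule that inlines the font as a base64 data URI.

    Hypothetical helper for illustration; not part of Poppler.
    """
    # base64 output is plain ASCII, so it is safe to place inside url("...")
    encoded = base64.b64encode(font_bytes).decode("ascii")
    return (
        "@font-face {\n"
        f'  font-family: "{family}";\n'
        f'  src: url("data:{mime};base64,{encoded}") format("{fmt}");\n'
        "}\n"
    )
```

The generated rule would go into a <style> block in the output HTML, and
the text spans would then reference the same font-family name.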

Can anyone point me in the right direction for this? I'm assuming that if I
could apply these patches, I'd be able to build a version of Poppler that
can embed the correct fonts in the output HTML. I'm now at the point where
I'm going to try to repeat the changes in the patches by hand, because I
can't get git to apply them automatically. Am I going down the wrong path
here?

In addition to this, I've also looked at changing the pdftohtml code so
that the exported font name is no longer the default "Times", and I've got
it to the point where it simply writes the full name of the font, such as
"Myriad-Pro-Regular", into the output HTML. This 'kind of' works, but only
if you have the font installed on your machine. Not a great result, and I
only got that far by hacking around the source code to output a substring
of the font's full name. Is there a better way to do this, some function
I'm overlooking? I'd really like unknown embedded fonts not to be converted
to "Times"!
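
The name-mapping idea can be sketched in Python as follows. This is only an
illustration of the string handling, not Poppler's API: css_font_family,
SUBSET_TAG and STYLE_WORDS are hypothetical names, and the list of style
suffixes is an assumption that real font names will not always match. It
strips the six-letter subset tag that PDF uses for subsetted fonts (for
example "ABCDEF+"), drops a trailing style word, and keeps "Times" as a
fallback for machines where the font is not installed.

```python
import re

# PDF subset fonts carry a tag of six uppercase letters followed by "+".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")

# Assumed set of style suffixes; real fonts may use others.
STYLE_WORDS = {"Regular", "Bold", "Italic", "BoldItalic",
               "Oblique", "BoldOblique"}


def css_font_family(pdf_font_name: str, fallback: str = "Times, serif") -> str:
    """Turn a PDF font name into a CSS font-family list with a fallback."""
    name = SUBSET_TAG.sub("", pdf_font_name)
    # Drop the last hyphen-separated part only if it is a known style word.
    base, sep, last = name.rpartition("-")
    if sep and last in STYLE_WORDS:
        name = base
    family = name.replace("-", " ")
    return f'"{family}", {fallback}'
```

With this, "Myriad-Pro-Regular" becomes the list "Myriad Pro", Times, serif,
so the browser only falls back to Times when the named font is unavailable.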

Thanks,
 - James