[poppler] Font info not getting properly into html when using pdftohtml

Sushant Sinha sushant354 at gmail.com
Tue Feb 8 07:28:45 PST 2011


I have attached a PDF document that mixes English and Hindi. For the
Hindi text it uses the Aryan2 font. When I run pdftohtml on this
document, I do not get any font information in the HTML file. When I use
the "-xml" or "-c" option, the Aryan2 font is still output as Times. So
there seems to be some problem with embedded fonts.

I have attached the pdf doc for your analysis.

$ pdffonts 2211.pdf 

name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
CFFEEL+TimesNewRoman                 TrueType          yes yes no    1852  0
CFFEGM+TimesNewRoman,Bold            TrueType          yes yes no    1854  0
CFFFEJ+TimesNewRoman,Italic          TrueType          yes yes no      93  0
CFFFHI+SymbolMT                      CID TrueType      yes yes yes     94  0
CFFGDG+Aryan2-Bold                   TrueType          yes yes no      95  0
CFFGEI+Aryan2-Normal                 TrueType          yes yes no      97  0
CFFGEH+Aryan2-Normal                 CID TrueType      yes yes yes     96  0
CFFGII+Tahoma,Bold                   TrueType          yes yes no      98  0
CFFGLJ+Tahoma                        TrueType          yes yes no      99  0
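To compare the font names above with what pdftohtml writes into the HTML/XML, it can help to parse the pdffonts listing and strip the subset tag (the six uppercase letters plus "+", e.g. "CFFGDG+", that the PDF specification prefixes to subset fonts). Below is a minimal Python sketch, assuming the column layout shown above (name, type, emb, sub, uni, object ID) and font names without spaces. `parse_pdffonts` and `SAMPLE` are hypothetical helpers for illustration, not part of poppler.

```python
import re

# Subset tag per the PDF spec: exactly six uppercase letters followed by '+'.
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")

# Hypothetical sample: a few rows copied from the pdffonts listing above.
SAMPLE = """\
name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
CFFGDG+Aryan2-Bold                   TrueType          yes yes no      95  0
CFFGEH+Aryan2-Normal                 CID TrueType      yes yes yes     96  0
CFFGLJ+Tahoma                        TrueType          yes yes no      99  0
"""


def parse_pdffonts(listing: str) -> list:
    """Parse pdffonts text output into one dict per font (assumed layout)."""
    fonts = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and the dashed separator row.
        if len(fields) < 7 or fields[0] == "name" or set(fields[0]) == {"-"}:
            continue
        # Read fixed columns from the right, since the type column may
        # contain a space (e.g. "CID TrueType").
        emb, sub, uni = fields[-5], fields[-4], fields[-3]
        fonts.append({
            "name": fields[0],
            "base_name": SUBSET_TAG.sub("", fields[0]),
            "type": " ".join(fields[1:-5]),
            "embedded": emb == "yes",
            "subset": sub == "yes",
            "to_unicode": uni == "yes",
            "object": (int(fields[-2]), int(fields[-1])),
        })
    return fonts


if __name__ == "__main__":
    for font in parse_pdffonts(SAMPLE):
        print(font["base_name"], "|", font["type"], "| uni:", font["to_unicode"])
```

In practice the listing would come from running `pdffonts` on the file; the sample string just keeps the sketch self-contained.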


Can someone tell me why this is happening?

-Sushant.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 2211.pdf
Type: application/pdf
Size: 406452 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20110208/d24d75be/attachment-0001.pdf>