[poppler] How poppler deal with multiple charsets?
mpsuzuki at hiroshima-u.ac.jp
Tue Nov 1 02:37:37 PDT 2011
For non-8-bit strings, please check GfxCIDFont::getNextChar in GfxFont.cc;
there you can see how poppler translates a byte stream into a Unicode string.
Note that text in a PDF is always tied to a font in that PDF,
so the encoding is determined by the font.
Please also check the poppler-data package, which provides the mapping table resources.
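
To make the idea concrete, here is a minimal, self-contained sketch of a
font-driven decoder. It is NOT poppler's code or API: ToyFont, nextChar,
decode and makeToyGbkFont are hypothetical names, the CID numbers are
arbitrary placeholders, and the byte layout is a simplified GBK-like
assumption (0x00-0x7F is one byte, 0x81-0xFE starts a two-byte code).
Only the two GBK code points for U+4E2D and U+6587 are real values.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a CID font. The font owns both tables,
// so the same bytes may decode differently under another font.
struct ToyFont {
    std::map<std::uint32_t, std::uint32_t> codeToCid;  // plays the role of a CMap
    std::map<std::uint32_t, char32_t> cidToUnicode;    // plays the role of a CID-to-Unicode table
};

// Read one character code starting at s[pos] and store its Unicode value in out.
// Returns the number of bytes consumed, or 0 at the end of the input.
// Unmapped codes yield U+FFFD (replacement character).
std::size_t nextChar(const ToyFont &font, const std::string &s,
                     std::size_t pos, char32_t &out)
{
    if (pos >= s.size()) {
        return 0;
    }
    std::uint32_t code = static_cast<unsigned char>(s[pos]);
    std::size_t n = 1;
    // Assumed layout: a lead byte in 0x81-0xFE takes one trail byte with it.
    if (code >= 0x81 && code <= 0xFE && pos + 1 < s.size()) {
        code = (code << 8) | static_cast<unsigned char>(s[pos + 1]);
        n = 2;
    }
    out = U'\uFFFD';
    auto cid = font.codeToCid.find(code);
    if (cid != font.codeToCid.end()) {
        auto uni = font.cidToUnicode.find(cid->second);
        if (uni != font.cidToUnicode.end()) {
            out = uni->second;
        }
    }
    return n;
}

// Decode a whole byte string by calling nextChar until the input is exhausted.
std::u32string decode(const ToyFont &font, const std::string &s)
{
    std::u32string result;
    std::size_t pos = 0;
    char32_t ch = 0;
    while (std::size_t n = nextChar(font, s, pos, ch)) {
        result.push_back(ch);
        pos += n;
    }
    return result;
}

// Three-entry toy tables; the CIDs 1, 2, 3 are made up for illustration.
ToyFont makeToyGbkFont()
{
    ToyFont f;
    f.codeToCid = {{0x41, 1}, {0xD6D0, 2}, {0xCEC4, 3}};
    f.cidToUnicode = {{1, U'A'}, {2, U'\u4E2D'}, {3, U'\u6587'}};
    return f;
}
```

With the toy font, decode(makeToyGbkFont(), "A\xD6\xD0\xCE\xC4") walks the
bytes as one 1-byte code followed by two 2-byte codes. In a real PDF the
tables come from the font's CMap and from the mapping resources mentioned
above, not from hard-coded values.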
> Hi all,
> I am new to poppler. I want to extract text from PDF files
> that contain Chinese GBK or other charsets.
> Can poppler handle this situation, and if so, how does it do it?
> I am currently reading through the source code,
> so I would like to know which parts of the source deal
> with multiple charsets.
> Thank you very much.