[poppler] How does poppler deal with multiple charsets?
suzuki toshiya
mpsuzuki at hiroshima-u.ac.jp
Tue Nov 1 02:37:37 PDT 2011
Hi,
Please check GfxCIDFont::getNextChar in GfxFont.cc. For non-8-bit strings,
it shows how poppler translates a byte stream into a Unicode string.
Note that text in a PDF is always tied to a font in that PDF,
so the encoding information is determined by the font.
Also, please check the poppler-data package, which provides the mapping tables.
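
To make the idea concrete, here is a minimal, self-contained C++ sketch of
font-driven decoding. It is not poppler's code: ToyFontEncoding, nextChar,
decode, and makeToyEncoding are hypothetical names, the lead-byte rule is a
simplified GBK-style assumption, and the three-entry table only stands in
for the kind of mapping data a real font would pull in.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the encoding data attached to one font:
// maps a character code (one or two bytes from the PDF string) to Unicode.
struct ToyFontEncoding {
    std::map<std::uint32_t, char32_t> codeToUnicode;
};

// Reads one character code starting at `pos` and stores its Unicode value
// in `out`. Assumed GBK-style rule: a byte below 0x80 stands alone,
// any other byte leads a two-byte code. Returns the bytes consumed
// (0 only when `pos` is already at the end of the input).
std::size_t nextChar(const std::string& s, std::size_t pos,
                     const ToyFontEncoding& enc, char32_t& out) {
    if (pos >= s.size()) {
        return 0;
    }
    std::uint32_t code = static_cast<unsigned char>(s[pos]);
    std::size_t consumed = 1;
    if (code >= 0x80 && pos + 1 < s.size()) {
        code = (code << 8) | static_cast<unsigned char>(s[pos + 1]);
        consumed = 2;
    }
    auto it = enc.codeToUnicode.find(code);
    // Codes missing from the table become U+FFFD (replacement character).
    out = (it != enc.codeToUnicode.end()) ? it->second : U'\uFFFD';
    return consumed;
}

// Decodes a whole byte string using the encoding of the font it is shown in.
std::u32string decode(const std::string& bytes, const ToyFontEncoding& enc) {
    std::u32string result;
    std::size_t pos = 0;
    while (pos < bytes.size()) {
        char32_t u = 0;
        pos += nextChar(bytes, pos, enc, u);
        result.push_back(u);
    }
    return result;
}

// Tiny illustrative table: ASCII 'A' plus the GBK codes for U+4E2D and U+6587.
ToyFontEncoding makeToyEncoding() {
    ToyFontEncoding enc;
    enc.codeToUnicode[0x41] = U'A';
    enc.codeToUnicode[0xD6D0] = U'\u4E2D';
    enc.codeToUnicode[0xCEC4] = U'\u6587';
    return enc;
}
```

The point of the sketch is that the same bytes decode differently under a
different table, which is why the font, not the string, decides the charset.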
Regards,
mpsuzuki
杨辉强 wrote:
> Hi, all:
> I am a newbie to poppler. I want to extract text from PDF files
> that contain Chinese GBK or other charsets.
> Can poppler deal with this situation, and if so, how does it do it?
> I am now hacking on the source code,
> so I would like to know which parts of the source code deal
> with multiple charsets.
>
>
>
> Thank you very much.