[poppler] FontInfo::getToUnicode()
Leonard Rosenthol
leonardr at pdfsages.com
Sat Jul 2 20:41:51 PDT 2005
At 12:58 PM +1000 7/2/05, Brad Hards wrote:
>Can anyone tell me what the meaning of the subject call is?
>
Sure, it returns the "ToUnicode" table associated with the
font (if one is present).
>From reading the code, I guess it means that the text in that font
>is actually unicode encoded,
Actually, it means exactly the opposite!
It means that the character codes used in the content stream are
NOT easily mapped to Unicode code points, so a table needs to be
present in the PDF to allow the viewer/consumer to convert them
correctly to Unicode, and thus enable correct text extraction.
You will see this almost 100% of the time with CID fonts, but
it can also be useful for other types (e.g. Ghostscript Type 3 fonts).
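
To make the idea concrete, here is a rough sketch in Python of what a
consumer does with such a table. It is only an illustration, not
poppler's code: the table contents, the fixed 2-byte code width, and
the names TO_UNICODE, split_codes and to_text are all made up for the
example.

```python
# Hypothetical ToUnicode-style table: character code -> Unicode text.
# The codes are arbitrary (as they often are in subsetted CID fonts),
# so they cannot be guessed without the table.
TO_UNICODE = {
    0x0003: "H",
    0x0011: "i",
    0x0024: "ffi",  # one code may map to several Unicode characters
}

def split_codes(data, width=2):
    """Split a content-stream string into fixed-width big-endian codes.

    Assumes every code is `width` bytes long; real fonts can use
    variable-width codes defined by their CMap.
    """
    return [int.from_bytes(data[i:i + width], "big")
            for i in range(0, len(data), width)]

def to_text(data, table, width=2):
    """Convert character codes to text, using U+FFFD for unmapped codes."""
    return "".join(table.get(code, "\ufffd")
                   for code in split_codes(data, width))

print(to_text(b"\x00\x03\x00\x11", TO_UNICODE))  # prints "Hi"
```

Without the table, the bytes 00 03 00 11 carry no hint that they mean
"Hi", which is why extraction fails when the table is missing.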
Leonard
--
---------------------------------------------------------------------------
Leonard Rosenthol <mailto:leonardr at pdfsages.com>
Chief Technical Officer <http://www.pdfsages.com>
PDF Sages, Inc. 215-938-7080 (voice)
215-938-0880 (fax)