[poppler] FontInfo::getToUnicode()

Leonard Rosenthol leonardr at pdfsages.com
Sat Jul 2 20:41:51 PDT 2005


At 12:58 PM +1000 7/2/05, Brad Hards wrote:
>Can anyone tell me what the meaning of the subject call is?
>

	Sure, it tells you whether the font has a "ToUnicode" table 
(CMap) associated with it.


>From reading the code, I guess it means that the text in that font
>is actually unicode encoded,

	Actually, it means exactly the opposite!

	It means that the character codes used in the content stream are 
NOT easily mapped to Unicode code points, so a table needs to be 
present in the PDF to allow the viewer/consumer to convert those 
codes to Unicode, and thus enable correct text extraction.
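	To make the mapping concrete, here is a minimal Python sketch of how 
a consumer might read the bfchar and bfrange sections of a ToUnicode 
CMap and use them to turn character codes into text. It is only an 
illustration under simplifying assumptions, not how poppler does it: 
the function names parse_tounicode and codes_to_text are hypothetical, 
only hex-string entries are handled (array-valued bfrange 
destinations, codespacerange and escapes are ignored), and the sample 
CMap is made up.

```python
import re

HEX = r"<([0-9A-Fa-f]+)>"


def _utf16be(hex_digits: str) -> str:
    # ToUnicode destinations are UTF-16BE byte strings written in hex.
    return bytes.fromhex(hex_digits).decode("utf-16-be")


def parse_tounicode(cmap_text: str) -> dict[int, str]:
    """Hypothetical minimal parser: character code -> Unicode string."""
    mapping: dict[int, str] = {}
    # bfchar lines: <srcCode> <dstString>
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for line in block.splitlines():
            m = re.fullmatch(HEX + r"\s*" + HEX, line.strip())
            if m:
                mapping[int(m.group(1), 16)] = _utf16be(m.group(2))
    # bfrange lines: <lo> <hi> <dstStart>; each successive code maps to the
    # next destination value. Lines with an [array] destination are skipped.
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for line in block.splitlines():
            m = re.fullmatch(HEX + r"\s*" + HEX + r"\s*" + HEX, line.strip())
            if not m:
                continue
            lo, hi, dst = (int(g, 16) for g in m.groups())
            width = len(m.group(3))  # keep the destination's byte length
            for offset in range(hi - lo + 1):
                mapping[lo + offset] = _utf16be(format(dst + offset, f"0{width}X"))
    return mapping


def codes_to_text(codes: list[int], mapping: dict[int, str]) -> str:
    # Codes with no entry become U+FFFD, the replacement character.
    return "".join(mapping.get(code, "\ufffd") for code in codes)


# Made-up CMap fragment: code 0x50 maps to the two characters "fi".
SAMPLE_CMAP = """
2 beginbfchar
<0003> <0020>
<0050> <00660069>
endbfchar
1 beginbfrange
<0044> <0046> <0061>
endbfrange
"""
mapping = parse_tounicode(SAMPLE_CMAP)
print(codes_to_text([0x46, 0x44, 0x45, 0x03, 0x50], mapping))  # prints "cab fi"
```

	Note that one code can map to several Unicode characters (the "fi" 
ligature above), which is one reason a simple byte-for-byte decoding 
of the content stream is not enough.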

	You will see this almost 100% of the time with CID fonts, but 
it can also be useful for other font types (e.g. Type 3 fonts 
generated by Ghostscript).
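	The CID case can be illustrated the same way. With a CID font using 
the Identity-H encoding, each pair of bytes in a content-stream string 
is a CID (for embedded TrueType-based CID fonts this is commonly just 
the glyph index), not a Unicode value, so decoding the bytes as UTF-16 
gives the wrong text. In this sketch the glyph numbers and the mapping 
are hypothetical, and split_two_byte_codes and extract_text are 
illustrative names, not poppler calls.

```python
def split_two_byte_codes(data: bytes) -> list[int]:
    # Identity-H uses fixed 2-byte, big-endian codes.
    if len(data) % 2:
        raise ValueError("Identity-H strings must have an even number of bytes")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def extract_text(data: bytes, to_unicode: dict[int, str]) -> str:
    # Look each CID up in the ToUnicode mapping; unknown CIDs become U+FFFD.
    return "".join(to_unicode.get(cid, "\ufffd") for cid in split_two_byte_codes(data))


# Hypothetical glyph numbering for one embedded font subset.
TO_UNICODE = {0x2B: "H", 0x48: "e", 0x4F: "l", 0x52: "o"}
shown = bytes.fromhex("002B0048004F004F0052")  # the string passed to Tj

print(shown.decode("utf-16-be"))        # naive guess: "+HOOR"
print(extract_text(shown, TO_UNICODE))  # with the ToUnicode map: "Hello"
```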


Leonard
-- 
---------------------------------------------------------------------------
Leonard Rosenthol                            <mailto:leonardr at pdfsages.com>
Chief Technical Officer                      <http://www.pdfsages.com>
PDF Sages, Inc.                              215-938-7080 (voice)
                                              215-938-0880 (fax)

