[Poppler-bugs] [Bug 38456] Handling of small caps typographic variants

Sun Jan 12 11:48:52 PST 2014

https://bugs.freedesktop.org/show_bug.cgi?id=38456

--- Comment #1 from Jason Crain <jason at aquaticape.us> ---
Created attachment 91907
  --> https://bugs.freedesktop.org/attachment.cgi?id=91907&action=edit
Don't parse hex/decimal from character names

This document has Type 3 fonts with character names like /BD, /BC, /CD, etc.
Poppler interprets these names as hexadecimal Unicode values.

The document in bug #38456 is similar. It uses names like /c251, /c255, and
/c262, and Poppler uses these numbers as the Unicode values.
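
To make the old behavior concrete, here is a minimal C++ sketch of that kind
of name parsing. It is not Poppler's implementation: the function name
guessUnicodeFromName and the two patterns it accepts (exactly two hex digits,
or one letter followed by decimal digits) are assumptions chosen only to match
the names mentioned in this comment.

```cpp
#include <cctype>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical illustration of guessing a Unicode value from a glyph name.
// This is NOT Poppler's code; it only mimics the two patterns in this bug.
std::optional<uint32_t> guessUnicodeFromName(const std::string &name) {
    auto isHex = [](char c) { return std::isxdigit(static_cast<unsigned char>(c)) != 0; };
    auto isDec = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };

    // Pattern 1: exactly two hex digits, e.g. "BD" -> U+00BD.
    if (name.size() == 2 && isHex(name[0]) && isHex(name[1])) {
        return static_cast<uint32_t>(std::stoul(name, nullptr, 16));
    }
    // Pattern 2: one letter followed by 1-6 decimal digits, e.g. "c251" -> 251.
    // The length cap keeps std::stoul from overflowing.
    if (name.size() >= 2 && name.size() <= 7 &&
        std::isalpha(static_cast<unsigned char>(name[0]))) {
        for (size_t i = 1; i < name.size(); ++i) {
            if (!isDec(name[i])) return std::nullopt;
        }
        return static_cast<uint32_t>(std::stoul(name.substr(1)));
    }
    return std::nullopt;  // name does not look numeric
}
```

With this heuristic /BD becomes U+00BD (one half), /c251 becomes U+00FB (u with
circumflex) and /g84 becomes U+0054 ('T'). Note that a two-character name such
as /c1 would match the hex pattern first, which shows how ambiguous this kind
of guessing is.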

Poppler and Xpdf are the only programs I've found that use the character name
this way; the others just use the charcode. This patch removes the decimal and
hex parsing and uses the charcode as the fallback.
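
A rough sketch of the patched lookup order, again not Poppler's actual code:
the table kGlyphNames is a hypothetical, tiny stand-in for a real glyph-name
list (such as the Adobe Glyph List), and unicodeForGlyph is an illustrative
name. Known names are looked up; anything else falls back to the charcode.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a standard glyph-name table; a real one has
// thousands of entries.
const std::map<std::string, uint32_t> kGlyphNames = {
    {"space", 0x0020}, {"A", 0x0041}, {"T", 0x0054}, {"h", 0x0068}};

// Sketch of the patched behavior: names such as "BD", "c251" or "g84" are
// no longer parsed as hex or decimal numbers. If a name is not in the table,
// the character code itself is used as the Unicode value.
uint32_t unicodeForGlyph(const std::string &name, uint8_t charCode) {
    auto it = kGlyphNames.find(name);
    if (it != kGlyphNames.end()) {
        return it->second;
    }
    return charCode;
}
```

Under this sketch a name like /g84 yields its charcode rather than U+0054
(unless the charcode happens to be 84), which is the trade-off behind the FAO
document regression.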

The side effects are mostly spacing differences in pdftotext output, caused by
charcode values that were previously left out now being included. The only
document I've found that really breaks is the "Another pdf" attached to bug
#16032 (file name "FAO_Nutri_goodnutrition in Crisis.pdf"). It uses names like
/g84 and /g104 and expects them to be read as decimal Unicode values. I don't
know of a way to make both sets of files work at the same time, but that may be
acceptable, since none of the other programs I've tried can extract text from
this FAO document either.
