[poppler] [PATCH] Update to PDFDocEncoding Table
Michael Vrable
mvrable at cs.ucsd.edu
Wed Feb 13 21:59:25 PST 2008
Carlos (I believe) pointed me at a document with a form-editing bug, at
http://bugzilla.gnome.org/show_bug.cgi?id=365807. The text in the
upper-right corner is actually a multi-line form field. If you click on
that text, only the first line is made available for editing. However,
editing the field to include additional lines still works.
The problem has to do with the conversion of strings from PDFDocEncoding
to Unicode. The lookup table for the conversion does not know what to
do with a carriage return, and so maps it to U+0000. When the string is
passed up to evince for editing, the null character ends it early, at
the first line break. The value of the field is initially stored in
PDFDocEncoding; when we edit it, we store the results back as a Unicode
string.
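To make the failure concrete, here is a small C++ sketch. It is not
poppler's code: the table, makeOldTable, toUnicode and visibleLength are
hypothetical names, and only printable ASCII is filled in. The table
leaves 0x0D at U+0000, and the consumer stops at the first null, the
way a C string API would:

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the old table: only printable ASCII
// (0x20-0x7E) is filled in; every other byte, including carriage
// return (0x0D), is left at U+0000.
std::array<char32_t, 256> makeOldTable() {
    std::array<char32_t, 256> table{};  // value-initialized to all zeros
    for (int b = 0x20; b <= 0x7E; ++b) {
        table[b] = static_cast<char32_t>(b);
    }
    return table;
}

// Convert a PDFDocEncoding byte string one byte at a time via the table.
std::u32string toUnicode(const std::string &pdfDoc,
                         const std::array<char32_t, 256> &table) {
    std::u32string out;
    for (unsigned char c : pdfDoc) {
        out.push_back(table[c]);
    }
    return out;
}

// Length as seen by a consumer that treats the result as null-terminated:
// it counts code points only up to the first U+0000.
std::size_t visibleLength(const std::u32string &s) {
    std::size_t n = 0;
    while (n < s.size() && s[n] != U'\0') {
        ++n;
    }
    return n;
}
```

Converting "Line one\rLine two" yields 17 code points, but a
null-terminated consumer sees only the first 8 ("Line one").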
The fix: add carriage return and a few other characters to the
PDFDocEncoding table. Map them to the corresponding Unicode characters
(same numeric value). In this patch, I'm only adding mappings for
whitespace characters, not all control characters. I contemplated
adding mappings for all control characters, but it's not possible to do
a complete job since some bytes <0x20 are used for glyphs already.
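A rough sketch of the whitespace fix, again with hypothetical names and
not the attached patch. I am assuming the added whitespace bytes are tab,
line feed, form feed and carriage return; the real patch may differ. The
0x18 entry shows why a blanket identity mapping below 0x20 is impossible:
in PDFDocEncoding that byte already means BREVE (U+02D8).

```cpp
#include <array>
#include <initializer_list>
#include <string>

// Sketch of the fixed table. Assumption: the whitespace bytes added are
// tab (0x09), line feed (0x0A), form feed (0x0C) and carriage return
// (0x0D); the actual patch may use a different set.
std::array<char32_t, 256> makeTableWithWhitespace() {
    std::array<char32_t, 256> table{};  // unlisted bytes stay U+0000 here
    for (int b = 0x20; b <= 0x7E; ++b) {
        table[b] = static_cast<char32_t>(b);
    }
    // Whitespace controls map to the Unicode code point with the same value.
    for (int b : {0x09, 0x0A, 0x0C, 0x0D}) {
        table[b] = static_cast<char32_t>(b);
    }
    // 0x18-0x1F already encode spacing accents in PDFDocEncoding, so an
    // identity mapping is impossible there; 0x18 (breve) is one example.
    table[0x18] = U'\u02D8';
    return table;
}

// Convert a PDFDocEncoding byte string one byte at a time via the table.
std::u32string toUnicode(const std::string &pdfDoc,
                         const std::array<char32_t, 256> &table) {
    std::u32string out;
    for (unsigned char c : pdfDoc) {
        out.push_back(table[c]);
    }
    return out;
}
```

With this table, "Line one\rLine two" converts to 17 code points with a
real U+000D in the middle and no null to cut the string short. Bytes not
listed in this sketch still map to U+0000.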
While making this change, I also updated the table so that any unknown
characters are now mapped to U+FFFD (conventionally used to represent a
character that couldn't be converted) instead of U+0000. This should
prevent an unknown character in a PDFDocEncoding string from being
turned into a null in the future.
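One way to get the U+FFFD default is to fill every slot with the
replacement character first and then overwrite the bytes that have a
defined mapping. This is a minimal sketch with hypothetical names
(makeTableWithFallback, toUnicode), filling in only ASCII and carriage
return for brevity:

```cpp
#include <array>
#include <string>

constexpr char32_t kReplacement = U'\uFFFD';  // REPLACEMENT CHARACTER

// Hypothetical builder: start every entry at U+FFFD, then fill in the
// bytes that have a defined mapping (only ASCII and CR in this sketch).
std::array<char32_t, 256> makeTableWithFallback() {
    std::array<char32_t, 256> table;
    table.fill(kReplacement);
    for (int b = 0x20; b <= 0x7E; ++b) {
        table[b] = static_cast<char32_t>(b);
    }
    table[0x0D] = U'\r';
    return table;
}

// Convert a PDFDocEncoding byte string one byte at a time via the table.
std::u32string toUnicode(const std::string &pdfDoc,
                         const std::array<char32_t, 256> &table) {
    std::u32string out;
    for (unsigned char c : pdfDoc) {
        out.push_back(table[c]);
    }
    return out;
}
```

An unmapped byte such as 0x01 in "A\x01" "B" now comes out as U+FFFD, so
the converted string keeps its full length instead of ending at a null.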
--Michael Vrable
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pdfdocencoding.patch
Type: text/x-diff
Size: 3216 bytes
Desc: not available
Url : http://lists.freedesktop.org/archives/poppler/attachments/20080213/ee3fdb91/attachment.patch