[Poppler-bugs] [Bug 12808] form text input placed wrong

Tue Jan 29 20:48:08 PST 2008

http://bugs.freedesktop.org/show_bug.cgi?id=12808

Michael Vrable <mvrable at cs.ucsd.edu> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mvrable at cs.ucsd.edu

--- Comment #1 from Michael Vrable <mvrable at cs.ucsd.edu>  2008-01-29 20:48:07 PST ---
I have run into the same problem, and can verify that it is still present in
the most recent (as of 2008-01-29) development sources from git.  I spent some
time debugging, and have found what I think is the problem.

Form field contents are displayed in Annot::drawText.  The text to display
(passed in the text argument) is a UTF-16 string with a byte-order mark (BOM).
The field value is, I believe, set in FormWidgetText::setContent, which
explicitly adds a BOM to the string.
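
To make the byte layout concrete, here is a minimal Python sketch of a
BOM-prefixed UTF-16 value.  It is an illustration only, not poppler's code; the
helper name with_bom is hypothetical, and big-endian byte order is assumed
because the BOM bytes in question are FE FF.

```python
import codecs


def with_bom(value: str) -> bytes:
    """Encode value as big-endian UTF-16 with a leading BOM (FE FF).

    Hypothetical helper for illustration; not a poppler function.
    """
    return codecs.BOM_UTF16_BE + value.encode("utf-16-be")


if __name__ == "__main__":
    field = with_bom("Hi")
    # Two BOM bytes, then two bytes per character.
    print(field.hex(" "))  # fe ff 00 48 00 69
```

A two-character value therefore occupies six bytes, the first two of which are
not text at all.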

When generating the appearance stream, the text is converted (in
Annot::writeTextString) from Unicode to the appropriate 8-bit characters needed
for the selected font.  However, the string width is calculated before this
conversion, in the main body of Annot::drawText, which treats the original
unconverted UTF-16 string as an 8-bit string.  The two bytes of the BOM (FE FF)
are counted as characters to display, so the computed width is too large.
Left-justified fields are unaffected, since their position does not depend on
the width, but centered and right-justified fields are placed incorrectly.
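
The effect on placement can be sketched in a few lines of Python.  This is a
simplified model, not poppler's code: it assumes a fixed advance of 6 points
per glyph, assumes that 00 bytes contribute no width (so only the two BOM bytes
inflate the result for ASCII-range text), and all function names are
hypothetical.

```python
import codecs

GLYPH_WIDTH = 6.0  # assumed fixed advance per glyph, in points


def width_from_raw_bytes(raw: bytes) -> float:
    """Width as computed when the UTF-16 bytes are read as 8-bit characters.

    Assumption: a 00 byte has no glyph and adds no width.
    """
    return GLYPH_WIDTH * sum(1 for b in raw if b != 0)


def width_from_decoded_text(raw: bytes) -> float:
    """Width computed after stripping the BOM and decoding the text."""
    if raw.startswith(codecs.BOM_UTF16_BE):
        text = raw[2:].decode("utf-16-be")
    else:
        text = raw.decode("latin-1")
    return GLYPH_WIDTH * len(text)


def x_offset(text_width: float, field_width: float, align: str) -> float:
    """Horizontal start position of the text inside the field."""
    if align == "left":
        return 0.0
    if align == "center":
        return (field_width - text_width) / 2
    if align == "right":
        return field_width - text_width
    raise ValueError(f"unknown alignment: {align}")


if __name__ == "__main__":
    raw = codecs.BOM_UTF16_BE + "Hi".encode("utf-16-be")
    for align in ("left", "center", "right"):
        wrong = x_offset(width_from_raw_bytes(raw), 100.0, align)
        right = x_offset(width_from_decoded_text(raw), 100.0, align)
        print(align, wrong, right)
```

Under these assumptions the left-aligned offset is 0 either way, while the
centered and right-aligned offsets shift left by half and all of the extra BOM
width respectively.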

I currently have an ugly patch which works around this bug, and field alignment
appears correct after applying it.  However, the patch only handles simple
single-line form fields so far, so other cases are probably still buggy.

A larger issue, which I'm trying to figure out, is how the form field contents
are supposed to be interpreted.  Section 8.6.3 of the PDF 1.6 specification
says "The field's text is held in a text string (or, beginning with PDF 1.5, a
stream) in the V (value) entry of the field dictionary. The contents of this
text string or stream are used to construct an appearance stream for displaying
the field...".  The phrase "text string" seems to imply that the string is
either in PDFDocEncoding or UTF-16, which is what poppler seems to assume.
However, from a little experimentation it seems that Acrobat Reader (sorry, I
forget which version) simply treats the field value as a string to be
interpreted according to whatever encoding the field's font uses, not
PDFDocEncoding.  I'm currently trying to make sense of this and figure out
what the correct fix for the problem is.
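
The two candidate interpretations can be contrasted with a small Python sketch.
It is illustrative only: the function names and the sample font table are
hypothetical, and PDFDocEncoding is approximated by Latin-1, which is adequate
only for the printable ASCII bytes used here.

```python
import codecs


def decode_as_text_string(raw: bytes) -> str:
    """Interpret raw as a PDF "text string".

    A leading FE FF marks big-endian UTF-16; otherwise the bytes are taken
    as PDFDocEncoding (approximated here by Latin-1, an assumption that is
    only safe for printable ASCII).
    """
    if raw.startswith(codecs.BOM_UTF16_BE):
        return raw[2:].decode("utf-16-be")
    return raw.decode("latin-1")


def decode_with_font_encoding(raw: bytes, table: dict) -> str:
    """Interpret each byte through a font-specific code-to-character table.

    Bytes missing from the table fall back to Latin-1 (an arbitrary choice
    made for this sketch).
    """
    return "".join(table.get(b, chr(b)) for b in raw)


if __name__ == "__main__":
    value = b"A"
    # Hypothetical font whose code 0x41 maps to Greek capital alpha.
    greek_font = {0x41: "\u0391"}
    print(decode_as_text_string(value))                  # Latin "A"
    print(decode_with_font_encoding(value, greek_font))  # Greek alpha
```

The same stored bytes give different characters depending on which rule is
applied, which is why the choice matters for the fix.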
