[poppler] [PATCH and RFC] Bugfixes, Improved Forms Support for Unicode
Carlos Garcia Campos
carlosgc at gnome.org
Sun Feb 3 07:38:42 PST 2008
On Sat, 2008-02-02 at 21:27 -0800, Michael Vrable wrote:
> The root cause of https://bugs.freedesktop.org/show_bug.cgi?id=12808 is
> that the code for rendering form fields in poppler didn't properly deal
> with input strings provided in UTF-16: the string was treated as an
> 8-bit string, and the byte-order-mark at the front was included in the
> length calculation.
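For anyone following along, here is a minimal sketch of the distinction described above, written in Python for brevity rather than in poppler's C++. PDF text strings that begin with the bytes FE FF are UTF-16BE; anything else is PDFDocEncoding. The function names are hypothetical and are not poppler API, and Latin-1 is used only as a rough stand-in for PDFDocEncoding, which differs from it at some code points.

```python
UTF16_BE_BOM = b"\xfe\xff"


def text_string_length(raw: bytes) -> int:
    """Count characters (UTF-16 code units or 8-bit codes) in a PDF text string."""
    if raw.startswith(UTF16_BE_BOM):
        # Skip the byte-order mark and count 2-byte code units, not bytes.
        return (len(raw) - len(UTF16_BE_BOM)) // 2
    return len(raw)


def decode_text_string(raw: bytes) -> str:
    """Decode a PDF text string to a Python str."""
    if raw.startswith(UTF16_BE_BOM):
        return raw[len(UTF16_BE_BOM):].decode("utf-16-be")
    # Assumption: Latin-1 approximates PDFDocEncoding for this illustration only.
    return raw.decode("latin-1")
```

Treating the UTF-16 string b"\xfe\xff\x00A\x00B" as 8-bit data gives a length of 6, while the function above reports 2 characters.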
>
> I started off trying to create a simple fix for this problem, but
> eventually ended up significantly rewriting the code for displaying form
> fields to fix other problems that I found, eventually working to add
> near full support for Unicode inputs.
>
> Since these changes are large, I don't expect this patch to go in right
> away. But please, provide feedback. My work is based on git commit
> 6f11ef660540.
>
> There are two patches. The first, character-encoding-fixes.patch, is a
> couple of fairly trivial fixes that I came across while working on the
> larger patch. It can go in at any time if it looks good.
>
> The second patch, unicode-forms-support.patch, is the main part of the
> work and the patch I'd like comments on. Most new functionality is in
> the new Annot::layoutText function. It performs a few steps:
> - Converts input in PDFDocEncoding or UTF-16 to the font's encoding
> - Computes the width of the text on the page
> - Optionally breaks the text at the specified width, for multi-line
> form fields
> All of this ended up in the same function since finding break-points for
> lines is easiest to do on the input encoding, where spaces and newlines
> are easier to recognize than in whatever encoding the font uses, but the
> width of text is easiest to compute when re-encoding the text string.
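To make the combined measure-and-break step concrete, here is a rough sketch in Python. It is not the code from the patch: `layout_line`, `layout_text` and the `char_width` callback are hypothetical names, and `char_width` stands in for whatever per-character width lookup the font provides after re-encoding. Break points are found on the input text (spaces and newlines) while the width is accumulated in the same pass.

```python
def layout_line(text, start, char_width, max_width=None):
    """Lay out one line beginning at text[start].

    Returns (end, next_start, width): text[start:end] is the line and
    next_start is the index where the following line begins.
    """
    width = 0.0
    end = start
    last_space = None       # index of the most recent space on this line
    width_at_space = 0.0    # line width just before that space
    while end < len(text):
        ch = text[end]
        if ch == "\n":
            return end, end + 1, width          # explicit line break
        w = char_width(ch)
        overflow = max_width is not None and width + w > max_width
        if overflow and end > start:            # always keep at least one char
            if ch == " ":
                return end, end + 1, width      # break at the overflowing space
            if last_space is not None:
                return last_space, last_space + 1, width_at_space
            return end, end, width              # no space: break mid-word
        if ch == " ":
            last_space = end
            width_at_space = width
        width += w
        end += 1
    return end, end, width


def layout_text(text, char_width, max_width=None):
    """Split text into (line, width) pairs; no wrapping if max_width is None."""
    lines = []
    pos = 0
    while pos < len(text):
        end, next_pos, width = layout_line(text, pos, char_width, max_width)
        lines.append((text[pos:end], width))
        pos = next_pos
    return lines
```

In this sketch every call to `layout_line` consumes at least one character, so the outer loop always makes progress, even when a single word is wider than the field.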
>
> The main missing element for full Unicode handling is the writing out of
> text for CID-keyed fonts. There is currently support for taking
> Unicode characters as input and finding the appropriate character code
> in the font to show it. However, there isn't code for writing out the
> correct sequence of bytes to show that character (doing so should be
> trivial for an identity CMap, but isn't added quite yet).
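As an illustration of the identity CMap case mentioned above: with Identity-H, each character code is two bytes, big-endian, and equals the CID. A minimal sketch in Python follows; the function names are hypothetical, not poppler API, and other CMaps would need a real code-to-CID mapping.

```python
def encode_identity_h(cids):
    """Encode CIDs as 2-byte big-endian character codes (Identity-H case only)."""
    return b"".join(cid.to_bytes(2, "big") for cid in cids)


def as_hex_string(data: bytes) -> str:
    """Format bytes as a PDF hex string, e.g. for use with a Tj operator."""
    return "<" + data.hex().upper() + ">"
```

For example, CIDs 0x41 and 0x1234 would be written as the hex string <00411234>.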
>
> Also missing: support for Unicode text outside the BMP, using surrogate
> pairs.
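For reference, combining surrogate pairs is the standard UTF-16 arithmetic. Here is a small sketch in Python; the function name is hypothetical, and unpaired surrogates are simply passed through unchanged, which is only one of several possible policies.

```python
def decode_utf16_units(units):
    """Turn a list of UTF-16 code units into a list of Unicode code points."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # High surrogate followed by low surrogate: one code point above the BMP.
            low = units[i + 1]
            out.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            # BMP character, or an unpaired surrogate passed through as-is.
            out.append(u)
            i += 1
    return out
```

The pair 0xD83D 0xDE00 combines to U+1F600, which a 16-bit-only code path cannot represent.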
>
> I've done some limited testing with these patches (in evince), and they
> definitely work better for me than before. However, I don't currently
> have PDFs for testing many features, so pointers to any good test forms
> are appreciated!
Hi Michael, thank you very much for the patches. I have tested them with
several documents and they work pretty well. The only thing that is
still broken is multiline form fields. They were indeed already broken (see
bug http://bugzilla.gnome.org/show_bug.cgi?id=499939), but in a different
way. Now it seems to enter an infinite loop after editing a
multiline form field.
You can use this file to reproduce the problem:
http://www.okular.org/stuff/forms-scribus.pdf
> Features tested:
> - Accented characters; typographic characters such as bullets, quotes
> - Left, center, right alignment of single-line fields
> - Checkboxes work as before
> - Single-line comb fields still work
> Not tested:
> - Multi-line fields (my test form doesn't have them)
> - Form fields with composite fonts (no test forms; code still needs a
> tiny bit of work)
>
> --Michael Vrable
--
Carlos Garcia Campos
elkalmail at yahoo.es
carlosgc at gnome.org
http://carlosgc.linups.org
PGP key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x523E6462