[cairo] Surface error not set when using cairo_show_text() with invalid utf8
spitzak at gmail.com
Mon Nov 1 16:31:55 PDT 2010
PLEASE do not make UTF-8 errors stop any output!
A lot of deluded systems engineers think doing this will "force people
to use Unicode correctly". But it does not; in fact it does the exact
opposite.
When a programmer sees their output truncated because of a UTF-8 error,
they will then find the fastest possible method to get ASCII text after
that error to print correctly. They DO NOT CARE about the Unicode if
they cannot see the important information after it and they will not
devote even a millisecond of thought to it. Therefore the solutions are
often seriously detrimental to Unicode. Solutions I have seen:
1. Mask every byte with 0x7f
2. Copy to another buffer but strip every byte with the high bit set.
3. Copy to another buffer and replace every byte with the high bit set
with the hex version of the byte's value (this one at least is
attempting to preserve the data).
4. Double UTF-8 encode the text (in effect making it ISO-8859-1)
5. If there is a wchar interface, don't use the official converter, but
instead just alternate your bytes with null to "convert" it (in effect
making it ISO-8859-1).
Delusions that UTF-8 should cause errors are probably the biggest
impediment to I18N. In many ways things are worse today than they were
in 1990, as more software is becoming ASCII-only because of solutions
such as those above.
For a concrete suggestion: if you see a UTF-8 error, substitute a single
Unicode value such as U+FFFD for the *first* byte, and then continue
decoding starting at the next byte. The only functions that should
report that there were "errors" are functions explicitly named things
like "areThereErrorsInThisUTF8()". If the converter is for drawing only
(i.e. the output is not sent to another API) then converting the byte as
ISO-8859-1 or Windows CP1252 is probably better, as the output will be
readable if the text was accidentally in these encodings.
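To make the suggestion concrete, here is a rough sketch in C of such a decoder. It is not cairo code and not a patch against _cairo_utf8_to_ucs4; the names lenient_utf8_next, lenient_utf8_decode and the latin1_fallback flag are made up for illustration. It assumes the caller passes the byte length and an output buffer, and it treats overlong forms, surrogates and values above U+10FFFF as errors too. On any error it emits one replacement for the first byte only and resumes at the next byte, so text after the error is never lost.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu

/* Hypothetical helper: decode one code point from s[0..len), len >= 1.
 * Never fails. On a malformed sequence it consumes exactly ONE byte and
 * yields U+FFFD, or the byte's own value (ISO-8859-1) if latin1_fallback
 * is nonzero. Returns the number of bytes consumed. */
static size_t lenient_utf8_next(const unsigned char *s, size_t len,
                                int latin1_fallback, uint32_t *out)
{
    unsigned char b = s[0];
    uint32_t bad = latin1_fallback ? (uint32_t)b : REPLACEMENT_CHAR;
    uint32_t cp, min;
    size_t need, i;

    if (b < 0x80) { *out = b; return 1; }                 /* plain ASCII */
    else if ((b & 0xE0) == 0xC0) { need = 1; cp = b & 0x1F; min = 0x80; }
    else if ((b & 0xF0) == 0xE0) { need = 2; cp = b & 0x0F; min = 0x800; }
    else if ((b & 0xF8) == 0xF0) { need = 3; cp = b & 0x07; min = 0x10000; }
    else { *out = bad; return 1; }     /* stray continuation or 0xF8..0xFF */

    for (i = 1; i <= need; i++) {
        /* Truncated input or a non-continuation byte: give up on the
         * lead byte only; the following bytes get decoded on their own. */
        if (i >= len || (s[i] & 0xC0) != 0x80) { *out = bad; return 1; }
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong encodings, surrogates and out-of-range values. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        *out = bad;
        return 1;
    }

    *out = cp;
    return need + 1;
}

/* Hypothetical driver: decode up to out_cap code points from text[0..len).
 * Returns the number of code points written. There is no error return;
 * reporting errors would be the job of a separate validation function. */
size_t lenient_utf8_decode(const char *text, size_t len, int latin1_fallback,
                           uint32_t *out, size_t out_cap)
{
    const unsigned char *s = (const unsigned char *)text;
    size_t i = 0, n = 0;

    while (i < len && n < out_cap) {
        i += lenient_utf8_next(s + i, len - i, latin1_fallback, &out[n]);
        n++;
    }
    return n;
}
```

With latin1_fallback set to 0 this is the general-purpose behavior (one U+FFFD per bad byte). With it set to 1 it is the drawing-only variant: text that was accidentally ISO-8859-1 comes out readable. Mapping 0x80..0x9F through CP1252 instead would need a small lookup table, which is left out here.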
> As the subject says, looks like cairo_show_text() does not set the
> surface error (to be queried later by cairo_surface_status()) when
> provided invalid utf8 input (doesn't matter what it is - just has to be
> something that can't be properly decoded). Surface will not allow any
> more operations (i.e. any subsequent drawing is discarded) but at the
> same time status is still reported as "success". This is tested with
> cairo 1.9.8.
> Ideally I would prefer that surface would continue working after this
> error (since the error is in external data and seems to be caught early
> by utf8 validation function _cairo_utf8_to_ucs4). But at a very least,
> surface that can no longer be drawn into should be properly marked as such.