[cairo] Surface error not set when using cairo_show_text() with invalid utf8

Tue Nov 2 14:03:11 PDT 2010

Maarten Bosmans wrote:
> 2010/11/2 Bill Spitzak <spitzak at gmail.com>:
>> PLEASE do not make UTF-8 errors stop any output!
>>
>> A lot of deluded systems engineers think doing this will "force people to
>> use Unicode correctly". But it does not, in fact it does the exact opposite!
> 
> The fact that people, upon misusing cairo api by feeding it non-UTF-8
> encoded data, do not resolve the problem properly, but resort to the
> kind of ugly hacks you mention below, can hardly be blamed on the
> "deluded system engineers" that made the supporting libraries.

This is EXACTLY the deluded impression.

If the programmer is forced to write code when a simple and obvious
change to the API would mean they could write NO code, then I think any
"hacks" in that code *are* the library engineers' fault, because a
correct API would mean the hacks would not exist. Trying to pass the
blame for this is a disease that is pretty bad in Linux and open source,
and I would like to see it stop!

> Why wouldn't one use any of the existing validation/conversion routines?
> http://library.gnome.org/devel/glib/2.26/glib-Unicode-Manipulation.html

Yes, a programmer can call this if they want to detect errors. That has
nothing to do with the problem. Maybe if we are really, really, really
lucky, the programmer might call a useful function that preserves UTF-8
(but there is no "convert this UTF-8 to the closest possible valid form"
call in that library, so I am afraid it will not happen).

> Silently interpreting data that should be UTF-8 as some other encoding
> when errors are encountered does not sound like a good approach.

I am not interpreting it as another encoding. I am trying desperately to 
prevent users from making hacks that do that. The suggestion I have is 
to replace *bytes* with an alternative symbol. The remaining string will 
continue to be interpreted as UTF-8. Most hacks done by users completely 
change the encoding, and often they do so even if there are no errors!
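
To be concrete about what "replace *bytes*" means, here is a rough
sketch in C. The names utf8_seq_len and utf8_replace_bad_bytes are made
up for illustration and are not part of cairo or glib, and using U+FFFD
as the alternative symbol is an assumption; any other symbol would work
the same way. It writes into a caller-supplied buffer only to keep the
example self-contained; inside a drawing routine the same loop could
emit glyphs directly without any buffer.

```c
#include <stddef.h>
#include <string.h>

/* Length of the well-formed UTF-8 sequence starting at s (n bytes are
 * available), or 0 if s does not start one. Rejects overlong forms,
 * surrogates, and values above U+10FFFF. Hypothetical helper. */
static size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    if (n == 0) return 0;
    unsigned char c = s[0];
    if (c < 0x80) return 1;
    if (c >= 0xC2 && c <= 0xDF)
        return (n >= 2 && (s[1] & 0xC0) == 0x80) ? 2 : 0;
    if (c >= 0xE0 && c <= 0xEF) {
        if (n < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return 0;
        if (c == 0xE0 && s[1] < 0xA0) return 0;   /* overlong */
        if (c == 0xED && s[1] > 0x9F) return 0;   /* surrogate */
        return 3;
    }
    if (c >= 0xF0 && c <= 0xF4) {
        if (n < 4 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80 ||
            (s[3] & 0xC0) != 0x80)
            return 0;
        if (c == 0xF0 && s[1] < 0x90) return 0;   /* overlong */
        if (c == 0xF4 && s[1] > 0x8F) return 0;   /* above U+10FFFF */
        return 4;
    }
    return 0;   /* stray continuation byte or invalid lead byte */
}

/* Copy in[0..in_len) to out, replacing each offending byte with U+FFFD
 * and copying every well-formed sequence unchanged. out must hold at
 * least 3 * in_len + 1 bytes. Returns the output length, not counting
 * the terminating NUL. Hypothetical helper. */
static size_t utf8_replace_bad_bytes(const char *in, size_t in_len,
                                     char *out)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t i = 0, o = 0;
    while (i < in_len) {
        size_t len = utf8_seq_len(s + i, in_len - i);
        if (len > 0) {
            memcpy(out + o, s + i, len);          /* good: keep as is */
            i += len;
            o += len;
        } else {
            memcpy(out + o, "\xEF\xBF\xBD", 3);   /* bad byte -> U+FFFD */
            i += 1;
            o += 3;
        }
    }
    out[o] = '\0';
    return o;
}
```

Only the single bad byte is consumed on an error, so whatever follows it
is still decoded as UTF-8, and a string with no errors comes out
byte-for-byte identical.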

> Better would be to provide some kind of conversion function that takes
> a  collection of bytes and tries to interpret them as good as
> possible, always resulting in valid UTF-8.

I fail to see why forcing the programmer to allocate a buffer and do a 
conversion before calling the print function, just to fix a case where 
the print function throws an error rather than draw the obviously 
desired result, is "better".

Let me make another suggestion: let's add a "set the cairo error if there
is an error in this UTF-8" function, and fix the drawing as I suggest.
Then the users who want the current behavior can make these two calls.
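
A small model of that two-call arrangement, in C. Nothing here is real
cairo API: context_t and status_t are stand-ins for cairo_t and
cairo_status_t, and check_utf8 and show_text are hypothetical names. The
only cairo behavior assumed is the existing convention that drawing
calls do nothing once the context is in an error state.

```c
#include <stddef.h>

/* Stand-ins for cairo_status_t and cairo_t (assumptions, not cairo). */
typedef enum { STATUS_SUCCESS, STATUS_INVALID_STRING } status_t;
typedef struct { status_t status; int strings_drawn; } context_t;

/* Returns 1 if the NUL-terminated string is well-formed UTF-8. */
static int utf8_valid(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    while (*s) {
        unsigned long cp, min;
        int extra;
        if (*s < 0x80)                { cp = *s;        extra = 0; min = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; min = 0x80; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; min = 0x800; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; min = 0x10000; }
        else return 0;                /* stray continuation or bad lead */
        s++;
        while (extra-- > 0) {
            /* The terminating NUL also fails this test, so the loop
             * never reads past the end of the string. */
            if ((*s & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (*s++ & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;                 /* overlong, too large, surrogate */
    }
    return 1;
}

/* Call 1 (hypothetical): put the context into an error state if the
 * string is not valid UTF-8. Draws nothing. */
static void check_utf8(context_t *cr, const char *utf8)
{
    if (cr->status == STATUS_SUCCESS && !utf8_valid(utf8))
        cr->status = STATUS_INVALID_STRING;
}

/* Call 2 (hypothetical): lenient drawing. It never sets an error for
 * bad UTF-8; like other drawing calls, it is a no-op when the context
 * is already in an error state. Actual rendering is replaced by a
 * counter here. */
static void show_text(context_t *cr, const char *utf8)
{
    (void)utf8;
    if (cr->status != STATUS_SUCCESS)
        return;
    cr->strings_drawn++;
}
```

A caller who wants strict behavior writes check_utf8(cr, s) followed by
show_text(cr, s); everyone else calls show_text alone and always gets
output.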