[cairo] [PATCH 3/3] [test] Use UTF-8 in test files

Bill Spitzak spitzak at gmail.com
Wed Mar 11 11:28:36 PDT 2015



On 03/10/2015 08:02 PM, Lawrence D'Oliveiro wrote:
> On Tue, 10 Mar 2015 13:03:08 -0700, Bill Spitzak wrote:
>
>> The main culprit are idiots who think you have to "translate to
>> Unicode" immediately on input. That is a byte stream and should
>> remain a byte stream.
>
> What about line terminators? Many of these text-manipulation tools
> treat their input as divided up into lines. Does that not go against
> the concept of a “byte stream”?
>
> Especially when you get into the specifics of what constitutes a line
> terminator...

Programs should only use ASCII characters as line terminators. Using
"NEL" and some other Unicode characters will make your code incompatible
with many other pieces of software. Several systems (YAML and JSON are
good examples) have reverted attempts to read non-ASCII line terminators
because it broke many other pieces of software, all of which had a
perfectly clear understanding of the encoding being used.

If you really want to find these characters, you can match them as
patterns of bytes. Other than NEL, they cannot occur in non-Unicode
encodings, so matching their UTF-8 byte sequences would work. Matching
the one-byte NEL (0x85) is very much recommended against, as that byte
is a printing character in CP-1252, which is often confused with
ISO-8859-1.
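
A rough sketch of the byte matching described above, in Python. The
helper names split_lines and find_unicode_eols are made up for this
example and do not come from any library. It assumes the input is either
ASCII-compatible or UTF-8, splits on ASCII terminators only, and looks
for NEL, LS and PS purely by their UTF-8 byte sequences, so a lone 0x85
byte is never treated as a line break.

```python
import re

# ASCII line terminators only: CRLF, LF, or a lone CR.
ASCII_EOL = re.compile(rb"\r\n|\n|\r")

# UTF-8 encodings of the non-ASCII Unicode line terminators:
#   U+0085 NEL -> C2 85
#   U+2028 LS  -> E2 80 A8
#   U+2029 PS  -> E2 80 A9
# The one-byte form of NEL (0x85) is deliberately not matched.
UNICODE_EOL = re.compile(rb"\xc2\x85|\xe2\x80[\xa8\xa9]")


def split_lines(data: bytes) -> list[bytes]:
    """Split on ASCII terminators only; the data stays bytes throughout."""
    lines = ASCII_EOL.split(data)
    # A final terminator leaves an empty trailing piece; drop it.
    if lines and lines[-1] == b"":
        lines.pop()
    return lines


def find_unicode_eols(data: bytes) -> list[int]:
    """Return the byte offsets of UTF-8 encoded NEL, LS and PS."""
    return [m.start() for m in UNICODE_EOL.finditer(data)]
```

Nothing here decodes the input, so invalid or non-UTF-8 bytes pass
through untouched rather than raising an error.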

