[jon at joncruz.org: Re: [Inkscape-devel] [jonadab at bright.net: Re: [Clipart] character coding]]

Bryce Harrington bryce at bryceharrington.com
Sun Feb 6 22:41:33 PST 2005


----- Forwarded message from "Jon A. Cruz" <jon at joncruz.org> -----

Date: Sun, 06 Feb 2005 22:20:39 -0800
From: "Jon A. Cruz" <jon at joncruz.org>
To: Bryce Harrington <bryce at bryceharrington.com>
Subject: Re: [Inkscape-devel] [jonadab at bright.net: Re: [Clipart] character
 coding]

Bryce Harrington wrote:

>Jon, can you give some advice on this one?
> 
>
Well... ISO-8859-15 is the "Latin 9" that's becoming more common, 
especially in Europe.
http://www.cs.tut.fi/~jkorpela/latin9.html
http://www.unicode.org/Public/MAPPINGS/ISO8859/8859-15.TXT
http://en.wikipedia.org/wiki/ISO_8859-15

Now, given the nature of UTF-8, it's fairly good at being detectable. 
That is, with the lead and trail byte combos, it's fairly easy to walk a 
file and determine if it's UTF-8, as a file of any significant size 
probably won't give false positives.
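
As a rough illustration of that lead/trail byte walk, here is a minimal Python sketch. The function name looks_like_utf8 is made up for this example, and the check is structural only: it does not reject overlong forms or surrogates, which a strict decoder such as data.decode("utf-8") would. Note that pure ASCII passes, since ASCII is valid UTF-8.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data follows UTF-8 lead/trail byte structure.

    Hypothetical helper for illustration; structural check only.
    """
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            trail = 0              # plain ASCII, no trail bytes
        elif 0xC2 <= lead <= 0xDF:
            trail = 1              # lead byte of a 2-byte sequence
        elif 0xE0 <= lead <= 0xEF:
            trail = 2              # lead byte of a 3-byte sequence
        elif 0xF0 <= lead <= 0xF4:
            trail = 3              # lead byte of a 4-byte sequence
        else:
            return False           # stray trail byte or invalid lead byte
        if i + trail >= n + (0 if trail else 1):
            return False           # sequence is cut off at end of data
        for k in range(1, trail + 1):
            if data[i + k] & 0xC0 != 0x80:
                return False       # expected a 10xxxxxx trail byte
        i += trail + 1
    return True


if __name__ == "__main__":
    print(looks_like_utf8("café".encode("utf-8")))
    print(looks_like_utf8("café".encode("iso-8859-15")))
```

Text with accented characters saved as ISO-8859-1 or ISO-8859-15 will usually trip the check at the first non-ASCII byte, because that byte is rarely followed by the right number of trail bytes.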

Of course, it's easy for content to lie about its encoding. Even though 
HTML or XML has the string "UTF-8" in it, anything could have changed it 
or edited it in the wrong encoding.
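
One way to catch a declaration that lies is to pull the declared name out of the file and try decoding the bytes with it. The sketch below is an assumption-laden example, not a full parser: the helper name declaration_matches_bytes is hypothetical, and it only looks for an encoding="..." attribute in the first 200 bytes.

```python
import re
from typing import Optional

# Matches encoding="..." or encoding='...' as found in an XML declaration.
_ENCODING_RE = re.compile(rb"encoding\s*=\s*[\"']([A-Za-z0-9._-]+)[\"']")


def declaration_matches_bytes(data: bytes) -> Optional[bool]:
    """Check whether data decodes under the encoding it declares.

    Hypothetical helper for illustration. Returns None when no
    declaration is found near the start of the data.
    """
    match = _ENCODING_RE.search(data[:200])
    if match is None:
        return None
    name = match.group(1).decode("ascii")
    try:
        data.decode(name)
    except (UnicodeDecodeError, LookupError):
        # Either the bytes are invalid for that encoding,
        # or Python does not know the declared name.
        return False
    return True


if __name__ == "__main__":
    header = b'<?xml version="1.0" encoding="UTF-8"?>'
    print(declaration_matches_bytes(header + b"<a>caf\xe9</a>"))
```

A True result is only meaningful for encodings with real structure, like UTF-8. A single-byte encoding such as ISO-8859-1 accepts any byte sequence, so a wrong declaration of that kind cannot be detected this way.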

Further questions would be what script, what tool, what servers and what 
protocols are involved.
:-)




----- End forwarded message -----


