[jon at joncruz.org: Re: [Inkscape-devel] [jonadab at bright.net: Re: [Clipart] character coding]]
Bryce Harrington
bryce at bryceharrington.com
Sun Feb 6 22:41:33 PST 2005
----- Forwarded message from "Jon A. Cruz" <jon at joncruz.org> -----
Date: Sun, 06 Feb 2005 22:20:39 -0800
From: "Jon A. Cruz" <jon at joncruz.org>
To: Bryce Harrington <bryce at bryceharrington.com>
Subject: Re: [Inkscape-devel] [jonadab at bright.net: Re: [Clipart] character coding]
Bryce Harrington wrote:
>Jon, can you give some advice on this one?
>
>
Well... ISO-8859-15 is the "Latin 9" that's becoming more common,
especially in Europe. (ISO-8859-1 is the older "Latin 1".)
http://www.cs.tut.fi/~jkorpela/latin9.html
http://www.unicode.org/Public/MAPPINGS/ISO8859/8859-15.TXT
http://en.wikipedia.org/wiki/ISO_8859-15
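To make the Latin 1 / Latin 9 difference concrete, here is a small
Python sketch (Python is just my pick; nothing in the thread implies
it). It decodes every byte value with the standard library's "latin-1"
and "iso8859_15" codecs and lists the ones that differ. Only eight
positions change. For example, 0xA4 is the generic currency sign in
Latin 1 but the euro sign in Latin 9.

```python
# Compare how ISO-8859-1 (Latin 1) and ISO-8859-15 (Latin 9) decode the
# same byte values, using only Python's built-in codecs.

def latin_differences():
    """Return (byte, latin1_char, latin9_char) for every byte that differs."""
    diffs = []
    for b in range(256):
        raw = bytes([b])
        l1 = raw.decode("latin-1")     # ISO-8859-1
        l9 = raw.decode("iso8859_15")  # ISO-8859-15
        if l1 != l9:
            diffs.append((b, l1, l9))
    return diffs


if __name__ == "__main__":
    for b, l1, l9 in latin_differences():
        print(f"0x{b:02X}: Latin 1 {l1!r} -> Latin 9 {l9!r}")
```

So a Latin 9 file read as Latin 1 (or the reverse) only goes wrong on
those eight characters, which is exactly why the mix-up is easy to miss.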
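Now, given the nature of UTF-8, it's fairly easy to detect. Its
multi-byte sequences follow a strict pattern of lead and trail bytes,
so you can walk a file and check whether every non-ASCII byte fits that
pattern. Text in an 8-bit encoding like Latin 1 or Latin 9 rarely
happens to form valid UTF-8, so a file of any significant size (with at
least some non-ASCII content) probably won't give a false positive. A
pure-ASCII file, on the other hand, is valid in all of these encodings
and tells you nothing.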
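A minimal sketch of that lead/trail byte walk, again in Python (my
choice, not something the thread prescribes). The name looks_like_utf8
is made up for illustration. It follows the UTF-8 byte patterns from
RFC 3629, including rejection of overlong forms and surrogates, and it
also counts multi-byte sequences, since valid-but-pure-ASCII data says
nothing about which encoding was intended.

```python
from typing import Tuple


def looks_like_utf8(data: bytes) -> Tuple[bool, int]:
    """Walk the bytes checking UTF-8 lead/trail structure (hypothetical helper).

    Returns (is_valid, multibyte_sequences). Valid with zero multi-byte
    sequences means plain ASCII, which is ambiguous.
    """
    i, n, multibyte = 0, len(data), 0
    while i < n:
        b = data[i]
        if b < 0x80:                  # 0xxxxxxx: plain ASCII byte
            i += 1
            continue
        if 0xC2 <= b <= 0xDF:         # 110xxxxx: lead of a 2-byte sequence
            need = 1
        elif 0xE0 <= b <= 0xEF:       # 1110xxxx: lead of a 3-byte sequence
            need = 2
        elif 0xF0 <= b <= 0xF4:       # 11110xxx: lead of a 4-byte sequence
            need = 3
        else:                         # stray trail byte, overlong C0/C1, or F5..FF
            return False, multibyte
        if i + need >= n:             # sequence runs past the end of the data
            return False, multibyte
        trail = data[i + 1 : i + 1 + need]
        if not all(0x80 <= t <= 0xBF for t in trail):  # each must be 10xxxxxx
            return False, multibyte
        # Reject overlong 3/4-byte forms, UTF-16 surrogates, and > U+10FFFF.
        second = trail[0]
        if ((b == 0xE0 and second < 0xA0) or (b == 0xED and second > 0x9F)
                or (b == 0xF0 and second < 0x90) or (b == 0xF4 and second > 0x8F)):
            return False, multibyte
        multibyte += 1
        i += need + 1
    return True, multibyte
```

Usage would be something like reading the file in binary mode and
calling looks_like_utf8(f.read()); "probably UTF-8" then means a result
of (True, k) with k > 0.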
Of course, it's easy for content to lie about its encoding. Even if an
HTML or XML file declares "UTF-8" in it, some tool could have converted
it, or someone could have edited it in the wrong encoding.
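A rough sketch of cross-checking the declared encoding against the
actual bytes, in Python with hypothetical function names
(declared_encoding, declaration_matches). It assumes the declaration
sits in the first 1 KiB and pulls it out with a regex rather than a
real XML/HTML parser. It only checks the case discussed here, a file
that claims UTF-8 but isn't valid UTF-8; a file with no declaration is
left alone, even though XML defaults to UTF-8 in that case.

```python
import re

# Matches encoding="..." in an XML declaration, or charset=... in an HTML
# <meta> tag. A rough heuristic, not a real XML/HTML parser.
_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
    rb'|<meta[^>]*?charset\s*=\s*["\']?([A-Za-z0-9._-]+)',
    re.IGNORECASE,
)


def declared_encoding(data: bytes):
    """Return the declared encoding (lower-cased) from the first 1 KiB, or None."""
    m = _DECL.search(data[:1024])
    if not m:
        return None
    name = m.group(1) or m.group(2)
    return name.decode("ascii").lower()


def declaration_matches(data: bytes) -> bool:
    """False only when the file claims UTF-8 but its bytes are not valid UTF-8."""
    if declared_encoding(data) not in ("utf-8", "utf8"):
        return True  # only the UTF-8 claim is checked in this sketch
    try:
        data.decode("utf-8")  # strict decoding raises on any invalid sequence
        return True
    except UnicodeDecodeError:
        return False
```

An SVG saved by a Latin 1 editor but still carrying encoding="UTF-8" in
its declaration is exactly the kind of file this flags.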
Further questions would be what script, what tool, what servers and what
protocols are involved.
:-)
----- End forwarded message -----