[Clipart] release

Wed Mar 9 02:41:49 PST 2005

Jonadab wrote:

> The Unicode characters are another thing; I'm a little out of my
> depth on that one.  It seems very strange to me that a web browser
> would send part of the form data (the XML file) in one encoding,
> and other parts (e.g., the author) in another, different,
> incompatible encoding.  That feels like a browser bug to me, or
> perhaps a fundamental design flaw in the unicode standard,

This is not something that would be specified in the Unicode standard,
since Unicode encodings are not treated any differently than other
character encodings in this respect.  The multipart/form-data
specification is RFC 2388 (which doesn't even mention Unicode).

> (But, IMO, if Unicode were well-designed, I wouldn't *need* to
> understand its inner workings, because my code is NOT changing
> the way any of the characters are encoded;

Well, characters in the test files I uploaded last month were
changed from Unicode (UTF-8 and numeric character references)
into HTML character entity references.  Something that your
code is calling must be doing this.
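
To illustrate why this matters: HTML entity names such as &eacute; are
not defined in plain XML, whereas numeric character references are
always valid.  The following is only a sketch in Python (an assumption
on my part; the upload script may well be written in something else),
and the function name html_entities_to_ncr is hypothetical.  It turns
HTML 4 named entities back into numeric character references and
leaves the five XML-predefined names alone.

```python
import re
from html.entities import name2codepoint

# The five entity names XML predefines; these must stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def html_entities_to_ncr(text):
    """Replace HTML 4 named entities with numeric character references."""
    def repl(match):
        name = match.group(1)
        # Leave XML's own entities and unknown names untouched.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)

print(html_entities_to_ncr("caf&eacute; &amp; bar"))  # caf&#233; &amp; bar
```

This repairs the output after the fact; finding whatever is producing
the HTML entities in the first place would be the better fix.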

There's also the matter of correctly encoding ampersands in
the metadata.  They need to be written as &amp;amp; (or as numeric
character references), otherwise they will be treated as markup.
Similarly, less-than signs need to be written as &amp;lt;, and,
to be safe, greater-than signs should be written as &amp;gt;.
(These three character entity references can be used in XML
without being declared.)
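
As a rough sketch of the escaping described above (again in Python,
which is an assumption, and with a hypothetical helper name
escape_metadata), the standard library's xml.sax.saxutils.escape
handles exactly these three characters:

```python
from xml.sax.saxutils import escape

def escape_metadata(value):
    """Escape &, < and > so the text is safe as XML character data."""
    # escape() replaces & first, then < and >, so nothing is escaped twice
    # within a single call.
    return escape(value)

print(escape_metadata("Tom & Jerry <draft> v2"))
# Tom &amp; Jerry &lt;draft&gt; v2
```

This should be applied once, to the raw text as the user typed it;
running it over text that is already escaped would turn &amp;amp; into
&amp;amp;amp;.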

-- 
Stephen Silver