[Clipart] character coding

Stephen Silver ocalocal at btinternet.com
Wed Feb 9 05:15:38 PST 2005


Jonadab wrote:

> > Well, there appear to be two problems.  Form-entered metadata does
> > need to be converted to UTF-8, but that won't fix the problem of
> > metadata in the uploaded file somehow being converted to Latin-1.
> 
> Do we know that the latter is happening?

Hmm... Apparently it isn't happening any more (but I'm sure it was when
I tried it before, about a month ago).

I uploaded two test files today, and the non-ASCII characters (whether
encoded as UTF-8 or as numeric character references) were converted to
character entity references, except for a couple that were converted to
numeric character references.  I think that this is wrong too: the SVG DTD
doesn't declare those character entities (XML itself predefines only
&amp;, &lt;, &gt;, &apos; and &quot;), so they can't be used in an SVG
file.  Inkscape just strips them out.
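
For what it's worth, here is a rough sketch of one way the upload script
could avoid emitting undeclared entities: rewrite each named reference as
a numeric character reference, which needs no DTD declaration.  I'm
assuming the script produces names from the HTML 4 entity set; the
function name named_to_numeric is made up for illustration.

```python
import re
from html.entities import name2codepoint

# The five entities that XML predefines; these are safe in SVG as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}

_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(text):
    """Replace HTML named entities with numeric character references.

    Hypothetical helper: assumes the names come from the HTML 4 entity
    set. Unknown names and XML's predefined entities are left untouched.
    """
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    return _ENTITY.sub(repl, text)


print(named_to_numeric("Caf&eacute; &amp; &copy;"))
```

Writing the characters out directly as UTF-8 would work just as well, as
long as the file's XML declaration says it is UTF-8.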

> > I think you can probably cheat with the form-entered metadata by using
> >
> >   accept-charset="UTF-8 US-ASCII"
> >
> > in the <form> tag, then you should only receive UTF-8.  But very old
> > browsers may not know about accept-charset, and might send the data
> > in some other encoding.
> 
> I will try this and see if the problem goes away.  How "very old" does
> a browser have to be to ignore this?  Are we talking the Netscape 3
> kind of very old, or are we talking IE5?

Well, accept-charset was defined in HTML 4.0 (first published December
1997, revised April 1998), but apart from that I have no idea.  If you find
that this doesn't work in many browsers then you may have to look at the
charset parameter of the Content-Type field in the HTTP request header to
see what encoding the data is in, and then convert it to UTF-8.
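
The fallback could look something like the sketch below.  It assumes the
form is posted as application/x-www-form-urlencoded and that you can get
at the raw body and the Content-Type value; the function names and the
Latin-1 default are my own guesses, not anything the site uses today.

```python
import codecs
from email.message import Message
from urllib.parse import parse_qs


def request_charset(content_type, fallback="iso-8859-1"):
    """Return the charset named in a Content-Type value, or the fallback."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset(fallback)
    try:
        codecs.lookup(charset)  # reject names Python cannot decode
    except LookupError:
        return fallback
    return charset


def decode_form(body, content_type, fallback="iso-8859-1"):
    """Decode a urlencoded form body into a dict of text fields.

    Hypothetical helper: keeps only the first value of each field.
    """
    charset = request_charset(content_type, fallback)
    fields = parse_qs(body.decode("ascii"), encoding=charset, errors="replace")
    return {name: values[0] for name, values in fields.items()}


def to_utf8(value):
    """Encode decoded text as UTF-8 bytes for storage."""
    return value.encode("utf-8")


form = decode_form(
    b"title=caf%E9",
    "application/x-www-form-urlencoded; charset=ISO-8859-1",
)
print(to_utf8(form["title"]))
```

As far as I know, many browsers leave the charset parameter out
altogether, so the fallback is only a guess and may need tuning.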

-- 
Stephen Silver



