[Clipart] upload script problem?

Stephen Silver ocalocal at btinternet.com
Thu Mar 31 05:37:17 PST 2005


Jonadab wrote:

> "Jonadab the Unsightly One" <jonadab at bright.net> writes:
> 
> > "Stephen Silver" <ocalocal at btinternet.com> writes:
> >
> >> It wouldn't make any difference, as you would still get the same
> >> conversion from codepage-1252 to UTF-8.
> >
> > I'm not clear on why that conversion happens.
> 
> I'm also not clear on why entities are converted.  I thought an
> entity identified a specific character, irrespective of charset,
> but apparently that is not the case?

It is the case.  So, for example, &eacute; always represents the
character e-acute.  But in codepage 1252, e-acute is encoded
as the byte 0xE9, while in UTF-8 it is encoded as the two-byte
sequence 0xC3 0xA9.  I haven't studied the code, so I don't know
exactly where the conversion to and from entity references occurs,
but it's clear that whatever was converting to entity references was
assuming codepage 1252, and whatever was converting back again was
using UTF-8, so you get the conversion:

   0xE9  ->  &eacute;  ->  0xC3 0xA9

The same thing happens if you use numeric character references
instead of character entity references, because a numeric character
reference also represents a fixed character.
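
To make the round trip concrete, here is a rough Python sketch of the
mismatch.  It is only an illustration, not the upload script's actual
code (which I haven't studied); the helper names bytes_to_entities and
entities_to_bytes are made up for this example.

```python
import html
from html.entities import codepoint2name


def bytes_to_entities(data: bytes, encoding: str) -> str:
    """Decode bytes with the given encoding, then replace every
    non-ASCII character with an entity or numeric reference.
    (Hypothetical helper, for illustration only.)"""
    out = []
    for ch in data.decode(encoding):
        code = ord(ch)
        if code < 128:
            out.append(ch)                            # plain ASCII passes through
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")   # named entity, e.g. &eacute;
        else:
            out.append(f"&#{code};")                  # numeric character reference
    return "".join(out)


def entities_to_bytes(text: str, encoding: str) -> bytes:
    """Expand entity and numeric references back to characters,
    then encode them with the given encoding.
    (Hypothetical helper, for illustration only.)"""
    return html.unescape(text).encode(encoding)


original = b"\xe9"                                    # e-acute in codepage 1252
as_entity = bytes_to_entities(original, "cp1252")     # one side assumes cp1252
round_tripped = entities_to_bytes(as_entity, "utf-8") # the other side uses UTF-8

print(original, "->", as_entity, "->", round_tripped)
```

The entity in the middle is charset-independent; the bytes only differ
because the two ends disagree about the encoding.  Passing "cp1252" to
both helpers gives back the original byte 0xE9.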

> >> Storing the file on the server has to be the best way.
> 
> Checked in.  That should resolve the issue with binary files.

Excellent.

-- 
Stephen Silver



