[Clipart] character coding

Nicu Buculei nicu at apsro.com
Sun Feb 6 22:35:19 PST 2005


Jonadab the Unsightly One wrote:
> Nicu Buculei <nicu at apsro.com> writes:
> 
> I do not off the top of my head know what character set ISO-8859-15
> is, other than that I think all the ISO-8859-anything charsets are
> fully ASCII-compatible in the bottom seven bits.  And it was my
> understanding that UTF8 has this property also.  So in *theory* it

ISO-8859-1 is the Western European charset (Latin-1) and ISO-8859-15 is a 
newer revision of it (Latin-9, which adds the Euro symbol)

> should Just Work (in the sense of not making any undesired changes).

it works as long as the file contains only ASCII characters. in the 
current case the author is (i believe) French and his name contains an 
accented e, which is outside the ASCII range.
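
to make the difference concrete, here is a small Python sketch (Python is 
just my choice for illustration, and the helper name encoded_forms is made 
up) that shows the bytes a single character turns into under each 
encoding. a plain "e" is the same byte everywhere, the accented e is one 
byte in the ISO-8859 charsets but two bytes in UTF-8, and the Euro symbol 
exists in ISO-8859-15 but not in ISO-8859-1:

```python
def encoded_forms(ch: str) -> dict:
    """Return the bytes for `ch` in each encoding, or None if unrepresentable."""
    forms = {}
    for name in ("ascii", "iso-8859-1", "iso-8859-15", "utf-8"):
        try:
            forms[name] = ch.encode(name)
        except UnicodeEncodeError:
            # The character does not exist in this charset.
            forms[name] = None
    return forms


if __name__ == "__main__":
    # "e", e with acute accent (U+00E9), Euro sign (U+20AC)
    for ch in ("e", "\u00e9", "\u20ac"):
        print(repr(ch), encoded_forms(ch))
```

so a file that is pure ASCII really is byte-identical in all of these, and 
the trouble only starts with the first accented character.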

>>is saved as ISO-8859-15 
> 
> 
> I was unaware that the filesystem maintained character-set metadata.
> What does it mean for a file to be "saved as" ISO-8859-15?  How can
> you tell what character set a file uses, apart from looking at the
> charset information in the XML declaration?

i opened the file in gedit and used 'Save As'. in that dialog it is 
possible to select the desired encoding.
normally, on my Linux system, the default character encoding is UTF-8, but 
this particular file was ISO-8859-15 (as i learned from gedit).
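
i do not know how gedit decides, but a common heuristic can be sketched 
like this (guess_encoding and its fallback parameter are hypothetical 
names of mine, not anything from gedit or from our scripts): bytes that 
decode cleanly as UTF-8 are very likely UTF-8, because accented text in an 
ISO-8859 charset almost never happens to form valid UTF-8 sequences. 
anything else is only a guess, since every ISO-8859 variant looks alike at 
the byte level.

```python
def guess_encoding(data: bytes, fallback: str = "iso-8859-15") -> str:
    """Heuristic guess: ASCII, then UTF-8, else an assumed 8-bit fallback."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8; we cannot tell which 8-bit charset it is,
        # so return the caller's assumption.
        return fallback


if __name__ == "__main__":
    print(guess_encoding(b"plain text"))   # ascii
    print(guess_encoding(b"Andr\xc3\xa9"))  # utf-8
    print(guess_encoding(b"Andr\xe9"))      # iso-8859-15
```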

> More to the point, how can the script detect what encoding the
> information it's receiving is encoded in, short of asking the user?

ideally, the encoding used to save the file should be consistent with 
what is declared inside the file, for example:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
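
a script could at least check that consistency. the following is a rough 
sketch, not what our upload script does: declared_encoding and 
decodes_as_declared are names i invented, and it assumes the file is in an 
ASCII-compatible encoding (so not UTF-16) and has the declaration on the 
very first bytes. note that it can only catch one direction reliably: a 
file declared as UTF-8 but saved as ISO-8859-15 fails to decode, while 
almost any byte sequence "decodes" as ISO-8859-15.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the start of the data.
_DECL = re.compile(rb"""^<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def declared_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the XML declaration, or `default` if absent."""
    match = _DECL.match(data)
    return match.group(1).decode("ascii") if match else default


def decodes_as_declared(data: bytes) -> bool:
    """True if the bytes decode without error in the declared encoding."""
    try:
        data.decode(declared_encoding(data))
    except (UnicodeDecodeError, LookupError):
        # LookupError: the declaration names an encoding Python does not know.
        return False
    return True


if __name__ == "__main__":
    header = b'<?xml version="1.0" encoding="UTF-8" standalone="no"?>\n'
    print(decodes_as_declared(header + b"<n>Andr\xc3\xa9</n>"))  # True
    print(decodes_as_declared(header + b"<n>Andr\xe9</n>"))      # False
```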

> Maybe we need a Unicode guru.  I'm not one.

me neither.

> Alternatively:  does RDF allow for non-ASCII characters in the
> metadata to be encoded as entities?  Could we just use something along
> the lines of HTML::Entities to encode it (so that e.g. the problematic
> character in the file in question would become &eacute; or somesuch)?
> Wouldn't that render the character encoding basically irrelevant?
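
one caveat on the entity idea: XML itself only predefines five named 
entities (lt, gt, amp, apos, quot), so a named one like the accented e 
would need a DTD, whereas numeric character references work in any XML 
document. i have not checked how our RDF tools treat them, so take this as 
a sketch only. in Python (standing in for the Perl module mentioned above) 
the built-in "xmlcharrefreplace" error handler does the numeric variant; 
the function name and the sample element below are made up:

```python
def to_ascii_xml(text: str) -> bytes:
    """Encode as ASCII, replacing each non-ASCII character with &#NNN;."""
    # Only non-ASCII characters are touched; markup characters such as
    # '<' and '&' pass through unchanged.
    return text.encode("ascii", errors="xmlcharrefreplace")


if __name__ == "__main__":
    sample = "<dc:title>Andr\u00e9</dc:title>"  # hypothetical metadata element
    print(to_ascii_xml(sample))  # b'<dc:title>Andr&#233;</dc:title>'
```

the result is plain ASCII, so it reads the same whether the file is 
treated as UTF-8 or as ISO-8859-15. it only helps in text and attribute 
values, though, since character references are not allowed in element or 
attribute names.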

-- 
nicu


